[uf-new] Microformats for hidden data

2009-11-26 Thread Fiann O'Hagan
Hi everyone,

A little while ago my colleague Liam posted on this list about the
jsHub project and our ideas for a microformat to replace the
proprietary JavaScript currently used for web analytics metadata. He
got some good feedback, and I can see there's work we need to do.

Here's the use case we want to address: there is a lot of information
currently stored in pages which is encoded in vendor-specific
JavaScript variables. There are many reasons why the microformat
approach (in principle) would be better than the current situation.
Publishers of big sites find that they are now using multiple tags,
and therefore it makes sense to have a single version of the data
about each page rather than re-declaring it in multiple formats. I
also believe that some of this information (page name and category,
for example) would be of great interest to search engine spiders if it
was accessible.

I would like to take a step back from comments on our specific
proposal and ask a much more general question.

Are there any materials currently available about information which is
not in the visible HTML of the page?

As far as I can see, all the microformats currently in use start with
information which is visible in the page, and then add markup to
indicate what it represents. For example, with hProduct, you start
with the existing product name, price, etc. in the page, and add the
appropriate classes to indicate what these fields represent.

But there is a wealth of information hidden within the page in meta
tags and in JS blocks. For example, the microformats.org wiki page at
http://microformats.org/wiki/hcard-faq contains:

var wgPageName = "hcard-faq";
var wgTitle = "hcard-faq";
var wgAction = "view";

It's quite possible that for web analytics purposes, you might want to
use the page name "hcard-faq", which is different from both the HTML
title element, "hCard FAQ · Microformats Wiki", and the URL path
/wiki/hcard-faq.

Is there any guidance available about these cases, where the
information we want to capture is not part of the visible page? Note
that the data is human-readable, but the person consuming it is not
the end user browsing the site; for example, it might be someone
looking at reports on the most popular pages on the site.

This means that some of the microformats principles, such as "visible
data, not invisible metadata", can't apply directly.

Thanks for any feedback,

Fiann

___
microformats-new mailing list
microformats-new@microformats.org
http://microformats.org/mailman/listinfo/microformats-new


Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Dan Brickley
On Thu, Nov 26, 2009 at 12:27 PM, Fiann O'Hagan fia...@jshub.org wrote:
> Hi everyone,
>
> A little while ago my colleague Liam posted on this list about the
> jsHub project and our ideas for a microformat to replace the
> proprietary JavaScript currently used for web analytics metadata. He
> got some good feedback, and I can see there's work we need to do.
>
> Here's the use case we want to address: there is a lot of information
> currently stored in pages which is encoded in vendor-specific
> JavaScript variables.
[...]
> Are there any materials currently available about information which is
> not in the visible HTML of the page?
[...]
> But there is a wealth of information hidden within the page in meta
> tags and in JS blocks.

Interesting questions. My take on microformatism is that a big part of
the value has been in encouraging people to look more closely at the
tags available in HTML, at their existing official meaning and at the
possibilities for using them to carry more specific / tightly defined
data structures. You already highlight the existence of meta, and I
guess I'd just draw a stronger contrast between that and proprietary /
random Javascript variables. There's a lot to be said for not having
to run a Javascript interpreter to figure out the basic data
structures encoded in a Web page. So maybe meta is worth some more
investigation, rather than just listing it as part of the problem...?
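To make the contrast concrete, here is a minimal sketch, using only Python's standard-library html.parser, of how a consumer could read meta name/content pairs without running a JavaScript interpreter. The meta name "page-name" is a hypothetical example, not an established convention. The wgPageName value inside the script block never reaches this parser as data; it could only be recovered by executing or pattern-matching the JavaScript.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs into a flat dict."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value).
        if tag == "meta":
            a = dict(attrs)
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"]] = a["content"]


# "page-name" is a hypothetical meta name used only for illustration.
page = """<html><head>
<meta name="page-name" content="hcard-faq">
<script>var wgPageName = "hcard-faq";</script>
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(page)
collector.close()
print(collector.meta)  # {'page-name': 'hcard-faq'}
```

Note that the result is only a flat name-to-string mapping; nothing in meta lets a value carry nested structure.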

cheers,

Dan


Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Fiann O'Hagan
Thanks for the quick response Dan.

> You already highlight the existence of meta, and I
> guess I'd just draw a stronger contrast between that and proprietary /
> random Javascript variables. There's a lot to be said for not having
> to run a Javascript interpreter to figure out the basic data
> structures encoded in a Web page. So maybe meta is worth some more
> investigation, rather than just listing it as part of the problem...?

I agree we shouldn't dismiss the meta tag.

The problem as I see it is that the meta tag only allows a simple
association of name=value pairs. It doesn't allow any kind of
structured data. So you can have a meta tag that gives the author, as
shown in the HTML spec
(http://www.w3.org/TR/html401/struct/global.html#h-7.4.4):

For example, to specify the author of a document, one may use the META
element as follows:
<META name="Author" content="Dave Raggett">

But if you want to use hCard to give contact details for the author,
you can't, because it's an opaque string.

There's additional complexity too: the content of the tag is in an
XML attribute rather than a text node, which complicates the escaping
required for the string and means that you cannot include any HTML in
the text.

As I understand it, these limitations are what led the W3C to create
RDF, which is cross-linked from the meta element in the HTML spec. And
the complexity of RDF is, of course, what led to the rise of
microformats.

One final serious limitation of the meta element is that it is only
valid in the head of a document, and not in the body. With more
complex pages, for example tabbed layouts and content loaded via
AJAX, there's a good case for associating page metadata with a
fragment of the page rather than the entire HTML document. That's not
possible unless you can define a wrapper element around the content
you are concerned with.

Does that make sense, or should I be looking at it again?

Fiann


Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Toby Inkster
On Thu, 2009-11-26 at 15:23 +, Fiann O'Hagan wrote:
> As I understand it, these limitations are what led the W3C to create
> RDF, which is cross-linked from the meta element in the HTML spec. And
> the complexity of RDF, is of course what led to the rise of
> microformats.

Have you considered using RDFa? This is a set of XHTML attributes which
brings the RDF data model to XHTML. (Many parsers also support tag
soup HTML too.)

-- 
Toby A Inkster
mailto:m...@tobyinkster.co.uk
http://tobyinkster.co.uk



Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Michael Smethurst
On 26/11/2009 17:33, Fiann O'Hagan fia...@jshub.org wrote:

Hi Fiann

> Hi Toby
>
> > Have you considered using RDFa? This is a set of XHTML attributes which
> > brings the RDF data model to XHTML. (Many parsers also support tag
> > soup HTML too.)
>
> My understanding of RDFa is that it's not possible to include in valid
> XHTML 1.0

If you want to serve RDFa you'll need to use an RDFa doctype:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

It's the only change you need to make, and it's fully supported by the
W3C validators.


> and that in any case there are problems with serving pages
> with an XML mimetype rather than text/html.

There are all kinds of problems with serving pages as XML, but serving
as XML isn't required for RDFa; just keep serving as text/html.

> Do you have any real-world examples of RDFa being published?

The canonical example is the London Gazette:
http://www.london-gazette.co.uk/

We're also using a very small dash of rdfa on bbc.co.uk:
http://www.bbc.co.uk/music/reviews/66gb

...with plans to add much more to bbc.co.uk/programmes in the near future


> I can see
> you have created a parser, but I am not aware of many examples outside
> of the W3 site.
>
> I'm interested in RDFa but I do find the arguments in Tantek's mail
> from this list quite compelling
> http://microformats.org/discuss/mail/microformats-discuss/2006-May/004144.html
>
> I'd be interested to know if anything has changed in the last 3 years.
>
> Fiann





Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Scott Reynen

On Nov 26, 2009, at 9:41 AM, Toby Inkster wrote:

> > As I understand it, these limitations are what led the W3C to create
> > RDF, which is cross-linked from the meta element in the HTML spec. And
> > the complexity of RDF, is of course what led to the rise of
> > microformats.

This isn't very accurate.  RDF was not created primarily as a response  
to HTML's limitations, nor microformats as a response to RDF's  
complexity.  The two only rarely overlap on the same use case.  It's  
generally pretty clear which tool is more appropriate for a given  
job.  For example:

> Have you considered using RDFa?

I agree, this seems much more in line with RDFa than microformats.  To  
do this in microformats, we'd need to throw out the visible data  
requirement, and re-interpret all of the other guidelines to no longer  
presume visible data.  And after a lot of work, the result would end  
up looking a lot like RDFa.


--
Scott Reynen
MakeDataMakeSense.com


Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Scott Reynen

On Nov 26, 2009, at 10:33 AM, Fiann O'Hagan wrote:

> Do you have any real-world examples of RDFa being published?

It seems to me the scarcity of real-world examples is a drawback  
inherent in what you're trying to do, not specific to RDFa.  I'd note  
you also have no real-world examples of your own data being published  
in HTML.  You need to use something like RDFa because you have  
invisible data you want to put in HTML.  RDFa is complex largely  
because it handles invisible data in HTML.  People don't widely use  
RDFa because they find it too complex.


Calling it a microformat wouldn't somehow remove the complexity of  
publishing invisible data in a format focused on visible data.  If  
anything, it would just make things more difficult, because the  
microformats community is largely composed of people who have  
intentionally avoided the problem you're trying to solve, many of whom  
believe it to be unsolvable.


--
Scott Reynen
MakeDataMakeSense.com




Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Fiann O'Hagan
That's interesting Scott. I am not sure I have understood your point
completely, but I'd like to explore it.

> This isn't very accurate.  RDF was not created primarily as a response to
> HTML's limitations, nor microformats as a response to RDF's complexity.

I agree. RDF is not about a limitation of HTML, but it is an attempt
to allow data which is too complex to convey in the meta tag alone.

> I agree, this seems much more in line with RDFa than microformats.  To do
> this in microformats, we'd need to throw out the visible data requirement,
> and re-interpret all of the other guidelines to no longer presume visible
> data.  And after a lot of work, the result would end up looking a lot like
> RDFa.

Why would it end up looking like RDFa? This is the part I don't
understand. RDFa looks like it does because it involves XML
namespaces, namespaced values for XML attributes, and URIs. The markup
indicates relations between items, where the nature of the relation is
defined by resolving a URI.

In contrast, microformats simply use some well known class names. If
we have an element with the class of hproduct, it describes a product.
Inside that, an element with the class of fn is the product name.
There is no URI to dereference to understand what is meant by fn.
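As a rough illustration of reading meaning purely from well-known class names, here is a minimal Python sketch (standard-library html.parser only) that finds class="fn" text nested inside class="hproduct" elements. It assumes well-nested markup and is nowhere near a complete microformats parser: it ignores the value-class pattern, nested microformats and every other hProduct property. The product names in the sample are made up.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HProductNames(HTMLParser):
    """Collects the text of class="fn" elements nested inside class="hproduct".

    Minimal sketch: assumes well-nested markup and ignores everything
    else in the hProduct spec.
    """

    def __init__(self):
        super().__init__()
        self.stack = []       # class lists of the currently open elements
        self.names = []       # collected product names
        self.buf = None       # text pieces while inside an fn element
        self.fn_depth = None  # stack depth at which that fn element opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)
        # Only count fn when some *enclosing* element is an hproduct.
        in_product = any("hproduct" in c for c in self.stack[:-1])
        if "fn" in classes and in_product and self.buf is None:
            self.buf = []
            self.fn_depth = len(self.stack)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        if self.buf is not None and len(self.stack) == self.fn_depth:
            self.names.append("".join(self.buf).strip())
            self.buf = None
            self.fn_depth = None
        self.stack.pop()

    def handle_data(self, data):
        if self.buf is not None:
            self.buf.append(data)


sample = """<div class="hproduct">
  <span class="fn">Widget <b>Pro</b></span> <span class="price">9.99</span>
</div>
<div class="hproduct" style="display:none"><span class="fn">Hidden Gadget</span></div>
<span class="fn">Not inside a product</span>"""

parser = HProductNames()
parser.feed(sample)
parser.close()
print(parser.names)  # ['Widget Pro', 'Hidden Gadget']
```

The sketch never looks at the style attribute, so the display:none product is extracted like any other; whether that is semantically acceptable is a separate question from whether it is parseable.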

So when you say that it would end up looking like RDFa, do you mean in
terms of syntax? Or do you mean in terms of the data being applied as
attributes to elements that are otherwise visible, like the about
attribute being added to a div?

If it's the second one, then I was imagining something much simpler,
which looks like any other microformat, but with some or all of the
content in a CSS display:none region of the page. That to me still
looks like a microformat, not like RDFa.

Fiann



Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Scott Reynen

On Nov 26, 2009, at 2:56 PM, Fiann O'Hagan wrote:

> So when you say that it would end up looking like RDFa, do you mean in
> terms of syntax? Or do you mean in terms of the data being applied as
> attributes to elements that are otherwise visible, like the about
> attribute being added to a div?

I meant the general extension of HTML beyond what makes sense to most
HTML authors, which is what leads, I suspect, to lower adoption.  Even
the microformats value-class pattern starts to look a little like RDFa
to me, in that the meaning of the markup is not particularly clear to
someone who only knows HTML semantics.

> If it's the second one, then I was imaging something much simpler,
> which looks like any other microformat, but with some or all of the
> content in a CSS display:none region of the page. That to me still
> looks like a microformat, not like RDFa.

To me that looks like neither microformats nor RDFa.  I think putting  
non-content in HTML as content goes against HTML semantics in a pretty  
basic way that neither RDFa nor microformats allow.


On Nov 26, 2009, at 4:40 PM, Tantek Çelik wrote:

> 2. use the data-* attributes in HTML5 which were explicitly created
> to handle the use case of data attributes for scripts/script libraries
> among other things.

The prohibition on using data- attributes for public data seems to be
a problem for this particular use case, as analytics engines are
generally independent of the site being tracked, and the spec says
"These attributes are not intended for use by software that is
independent of the site that uses the attributes."


http://dev.w3.org/html5/spec/Overview.html#embedding-custom-non-visible-data

I've never understood why that restriction was added, as it seems to  
have zero benefit, but it's still there.
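For reference, here is a minimal Python sketch of what reading data-* attributes from outside the page looks like, using the standard-library html.parser. The attribute names data-page-name and data-page-category are hypothetical. Under the restriction quoted above, a third-party analytics tool reading them this way would be outside the intended use.

```python
from html.parser import HTMLParser


class DataAttrCollector(HTMLParser):
    """Collects HTML5 data-* attributes, keyed by element id (or tag name)."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # Keep the raw suffix after "data-"; a browser's element.dataset would
        # camelCase it instead (data-page-name -> pageName).
        data = {name[len("data-"):]: value
                for name, value in attrs if name.startswith("data-")}
        if data:
            key = dict(attrs).get("id") or tag  # later duplicates overwrite earlier ones
            self.found[key] = data


# Hypothetical analytics attributes on a content wrapper.
html = """<div id="main" data-page-name="hcard-faq" data-page-category="faq">
  <p>Visible content</p>
</div>"""

collector = DataAttrCollector()
collector.feed(html)
collector.close()
print(collector.found)  # {'main': {'page-name': 'hcard-faq', 'page-category': 'faq'}}
```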


Personally, I'd take another look at how far you can get with meta  
tags.  If the only issue with those is that they refer to the whole  
document, there may be a way around that, e.g. using the scheme  
attribute to identify a section ID.
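The scheme idea could be prototyped along these lines: a minimal Python sketch that groups meta name/content pairs by their scheme attribute. Treating scheme="#some-id" as "this value applies to the element with that id" is a hypothetical convention invented for this example; HTML 4.01 defines scheme only as naming a scheme for interpreting the property's value, so this would be a reinterpretation. The meta name and ids are also illustrative.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ScopedMeta(HTMLParser):
    """Groups <meta> name/content pairs by their scheme attribute.

    Hypothetical convention: scheme="#some-id" scopes the value to the
    element with id="some-id"; a meta with no scheme applies to the
    whole document (stored under the key None).
    """

    def __init__(self):
        super().__init__()
        self.scopes = defaultdict(dict)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name") and a.get("content") is not None:
            self.scopes[a.get("scheme")][a["name"]] = a["content"]


head = """<head>
<meta name="page-name" content="hcard-faq">
<meta name="page-name" scheme="#reviews-tab" content="hcard-faq-reviews">
</head>"""

parser = ScopedMeta()
parser.feed(head)
parser.close()
print(dict(parser.scopes))
# {None: {'page-name': 'hcard-faq'}, '#reviews-tab': {'page-name': 'hcard-faq-reviews'}}
```

A consumer would still need to resolve "#reviews-tab" to an element in the body; the sketch only does the grouping.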


--
Scott Reynen
MakeDataMakeSense.com


