Re: [uf-new] Microformats for hidden data

2009-11-27 Thread Fiann O'Hagan
Brian, thank you for the very detailed response. I do understand this
better now.

I completely agree about the issue of "out of sight, out of mind". It's
exactly the problem that applies to web analytics data now. What I
want to do is to bring the data a little more out into the visible
world.

What typically happens on big enterprise sites is that they have an
analytics product which requires certain per-page metadata, such as a
page name and category. This is different from typical installations
of the free tools like Google Analytics, partly because these are
larger, more complex sites with deeper analytics needs, and partly
because they often have horrible legacy URL structures which make it
impossible to just record visits to URLs.

There is a lot of existing deployment of these tags; see, for example,
the data at
http://www.jgc.org/blog/2009/10/some-real-data-about-javascript-tagging.html

Our overriding interest is in making the data available to more than
one tag on the page, so that the data doesn't have to be declared
multiple times in different formats.

It's certainly possible to do this purely in JavaScript, where the
data is currently declared. But the secondary goal is to make the
information a bit more accessible for the people who are responsible
for the content. Many of them are editors who are reasonably
comfortable reading HTML, but won't touch JavaScript because it is
programming. Hence the interest in potentially using microformats.

Right now, the editors who are responsible for populating the data,
and the analysts who are the audience, commonly have no access at all
to check whether it's correct or to fix any issues.

> Would you trust a
> smaller set of data that has a higher probability of being accurate,
> or loads of hidden data that has a higher probability of being
> inaccurate?

I agree completely with this sentiment. But my question is,
given that there is data which is already hidden, crufty and
out-of-sync, can we do something to shed a little more light on it?

I hope this helps explain where we're coming from.

Fiann

___
microformats-new mailing list
microformats-new@microformats.org
http://microformats.org/mailman/listinfo/microformats-new


Re: [uf-new] Microformats for hidden data

2009-11-27 Thread Fiann O'Hagan
Brian, that's exactly what I am hoping to do; you have captured it precisely.

hAtom gives a lot but not all of what I am looking for (my baseline is
the fields that are common to all the major web analytics products).
hAtom is focussed on blog posts rather than generic website pages, and
I am not sure it is an exact fit, but the core concept is very
similar.

The reason I am interested in using microformats is that, by using a
standard, I can turn your suggestion of
var page = $('.entity-title');
into hAtom format
var page = $('.hfeed .hentry .entry-title');

and it will work across any site with that markup, which is much
better than defining our own POSH format specific to a single site.
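
A minimal sketch of how a shared reader might look, so that several tags
consume one record instead of each re-declaring the data. It assumes
hAtom-style markup (.hentry, .entry-title, rel="tag" links). The function
name readPageMetadata and the field names page and categories are
hypothetical, and it uses the standard querySelector API rather than
jQuery so that it has no dependencies.

```javascript
// Sketch only: reads analytics metadata from visible hAtom-style markup.
// `root` is anything with querySelector, e.g. `document` in a browser.
// readPageMetadata and the field names `page` / `categories` are hypothetical.
function readPageMetadata(root) {
  // Prefer an entry inside a feed, but accept a standalone entry too.
  var entry = root.querySelector('.hfeed .hentry') || root.querySelector('.hentry');
  if (!entry) {
    return null; // no hAtom entry found on this page
  }
  var title = entry.querySelector('.entry-title');
  var tags = entry.querySelectorAll('a[rel~="tag"]');
  var categories = [];
  for (var i = 0; i < tags.length; i++) {
    categories.push(tags[i].textContent.trim());
  }
  return {
    page: title ? title.textContent.trim() : null,
    categories: categories
  };
}
```

Each vendor tag could then call readPageMetadata(document) once and map
the result onto its own variable names.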

Thanks again for all the detailed feedback and putting up with this long thread.

Fiann


2009/11/27 Brian Suda brian.s...@gmail.com:
> On Fri, Nov 27, 2009 at 2:12 PM, Fiann O'Hagan fia...@jshub.org wrote:
>> What typically happens on big enterprise sites is that they have an
>> analytics product which requires certain per-page metadata, such as a
>> page name and category.
>
> --- yup, I know them well. One solution would be to define your own
> POSH format and/or re-use something like hAtom.
>
> Then in the JS code for declaring variables for tracking you can
> reference the microformats, for instance:
>
> Instead of:
> var page = "news-index";
> var campaign = "news";
>
> you could replace the declared variables with references to the
> visible text such as:
> var page = $('.entity-title');
> var campaign = $('a[rel=tag]');
>
> In the JS you are referencing visible data. As editors change fields
> in the CMS, the tracking codes, campaigns, sections, and other
> tracking values are updated automatically. What you need is the
> mapping between the visible parts of the page and your specific
> tracking variables. It also depends on how much you want to connect
> the two and/or allow editors to be changing these values.
>
> -brian
>
> --
> brian suda
> http://suda.co.uk


[uf-new] Microformats for hidden data

2009-11-26 Thread Fiann O'Hagan
Hi everyone,

A little while ago my colleague Liam posted on this list about the
jsHub project and our ideas for a microformat to replace the
proprietary JavaScript currently used for web analytics metadata. He
got some good feedback, and I can see there's work we need to do.

Here's the use case we want to address: there is a lot of information
currently stored in pages which is encoded in vendor-specific
JavaScript variables. There are many reasons why the microformat
approach (in principle) would be better than the current situation.
Publishers of big sites find that they are now using multiple tags,
and therefore it makes sense to have a single version of the data
about each page rather than re-declaring it in multiple formats. I
also believe that some of this information (page name and category,
for example) would be of great interest to search engine spiders if it
was accessible.

I would like to take a step back from comments on our specific
proposal and ask a much more general question.

Are there any materials currently available about information which is
not in the visible HTML of the page?

As far as I can see, all the microformats currently in use start with
information which is visible in the page, and then add markup to
indicate what it represents. For example, with hProduct, you start
with the existing product name, price, etc. in the page, and add the
appropriate classes to indicate what these fields represent.

But there is a wealth of information hidden within the page in meta
tags and in JS blocks. For example on the microformats.org wiki at
http://microformats.org/wiki/hcard-faq

var wgPageName = "hcard-faq";
var wgTitle = "hcard-faq";
var wgAction = "view";

It's quite possible that for web analytics purposes, you might want to
use the page name "hcard-faq", which is different from both the HTML
title element "hCard FAQ · Microformats Wiki" and the URL path
/wiki/hcard-faq.

Is there any guidance available about these cases, where the
information we want to capture is not part of the visible page? Please
note that it is human readable, but the person consuming the data is
different from the end user browsing the site; for example, it might be
someone looking at reports on the most popular pages on the site.

This means that some of the microformats principles, such as preferring
visible data to invisible metadata, can't be applied directly.

Thanks for any feedback,

Fiann



Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Fiann O'Hagan
Thanks for the quick response, Dan.

> You already highlight the existence of meta, and I
> guess I'd just draw a stronger contrast between that and proprietary /
> random Javascript variables. There's a lot to be said for not having
> to run a Javascript interpreter to figure out the basic data
> structures encoded in a Web page. So maybe meta is worth some more
> investigation, rather than just listing it as part of the problem...?

I agree we shouldn't dismiss the meta tag.

The problem as I see it is that the meta tag only allows a simple
association of name=value pairs. It doesn't allow any kind of
structured data. So you can have a meta tag that gives the author, as
recommended in the HTML spec
http://www.w3.org/TR/html401/struct/global.html#h-7.4.4

For example, to specify the author of a document, one may use the META
element as follows:
<META name="Author" content="Dave Raggett">

But if you want to use hCard to give contact details for the author,
you can't, because it's an opaque string.
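
To illustrate the point, here is a minimal sketch of what a consumer can
recover from meta elements: a flat map of names to strings and nothing
more. The function name readMetaPairs is hypothetical, and root is
assumed to be the document (or anything offering querySelectorAll).

```javascript
// Sketch only: collects <meta name="..." content="..."> pairs into a flat map.
// readMetaPairs is a hypothetical name; `root` is e.g. `document`.
function readMetaPairs(root) {
  var pairs = {};
  var metas = root.querySelectorAll('meta[name]');
  for (var i = 0; i < metas.length; i++) {
    var name = metas[i].getAttribute('name');
    // content is always a single string; any structure inside it
    // (a name plus an email address, say) is opaque to the consumer
    pairs[name] = metas[i].getAttribute('content') || '';
  }
  return pairs;
}
```

For the HTML spec example this would give { Author: "Dave Raggett" },
with no place to hang hCard properties such as a URL or an organisation.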

There's additional complexity with the content of the tag being in an
XML attribute rather than a text node too, which complicates the
escaping required for the string and means that you cannot include any
HTML in the text.

As I understand it, these limitations are what led the W3C to create
RDF, which is cross-linked from the meta element in the HTML spec. And
the complexity of RDF is, of course, what led to the rise of
microformats.

One final serious limitation of the meta element is that it is only
valid in the head of a document, and not in the body. With more
complex pages, for example tabbed layouts, and content served via
AJAX, there's a good case to associate page metadata with a fragment
of the page rather than the entire HTML document. That's not possible
unless you can define a wrapper element around the content you are
concerned with.
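
A minimal sketch of the wrapper idea, assuming the fragment is wrapped in
an hAtom-style .hentry element containing an .entry-title. The function
name fragmentMetadataFor is hypothetical; Element.closest is the standard
DOM method.

```javascript
// Sketch only: associates metadata with a fragment of the page rather than
// the whole document, by walking up from an element to its nearest wrapper.
// The wrapper class `.hentry` is borrowed from hAtom; fragmentMetadataFor
// is a hypothetical name.
function fragmentMetadataFor(element) {
  var wrapper = element.closest('.hentry'); // nearest enclosing wrapper, if any
  if (!wrapper) {
    return null; // element is not inside any described fragment
  }
  var title = wrapper.querySelector('.entry-title');
  return { page: title ? title.textContent.trim() : null };
}
```

In a browser this could be called with the target of a click event, so
that a tab loaded via AJAX reports its own name rather than the name of
the enclosing document.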

Does that make sense, or should I be looking at it again?

Fiann


Re: [uf-new] Microformats for hidden data

2009-11-26 Thread Fiann O'Hagan
That's interesting, Scott. I am not sure I have understood your point
completely, but I'd like to explore it.

> This isn't very accurate.  RDF was not created primarily as a response to
> HTML's limitations, nor microformats as a response to RDF's complexity.

I agree. RDF is not about a limitation of HTML, but it is an attempt
to allow data which is too complex to convey in the meta tag alone.

> I agree, this seems much more in line with RDFa than microformats.  To do
> this in microformats, we'd need to throw out the visible data requirement,
> and re-interpret all of the other guidelines to no longer presume visible
> data.  And after a lot of work, the result would end up looking a lot like
> RDFa.

Why would it end up looking like RDFa? This is the part I don't
understand. RDFa looks like it does because it involves XML
namespaces, namespaced values for XML attributes, and URIs. The markup
indicates relations between items, where the nature of the relation is
defined by resolving a URI.

In contrast, microformats simply use some well known class names. If
we have an element with the class of hproduct, it describes a product.
Inside that, an element with the class of fn is the product name.
There is no URI to dereference to understand what is meant by fn.

So when you say that it would end up looking like RDFa, do you mean in
terms of syntax? Or do you mean in terms of the data being applied as
attributes to elements that are otherwise visible, like the about
attribute being added to a div?

If it's the second one, then I was imagining something much simpler,
which looks like any other microformat, but with some or all of the
content in a CSS display:none region of the page. That to me still
looks like a microformat, not like RDFa.

Fiann
