Hey Jukka,

So you're seeing the delineation more as:

* metadata = document-level stuff
* XHTML = textual representation [which can include finer-grained information that I would call "metadata" too]

?

If so, interesting. I wonder whether we should rethink the way that we capture or represent the XHTML, because I would think that our existing Metadata object could be reused at that level too. Maybe have a textual/XHTML metadata object as well, where the keys are things like the IDs (or some generated ID) representing each XHTML tag (with nesting expressed as something like key=outer tag/inner tag 1/inner tag 2) and where the values are the attribute values themselves. I wonder if this would work as a representation format. Then it's easy to define "views" on top of the Metadata object, like an hCard view or an "XHTML" view [with attributes and w/o].

WDYT?

Cheers,
Chris

On 5/26/10 8:02 AM, "Jukka Zitting" <[email protected]> wrote:

Hi,

On Wed, May 26, 2010 at 3:49 PM, Mattmann, Chris A (388J)
<[email protected]> wrote:
> I'm worried that we're mixing concerns here. Some of the information that
> you cite above sounds more to me like metadata (and in fact, thinking about
> it, you could argue that attributes themselves on the XHTML amount that
> defines the textual structure) are more like metadata attributes too. Where
> do you see the delineation?

The Metadata object can only represent document-level metadata, so
it's not suitable for things like:

* this paragraph is written in French
* the bounding box of this word is X on PDF page Y
* this phrase is a hyperlink to URL X
* these words denote a physical address

XHTML attributes are a perfect way to represent such annotations. It
would be great if we could leverage some of the applicable microformat
standards like hCard to simplify downstream use of such information.

BR,

Jukka Zitting

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
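
P.S. To make the path-keyed idea above a bit more concrete, here is a rough sketch in Python. It is not Tika code: the function names (path_keyed_attributes, view), the tag[n] scheme for generated IDs, and the sample document with its example.org URL are all made up for illustration. It only records attributes, not the text content, so treat it as one possible reading of the proposal rather than a worked-out design.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def path_keyed_attributes(xhtml):
    """Map 'outer/inner/...' paths to each element's attribute dict."""
    table = {}

    def walk(elem, prefix, segment):
        path = f"{prefix}/{segment}" if prefix else segment
        table[path] = dict(elem.attrib)
        seen = Counter()
        for child in elem:
            name = local_name(child.tag)
            seen[name] += 1
            # Prefer the element's own id; otherwise generate tag[n].
            walk(child, path, child.get("id") or f"{name}[{seen[name]}]")

    root = ET.fromstring(xhtml)
    walk(root, "", root.get("id") or local_name(root.tag))
    return table


def view(table, attribute):
    """A 'view': only the paths that carry the given attribute."""
    return {p: a[attribute] for p, a in table.items() if attribute in a}


# Hypothetical sample input, loosely following the examples in the thread.
doc = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p lang="fr">Bonjour <a href="http://example.org/">lien</a></p>
<div id="addr" class="vcard"><span class="adr">Pasadena, CA</span></div>
</body></html>"""

table = path_keyed_attributes(doc)
print(view(table, "lang"))   # paths of elements that declare a language
print(view(table, "class"))  # starting point for an hCard-style view
```

A real hCard or "XHTML" view would need more than an attribute filter (for example the element text and document order), which this flat map does not keep.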
