[ https://issues.apache.org/jira/browse/TIKA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113471#comment-13113471 ]
Ken Krugler commented on TIKA-728:
----------------------------------

Jukka said (on the list):

{quote}
Instead of mapping the RDFa <meta> tags to Tika's Metadata and then back to normal XHTML <meta> tags, we might want to consider switching from plain XHTML to XHTML-with-RDFa as Tika's output format. That should make it easier to support more descriptive metadata and content annotations down the line.

In any case it would still be good to map RDFa <meta> tags also to the Metadata object. To do that properly (and to open the way to better XMP integration, my favourite TODO item :-), we'll probably need to extend the Metadata class to handle things like namespaces and structured values.
{quote}

> Return RDFa meta tags via Metadata
> ----------------------------------
>
>                 Key: TIKA-728
>                 URL: https://issues.apache.org/jira/browse/TIKA-728
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>            Priority: Minor
>
> Open Graph <meta> tags currently get stripped out, and also aren't put into
> the metadata map.
> The reason why is that Open Graph uses RDFa:
> http://stackoverflow.com/questions/2704942/html-validation-error-for-property-attribute/2705090#2705090
> Since <meta property="xxx" content="yyy" /> isn't valid for XHTML 1.0, these
> tags can't be emitted.
> We could take a tag like:
> <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
> and put it into the metadata map as "og:url" =>
> "http://www.imdb.com/title/tt0117500/"

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
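
The mapping proposed in the issue description (a <meta property="..." content="..." /> tag becoming a "property" => "content" entry) can be illustrated with a small standalone sketch. This is not Tika code: the class name RdfaMetaSketch and its extract method are hypothetical, it uses a plain java.util.Map as a stand-in for Tika's Metadata object, and it finds tags with a regular expression instead of Tika's HTML parser, so it only handles simple, quoted attributes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: collects RDFa-style
// <meta property="..." content="..."> tags into a simple map.
class RdfaMetaSketch {

    // Matches a whole <meta ...> tag; group 1 is its attribute text.
    // Assumption: attribute values do not contain '>'.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches one name="value" or name='value' attribute.
    // Group 3 holds a double-quoted value, group 4 a single-quoted one.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w:-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns property => content pairs in document order.
    // <meta> tags without a property attribute are ignored.
    static Map<String, String> extract(String html) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String property = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String name = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (name.equals("property")) {
                    property = value;
                } else if (name.equals("content")) {
                    content = value;
                }
            }
            if (property != null && content != null) {
                metadata.put(property, content);
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        String html = "<head><meta property=\"og:url\" "
                + "content=\"http://www.imdb.com/title/tt0117500/\" /></head>";
        // Prints: og:url => http://www.imdb.com/title/tt0117500/
        extract(html).forEach((k, v) -> System.out.println(k + " => " + v));
    }
}
```

Keeping the full "og:url" string as the key mirrors the example in the issue. It does not address the namespace and structured-value handling that Jukka mentions, which would need changes to the Metadata class itself.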