[ 
https://issues.apache.org/jira/browse/TIKA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113471#comment-13113471
 ] 

Ken Krugler commented on TIKA-728:
----------------------------------

Jukka said (on the list):

{quote}
Instead of mapping the RDFa <meta> tags to Tika's Metadata and then
back to normal XHTML <meta> tags, we might want to consider switching
from plain XHTML to XHTML-with-RDFa as Tika's output format. That
should make it easier to support more descriptive metadata and content
annotations down the line.

In any case it would still be good to map RDFa <meta> tags also to the
Metadata object. To do that properly (and to open the way to better
XMP integration, my favourite TODO item :-), we'll probably need to
extend the Metadata class to handle things like namespaces and
structured values.
{quote}



> Return RDFa meta tags via Metadata
> ----------------------------------
>
>                 Key: TIKA-728
>                 URL: https://issues.apache.org/jira/browse/TIKA-728
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>            Priority: Minor
>
> Open Graph <meta> tags currently get stripped out, and also aren't put into 
> the metadata map.
> The reason why is that Open Graph uses RDFa:
> http://stackoverflow.com/questions/2704942/html-validation-error-for-property-attribute/2705090#2705090
> Since <meta property="xxx" content="yyy" /> isn't valid for XHTML 1.0, these 
> tags can't be emitted.
> We could take a tag like:
> <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
> and put it into the metadata map as "og:url" => 
> "http://www.imdb.com/title/tt0117500/"
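
The mapping proposed in the description above could look roughly like the following. This is only a sketch under stated assumptions: RdfaMetaSketch and extract are hypothetical names, the map is a plain java.util.Map rather than Tika's Metadata class, and a regular expression stands in for the HTML parser's own element handling, so it only sees tags whose attributes appear in the order property, content.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: collects RDFa-style <meta> tags into a map.
class RdfaMetaSketch {

    // Assumption: attributes are double-quoted and appear as property, then content.
    // A real implementation would read the attributes from the parser's element events.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+property=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
            Pattern.CASE_INSENSITIVE);

    // Returns property => content pairs in document order.
    // If a property occurs more than once, the last value wins.
    static Map<String, String> extract(String html) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            metadata.put(m.group(1), m.group(2));
        }
        return metadata;
    }

    public static void main(String[] args) {
        String html =
                "<meta property=\"og:url\" content=\"http://www.imdb.com/title/tt0117500/\" />";
        System.out.println(extract(html));
    }
}
```

Keeping the full "og:url" string as the key sidesteps the namespace question for now; properly resolving the "og" prefix would need the kind of Metadata extension discussed in the comment.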

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
