[ 
https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000680#comment-15000680
 ] 

Nick Burch commented on TIKA-980:
---------------------------------

Taking a look at {{TIKA-980-1.3-5.patch}}, there are some {{System.out}} calls 
in the unit test which would need removing or replacing with asserts, for 
starters.

My only other question - is a special ContentHandler that has strict rules on 
its input (it needs HTML mappers set on the context to work) and that returns 
objects the right way to go? Or should we be trying to map these Microdata 
blocks into the regular Metadata (with a suitable set of keys/prefixes)? Can 
someone who knows the Microdata world well comment on why it's been done this 
way, and not via Metadata properties?
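
To make the Metadata alternative concrete, here is a rough sketch of what 
flattening a nested item scope into prefixed keys might look like. Everything 
in it is an assumption for illustration: the {{Item}} class is a hypothetical 
stand-in (not the Any23-derived Item* classes from the patch), the 
{{microdata:}} prefix and key layout are invented, the location value is a 
placeholder, and a plain map stands in for the Metadata object so that no Tika 
API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MicrodataFlattenSketch {

    /** Hypothetical stand-in for a Microdata item scope: an itemtype plus properties. */
    static final class Item {
        final String type;
        // Each value is either a String or a nested Item.
        final Map<String, List<Object>> properties = new LinkedHashMap<>();

        Item(String type) {
            this.type = type;
        }

        Item add(String name, Object value) {
            properties.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            return this;
        }
    }

    /** Flattens one item into prefixed, multi-valued string keys. */
    static Map<String, List<String>> flatten(String prefix, Item item) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        flattenInto(prefix, item, out);
        return out;
    }

    private static void flattenInto(String prefix, Item item, Map<String, List<String>> out) {
        put(out, prefix + ":itemtype", item.type);
        for (Map.Entry<String, List<Object>> e : item.properties.entrySet()) {
            String key = prefix + ":" + e.getKey();
            for (Object value : e.getValue()) {
                if (value instanceof Item) {
                    // Nested scope: recurse, extending the key prefix.
                    flattenInto(key, (Item) value, out);
                } else {
                    put(out, key, String.valueOf(value));
                }
            }
        }
    }

    private static void put(Map<String, List<String>> out, String key, String value) {
        out.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Item event = new Item("http://schema.org/Event")
                .add("name", "ApacheCon Europe")
                .add("location", new Item("http://schema.org/Place")
                        .add("name", "Example City")); // placeholder value
        flatten("microdata", event).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

One thing the sketch makes visible: if a page carries two items of the same 
type, their values land under the same keys and the grouping between them is 
lost unless the keys also carry an index. That may be part of the reason for 
returning objects instead, but someone who knows Microdata better should 
confirm.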

> MicrodataContentHandler for Apache Tika
> ---------------------------------------
>
>                 Key: TIKA-980
>                 URL: https://issues.apache.org/jira/browse/TIKA-980
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Markus Jelsma
>            Assignee: Ken Krugler
>             Fix For: 1.12
>
>         Attachments: TIKA-980-1.3-1.patch, TIKA-980-1.3-2.patch, 
> TIKA-980-1.3-3.patch, TIKA-980-1.3-4.patch, TIKA-980-1.3-5.patch
>
>
> ContentHandler for Apache Tika capable of building a data structure 
> containing Microdata item scopes and item properties. The Item* classes are 
> borrowed from the Apache Any23 project and are slightly modified to 
> accommodate this SAX-based extractor vs the original DOM-based extractor.
> The provided unit test outputs two item scopes about the Europe and NA 
> ApacheCon events and each has a nested property.



