[ https://issues.apache.org/jira/browse/TIKA-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ray Gauss II resolved TIKA-1133.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.4
Resolved in r1491680.
> Ability to Allow Empty and Duplicate Tika Values for XML Elements
> -----------------------------------------------------------------
>
> Key: TIKA-1133
> URL: https://issues.apache.org/jira/browse/TIKA-1133
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.3
> Reporter: Ray Gauss II
> Assignee: Ray Gauss II
> Fix For: 1.4
>
>
> In some cases it is beneficial to allow empty and duplicate Tika metadata
> values for multi-valued XML elements like RDF bags.
> Consider an example where the original source metadata is structured
> something like:
> {code}
> <Person>
> <FirstName>John</FirstName>
> <LastName>Smith</LastName>
> </Person>
> <Person>
> <FirstName>Jane</FirstName>
> <LastName>Doe</LastName>
> </Person>
> <Person>
> <FirstName>Bob</FirstName>
> </Person>
> <Person>
> <FirstName>Kate</FirstName>
> <LastName>Smith</LastName>
> </Person>
> {code}
> Since Tika stores only flat metadata, we transform that structure into
> something like the following before invoking a parser:
> {code}
> <custom:FirstName>
> <rdf:Bag>
> <rdf:li>John</rdf:li>
> <rdf:li>Jane</rdf:li>
> <rdf:li>Bob</rdf:li>
> <rdf:li>Kate</rdf:li>
> </rdf:Bag>
> </custom:FirstName>
> <custom:LastName>
> <rdf:Bag>
> <rdf:li>Smith</rdf:li>
> <rdf:li>Doe</rdf:li>
> <rdf:li></rdf:li>
> <rdf:li>Smith</rdf:li>
> </rdf:Bag>
> </custom:LastName>
> {code}
> The current behavior ignores empty and duplicate values, so the LastName bag
> collapses to just Smith and Doe, and we cannot tell whether Bob or Kate ever
> had a last name. Empty or duplicate values in other positions cause values to
> be mapped to the wrong person.
> We should provide the option to create an {{ElementMetadataHandler}} that
> allows empty and/or duplicate values.
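A small Java illustration of the mapping problem described in the issue, using the names from the example. It assumes the flat bags are re-paired purely by position, which is the only link left once the Person structure is flattened. The class and method names (`CollapseMisalignment`, `collapse`, `pair`) are hypothetical and are not part of Tika; `?` marks a value that was lost, `(none)` a value that was explicitly empty.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/**
 * Hypothetical illustration (not Tika code): once empty and duplicate bag
 * values are dropped, re-pairing two flat bags by index goes wrong.
 */
public class CollapseMisalignment {

    /** Mimics the current behavior: skip empty values and repeated values. */
    static List<String> collapse(List<String> values) {
        LinkedHashSet<String> kept = new LinkedHashSet<>(); // keeps first-seen order
        for (String v : values) {
            if (!v.isEmpty()) {
                kept.add(v);
            }
        }
        return new ArrayList<>(kept);
    }

    /** Re-pairs two bags by position, the only link flat metadata keeps. */
    static List<String> pair(List<String> firstNames, List<String> lastNames) {
        List<String> people = new ArrayList<>();
        for (int i = 0; i < firstNames.size(); i++) {
            String last;
            if (i >= lastNames.size()) {
                last = "?";        // value lost: we cannot know what it was
            } else if (lastNames.get(i).isEmpty()) {
                last = "(none)";   // value explicitly empty in the source
            } else {
                last = lastNames.get(i);
            }
            people.add(firstNames.get(i) + " " + last);
        }
        return people;
    }

    public static void main(String[] args) {
        List<String> firstNames = List.of("John", "Jane", "Bob", "Kate");
        List<String> lastNames = List.of("Smith", "Doe", "", "Smith");

        // Current behavior: [John Smith, Jane Doe, Bob ?, Kate ?]
        System.out.println(pair(firstNames, collapse(lastNames)));
        // Keeping empties and duplicates: [John Smith, Jane Doe, Bob (none), Kate Smith]
        System.out.println(pair(firstNames, lastNames));
        // An empty value in an earlier position shifts later values: [John Doe, Jane ?]
        System.out.println(pair(List.of("John", "Jane"), collapse(List.of("", "Doe"))));
    }
}
```

The last case shows the "other positions" problem: if John had no last name, dropping the empty entry silently assigns Jane's last name to John.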
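A minimal sketch of what an option to keep empty and duplicate values could look like, written as a plain JDK SAX `DefaultHandler` rather than Tika's actual `ElementMetadataHandler`. The class `BagValueCollector`, its `allowEmptyValues`/`allowDuplicateValues` switches, and the `http://example.com/custom#` namespace for the `custom:` prefix are assumptions for illustration only; the real handler's constructor and behavior may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical sketch (not Tika's ElementMetadataHandler): collects the text of
 * each rdf:li inside one target element, with switches for empty and duplicate values.
 */
public class BagValueCollector extends DefaultHandler {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    private final String targetUri;
    private final String targetLocalName;
    private final boolean allowEmptyValues;
    private final boolean allowDuplicateValues;
    private final List<String> values = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private int targetDepth = 0;       // > 0 while inside the target element
    private boolean inListItem = false;

    BagValueCollector(String targetUri, String targetLocalName,
                      boolean allowEmptyValues, boolean allowDuplicateValues) {
        this.targetUri = targetUri;
        this.targetLocalName = targetLocalName;
        this.allowEmptyValues = allowEmptyValues;
        this.allowDuplicateValues = allowDuplicateValues;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (targetUri.equals(uri) && targetLocalName.equals(localName)) {
            targetDepth++;
        } else if (targetDepth > 0 && RDF_NS.equals(uri) && "li".equals(localName)) {
            inListItem = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inListItem) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inListItem && RDF_NS.equals(uri) && "li".equals(localName)) {
            inListItem = false;
            String value = text.toString().trim(); // whitespace-only counts as empty
            boolean skipEmpty = value.isEmpty() && !allowEmptyValues;
            boolean skipDuplicate = !allowDuplicateValues && values.contains(value);
            if (!skipEmpty && !skipDuplicate) values.add(value);
        } else if (targetUri.equals(uri) && targetLocalName.equals(localName)) {
            targetDepth--;
        }
    }

    /** Parses xml and returns the bag values of the target element in document order. */
    static List<String> collect(String xml, String uri, String localName,
                                boolean allowEmpty, boolean allowDuplicates) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so uri and localName are populated
            BagValueCollector handler =
                    new BagValueCollector(uri, localName, allowEmpty, allowDuplicates);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://example.com/custom#"; // hypothetical namespace
        String xml = "<root xmlns:rdf=\"" + RDF_NS + "\" xmlns:custom=\"" + ns + "\">"
                + "<custom:LastName><rdf:Bag>"
                + "<rdf:li>Smith</rdf:li><rdf:li>Doe</rdf:li>"
                + "<rdf:li></rdf:li><rdf:li>Smith</rdf:li>"
                + "</rdf:Bag></custom:LastName></root>";
        System.out.println(collect(xml, ns, "LastName", false, false)); // [Smith, Doe]
        System.out.println(collect(xml, ns, "LastName", true, true));   // [Smith, Doe, , Smith]
    }
}
```

With both switches off the sketch reproduces the lossy behavior described in the issue; with both on, the LastName bag keeps one entry per source Person, so its positions still line up with the FirstName bag.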
--
This message is automatically generated by JIRA.