[
https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660009#comment-14660009
]
Tim Allison commented on TIKA-1607:
-----------------------------------
So, the reason I went with putting more on the value side is to handle multiple
values. I'm not sure how you would link a given phone number with a given
country code in the above. The same goes for latitude+longitude: if a given
metadata object has more than one lat/lon pair, how can you determine which
goes with which unless they are linked in the value, or unless you add an
index number to the property, which gets messy, I would think. I'm
sure I'm missing something. :)
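
To make the pairing concern concrete, here is a minimal sketch. It assumes a plain Map<String, List<String>> as a stand-in for a multi-valued String -> String[] store; it is not the Tika Metadata class, and the class name, the helper methods, and the "latlon" key are hypothetical. The sample coordinates are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata store; not the Tika API.
public class PairedValuesSketch {

    // Value-side linking: keep latitude and longitude together in one value.
    static String encodeLatLon(double lat, double lon) {
        return lat + "," + lon;
    }

    // Splits a "lat,lon" value back into its two components.
    static double[] decodeLatLon(String value) {
        String[] parts = value.split(",", 2);
        return new double[] { Double.parseDouble(parts[0]), Double.parseDouble(parts[1]) };
    }

    static void add(Map<String, List<String>> metadata, String key, String value) {
        metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();

        // Separate keys: the pairing is implied by list position only, so a
        // single missing latitude leaves the two lists misaligned.
        add(metadata, "latitude", "34.2");
        add(metadata, "longitude", "-118.17");
        add(metadata, "longitude", "2.35"); // no matching latitude was extracted

        // Linked in the value: each entry carries its own pair.
        add(metadata, "latlon", encodeLatLon(34.2, -118.17));

        System.out.println(metadata.get("latitude").size() + " latitudes vs "
                + metadata.get("longitude").size() + " longitudes");
        for (String value : metadata.get("latlon")) {
            double[] pair = decodeLatLon(value);
            System.out.println("lat=" + pair[0] + " lon=" + pair[1]);
        }
    }
}
```

With separate keys, nothing in the store says which longitude belongs to which latitude; with the combined value, the link survives however many pairs are added.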
> Introduce new arbitrary object key/values data structure for persistence of
> Tika Metadata
> -----------------------------------------------------------------------------------------
>
> Key: TIKA-1607
> URL: https://issues.apache.org/jira/browse/TIKA-1607
> Project: Tika
> Issue Type: Improvement
> Components: core, metadata
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Critical
> Fix For: 1.10
>
> Attachments: TIKA-1607v1_rough_rough.patch,
> TIKA-1607v2_rough_rough.patch, TIKA-1607v3.patch
>
>
> I am currently working on implementing more comprehensive extraction and
> enhancement of the Tika support for Phone number extraction and metadata
> modeling.
> Right now we utilize the String[] multivalued support available within Tika
> to persist phone numbers as
> {code}
> Metadata: String: String[]
> Metadata: phonenumbers: number1, number2, number3, ...
> {code}
> I would like to propose we extend multi-valued support outside of the
> String[] paradigm by implementing a more abstract Collection of Objects such
> that we could consider and implement the phone number use case as follows
> {code}
> Metadata: String: Object
> {code}
> Where Object could be a Collection<HashMap<String/Property,
> HashMap<String/Property, String/Int/Long>>>, e.g.
> {code}
> Metadata: phonenumbers: [(+162648743476: (LibPN-CountryCode : US),
> (LibPN-NumberType: International), (etc: etc)...), (+1292611054:
> (LibPN-CountryCode : UK), (LibPN-NumberType: International), (etc: etc)...),
> (etc)]
> {code}
> There are obvious backwards compatibility issues with this approach...
> additionally it is a fundamental change to the core Metadata API. I hope that
> the <String, Object> mapping however is flexible enough to allow me to model
> Tika Metadata the way I want.
> Any comments folks? Thanks
> Lewis
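
For reference, the nested shape proposed in the description quoted above could look roughly like the following in plain Java collections. This is only a sketch under assumptions: the class and method names are hypothetical, attribute values are kept as String for simplicity, and none of this is part of the existing Tika Metadata API. The numbers and attribute names are the ones from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed <String, Object> shape; not the Tika API.
public class PhoneNumberMetadataSketch {

    // One entry: phone number -> (attribute name -> attribute value).
    static Map<String, Map<String, String>> entry(String number, String countryCode,
                                                  String numberType) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("LibPN-CountryCode", countryCode);
        attributes.put("LibPN-NumberType", numberType);
        Map<String, Map<String, String>> entry = new LinkedHashMap<>();
        entry.put(number, attributes);
        return entry;
    }

    // Returns the named attribute for a number, or null if it is not present.
    static String attributeOf(List<Map<String, Map<String, String>>> entries,
                              String number, String attribute) {
        for (Map<String, Map<String, String>> e : entries) {
            Map<String, String> attributes = e.get(number);
            if (attributes != null) {
                return attributes.get(attribute);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Map<String, Map<String, String>>> phoneNumbers = new ArrayList<>();
        phoneNumbers.add(entry("+162648743476", "US", "International"));
        phoneNumbers.add(entry("+1292611054", "UK", "International"));

        // The top-level mapping is String -> Object, as in the proposal.
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("phonenumbers", phoneNumbers);

        System.out.println(attributeOf(phoneNumbers, "+1292611054", "LibPN-CountryCode"));
    }
}
```

Here each number carries its own attributes, so the country code stays attached to the number it describes; the cost is that callers of a <String, Object> mapping have to cast or check types when reading values back.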