[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660046#comment-14660046 ]

Nick Burch commented on TIKA-1607:
----------------------------------

My preference is to push the extra thinking onto the parser author rather than 
the downstream user, so I'd rather we encode the structure in the property 
names and keep the values nice and simple.

Just as we'd have:
{code}
stream[0]/pbcore:essenceTrackType=Video
stream[0]/pbcore:essenceTrackFrameSize=480x270
stream[1]/pbcore:essenceTrackType=Audio
{code}
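
A minimal sketch of how a parser might emit these indexed keys. It assumes a plain `Map<String, List<String>>` as a stand-in for Tika's multi-valued `Metadata` (so it runs without Tika on the classpath); `IndexedKeys`, `indexedKey` and `add` are hypothetical names, not existing Tika API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds keys like "stream[0]/pbcore:essenceTrackType".
// A Map<String, List<String>> stands in for org.apache.tika.metadata.Metadata.
final class IndexedKeys {
    private IndexedKeys() {}

    // Compose "<group>[<index>]/<property>" so related values share a prefix.
    static String indexedKey(String group, int index, String property) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        return group + "[" + index + "]/" + property;
    }

    // Append a value under the key, mirroring multi-valued Metadata.add(name, value).
    static void add(Map<String, List<String>> metadata, String key, String value) {
        metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        add(metadata, indexedKey("stream", 0, "pbcore:essenceTrackType"), "Video");
        add(metadata, indexedKey("stream", 0, "pbcore:essenceTrackFrameSize"), "480x270");
        add(metadata, indexedKey("stream", 1, "pbcore:essenceTrackType"), "Audio");
        // Prints one "key=value" line per entry, in insertion order.
        metadata.forEach((k, v) -> System.out.println(k + "=" + String.join(",", v)));
    }
}
```

All the structure lives in the key, so every value stays a plain String, which is what keeps the existing String-valued contract intact for downstream users.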

For complex locations, we could do:
{code}
// Standalone
location[0]/latitude=51.1
location[0]/longitude=-1.3
location[1]/latitude=51.11
location[1]/longitude=-1.31
location[1]/country=UK
location[1]/description=Somewhere interesting

// IPTC Extension (XMP)
Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City=Oxford
Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName=UK
Iptc4xmpExt:LocationShown[1]/Iptc4xmpExt:City=London
Iptc4xmpExt:LocationShown[1]/Iptc4xmpExt:CountryName=UK
{code}
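
On the consumer side, a downstream user who does want the grouped view can rebuild it from the key names alone. A minimal sketch, assuming single-valued `String` entries for simplicity (Tika metadata is multi-valued) and the `group[i]/property` pattern above; `IndexedKeyGrouping` is a hypothetical helper, not part of Tika.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical consumer-side helper: regroups "group[i]/property" keys into
// one ordered list of property maps per group. Un-indexed keys are skipped.
final class IndexedKeyGrouping {
    // Captures: group name, numeric index, property name.
    private static final Pattern INDEXED = Pattern.compile("^(.+?)\\[(\\d+)\\]/(.+)$");

    private IndexedKeyGrouping() {}

    static Map<String, List<Map<String, String>>> group(Map<String, String> flat) {
        // group -> (index -> properties); TreeMap keeps indices in ascending order
        Map<String, TreeMap<Integer, Map<String, String>>> byGroup = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            Matcher m = INDEXED.matcher(e.getKey());
            if (!m.matches()) {
                continue; // e.g. a plain "dc:title" with no [i]
            }
            byGroup.computeIfAbsent(m.group(1), g -> new TreeMap<>())
                   .computeIfAbsent(Integer.parseInt(m.group(2)), i -> new LinkedHashMap<>())
                   .put(m.group(3), e.getValue());
        }
        // Gaps in the indices (e.g. only [0] and [2]) are collapsed in the list.
        Map<String, List<Map<String, String>>> result = new LinkedHashMap<>();
        byGroup.forEach((g, items) -> result.put(g, new ArrayList<>(items.values())));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("location[0]/latitude", "51.1");
        flat.put("location[0]/longitude", "-1.3");
        flat.put("location[1]/latitude", "51.11");
        flat.put("location[1]/longitude", "-1.31");
        flat.put("location[1]/country", "UK");
        flat.put("Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City", "Oxford");
        flat.put("Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName", "UK");
        System.out.println(group(flat));
    }
}
```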

For the audio/video use case, Ray's solution seemed neater to me for 
complex/nested cases. For others, like contacts or locations, I think this 
flattened-key approach is fine, and hopefully it's simpler for users, but I'm 
happy to be persuaded otherwise!

> Introduce new arbitrary object key/values data structure for persistence of 
> Tika Metadata
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-1607
>                 URL: https://issues.apache.org/jira/browse/TIKA-1607
>             Project: Tika
>          Issue Type: Improvement
>          Components: core, metadata
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Critical
>             Fix For: 1.10
>
>         Attachments: TIKA-1607v1_rough_rough.patch, 
> TIKA-1607v2_rough_rough.patch, TIKA-1607v3.patch
>
>
> I am currently working on implementing more comprehensive extraction and 
> enhancement of Tika's support for phone number extraction and metadata 
> modeling.
> Right now we use the String[] multi-valued support available within Tika 
> to persist phone numbers as:
> {code}
> Metadata: String: String[]
> Metadata: phonenumbers: number1, number2, number3, ...
> {code}
> I would like to propose we extend multi-valued support beyond the 
> String[] paradigm by implementing a more abstract Collection of Objects, such 
> that we could represent the phone number use case as follows:
> {code}
> Metadata: String:  Object
> {code}
> Where Object could be a Collection<HashMap<String/Property, 
> HashMap<String/Property, String/Int/Long>>>, e.g.:
> {code}
> Metadata: phonenumbers: [(+162648743476: (LibPN-CountryCode: US), 
> (LibPN-NumberType: International), (etc: etc)...), (+1292611054: 
> (LibPN-CountryCode: UK), (LibPN-NumberType: International), (etc: etc)...), 
> (etc)]
> {code}
> There are obvious backwards-compatibility issues with this approach; 
> additionally, it is a fundamental change to the core Metadata API. However, 
> I hope that the <String, Object> mapping is flexible enough to allow me to 
> model Tika Metadata the way I want.
> Any comments, folks? Thanks,
> Lewis
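
To compare the two proposals side by side, here is a minimal sketch of the nested `<String, Object>` value described in the quoted issue, and of flattening it into the indexed-key style from the comment above. It uses plain `java.util` maps rather than Tika's `Metadata`/`Property` classes, and the flattened key names (`phonenumber[i]/number`, `phonenumber[i]/LibPN-CountryCode`, ...) are hypothetical choices taken from the example, not an agreed schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical comparison of the two shapes discussed in TIKA-1607,
// using plain java.util maps instead of Tika's Metadata/Property classes.
final class PhoneNumberShapes {
    private PhoneNumberShapes() {}

    // Nested shape from the issue: a collection of maps from number -> attributes.
    static List<Map<String, Map<String, Object>>> nested() {
        Map<String, Object> us = new LinkedHashMap<>();
        us.put("LibPN-CountryCode", "US");
        us.put("LibPN-NumberType", "International");
        Map<String, Object> uk = new LinkedHashMap<>();
        uk.put("LibPN-CountryCode", "UK");
        uk.put("LibPN-NumberType", "International");
        List<Map<String, Map<String, Object>>> numbers = new ArrayList<>();
        numbers.add(Collections.singletonMap("+162648743476", us));
        numbers.add(Collections.singletonMap("+1292611054", uk));
        return numbers;
    }

    // Flattened shape from the comment: String values under "phonenumber[i]/<attribute>".
    static Map<String, String> flatten(List<Map<String, Map<String, Object>>> numbers) {
        Map<String, String> flat = new LinkedHashMap<>();
        int index = 0;
        for (Map<String, Map<String, Object>> element : numbers) {
            for (Map.Entry<String, Map<String, Object>> entry : element.entrySet()) {
                String prefix = "phonenumber[" + index++ + "]/";
                flat.put(prefix + "number", entry.getKey());
                // Non-String values (e.g. Integer/Long) are stringified here.
                entry.getValue().forEach((attr, value) -> flat.put(prefix + attr, String.valueOf(value)));
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Proposed String -> Object mapping: the value is the whole nested collection.
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("phonenumbers", nested());
        System.out.println(metadata);
        // Flattened alternative: plain String values, structure carried by the keys.
        flatten(nested()).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The trade-off shows up in the types: the nested form hands consumers `Object` values they must inspect and cast, while the flattened form keeps String values at the cost of encoding the structure in key names.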



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
