I think having some specific patches showing how this would look
would help move the discussion from the abstract to the concrete.
I encourage you to try it out, MartinD, and see if there is a good
overlap there.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Martin Desruisseaux <[email protected]>
Organization: Geomatys
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, October 13, 2015 at 1:34 PM
To: "[email protected]" <[email protected]>
Subject: Re: ISO 19115 as a metadata model for Tika?

>On 12/10/15 14:22, Nick Burch wrote:
>> Currently, it's very easy for a new user of Tika to get the metadata
>> they want out, they can just fetch a simple string value to get
>> started with. You can, when you learn more, start getting more richly
>> typed values out, but the quickstart is simple. Some libraries make it
>> so that you have to learn the full rich metadata structure right from
>> the get-go, which causes problems for new users. Whatever we do to
>> help the power users, we need to not ruin it for the beginners!
>
>What would be the approach for more richly typed values? Would they be
>an extension of the current model, or a second model existing in
>parallel with the first one?
>
>
>> For the discussion on "what should a richer Tika metadata system be
>> based on", I think TIKA-1607 is where that is taking place, plus some
>> related threads on-list.
>
>Thanks for the link. TIKA-1607 seems to be about associating arbitrary
>java.lang.Object values to property keys. But isn't that a little bit
>opaque? I mean, if a user gets an instance of a class that they don't
>know, how do they extract information from it?
>
>
>> In the short term, if there are some key parts of that standard for
>> geospatial metadata that we don't currently handle, and could do
>> easily with the current setup, then we should raise a JIRA + get a
>> sample file + add the support
>
>Regarding ISO 19115 support, the main question seems to me to be how
>to handle a tree structure. The current Tika metadata structure seems to
>be like a Map<String,String[]> (please correct me if I'm wrong), while
>ISO 19115 is more like a Map<String,Node> where each Node can contain
>child nodes, thus forming a tree. The following example in Tika:
>
>    Creator…………………… Jon Smith
>    Publisher……………… A company
>    Title………………………… Anything
>
>would be in the ISO 19115 model (note how the creator and publisher are
>grouped under the same "responsible party" node):
>
>    Citation
>     ├─Title………………………………………………… Anything
>     └─Cited responsible party
>       [1]
>        ├─Role…………………………………………… Author
>        └─Individual
>           └─Name…………………………………… Jon Smith
>       [2]
>        ├─Role…………………………………………… Publisher
>        └─Organisation
>           └─Name…………………………………… A company
>
>The tree structure allows adding other information, like email addresses
>and phone numbers, without confusion about whether the address applies
>to the creator or to the publisher. Of course a flat structure could
>prefix property names (e.g. "creator_address", "publisher_address",
>etc.), but this would result in a lot of keys. For example ISO 19115
>defines 20 standard roles (resourceProvider, custodian, owner, user,
>distributor, originator, pointOfContact, principalInvestigator,
>processor, publisher, author, sponsor, coAuthor, collaborator, editor,
>mediator, rightsHolder, contributor, funder, stakeholder) and each of
>them can be associated with about 30 properties under the "Cited
>responsible party" node (name, positionName, phone, city,
>administrativeArea, postalCode, country, hoursOfService,
>contactInstruction, onlineResource, etc.). Would Tika want to
>handle such an amount of data, and if so, is a flat structure really
>appropriate?
>
>    Martin
>
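
For illustration only, the contrast described above between a flat
Map<String,String[]> and a tree of nodes could be sketched in Java as
below. The Node class, the flatten helper and the "/"-separated path
keys are all hypothetical: they are not part of Tika, of TIKA-1607 or
of any ISO 19115 library, and the sketch only reuses the Citation
example from the quoted message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Tika or in an ISO 19115 library. */
final class MetadataTreeSketch {

    /** A tree node: a name, an optional value (set on leaves only), and ordered children. */
    static final class Node {
        final String name;
        final String value;                       // null for non-leaf nodes
        final List<Node> children = new ArrayList<>();

        Node(String name, String value) {
            this.name = name;
            this.value = value;
        }

        /** Adds a child and returns this node so that calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Builds the "Citation" tree shown in the quoted message. */
    static Node exampleCitation() {
        Node author = new Node("Cited responsible party", null)
                .add(new Node("Role", "Author"))
                .add(new Node("Individual", null).add(new Node("Name", "Jon Smith")));
        Node publisher = new Node("Cited responsible party", null)
                .add(new Node("Role", "Publisher"))
                .add(new Node("Organisation", null).add(new Node("Name", "A company")));
        return new Node("Citation", null)
                .add(new Node("Title", "Anything"))
                .add(author)
                .add(publisher);
    }

    /** Flattens a tree into path keys with multi-valued entries (an assumed key scheme). */
    static Map<String, List<String>> flatten(Node root) {
        Map<String, List<String>> flat = new LinkedHashMap<>();
        flatten(root, "", flat);
        return flat;
    }

    private static void flatten(Node node, String prefix, Map<String, List<String>> flat) {
        String path = prefix.isEmpty() ? node.name : prefix + "/" + node.name;
        if (node.value != null) {
            // Repeated nodes with the same path share one key, as in a String[] value.
            flat.computeIfAbsent(path, k -> new ArrayList<>()).add(node.value);
        }
        for (Node child : node.children) {
            flatten(child, path, flat);
        }
    }

    public static void main(String[] args) {
        flatten(exampleCitation()).forEach((key, values) -> System.out.println(key + " = " + values));
    }
}
```

In this sketch the two roles end up under one shared key while the two
names land under different keys, so the flat form no longer says which
name goes with which role. Prefixing keys by role would avoid that, at
the cost discussed above: 20 roles times about 30 properties is on the
order of 600 keys.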
