I think some specific patches showing how this would look would help move the discussion from the abstract to the concrete. I encourage you to try it out, MartinD, and see whether there is a good overlap there.
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Martin Desruisseaux <[email protected]>
Organization: Geomatys
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, October 13, 2015 at 1:34 PM
To: "[email protected]" <[email protected]>
Subject: Re: ISO 19115 as a metadata model for Tika?

>On 12/10/15 14:22, Nick Burch wrote:
>> Currently, it's very easy for a new user of Tika to get the metadata
>> they want out, they can just fetch a simple string value to get
>> started with. You can, when you learn more, start getting more richly
>> typed values out, but the quickstart is simple. Some libraries make it
>> so that you have to learn the full rich metadata structure right from
>> the get-go, which causes problems for new users. Whatever we do to
>> help the power users, we need to not ruin it for the beginners!
>
>What would be the approach for more richly typed values? Would they be
>an extension of the current model, or a second model existing in
>parallel with the first one?
>
>
>> For the discussion on "what should a richer Tika metadata system be
>> based on", I think TIKA-1607 is where that is taking place, plus some
>> related threads on-list.
>
>Thanks for the link. TIKA-1607 seems to be about associating arbitrary
>java.lang.Object values to property keys. But isn't that a little bit
>opaque? I mean, if a user gets an instance of a class that they don't
>know, how do they extract information from it?
>
>
>> In the short term, if there are some key parts of that standard for
>> geospatial metadata that we don't currently handle, and could do
>> easily with the current setup, then we should raise a JIRA + get a
>> sample file + add the support
>
>Regarding ISO 19115 support, the main question seems to me to be how
>to handle a tree structure. The current Tika metadata structure seems to
>be like a Map<String,String[]> (please correct me if I'm wrong), while
>ISO 19115 is more like a Map<String,Node> where each Node can contain
>child nodes, thus forming a tree. The following example in Tika:
>
>    Creator…………………… Jon Smith
>    Publisher……………… A company
>    Title………………………… Anything
>
>would be in the ISO 19115 model (note how the creator and publisher are
>grouped under the same "responsible party" node):
>
>    Citation
>      ├─Title………………………………………………… Anything
>      └─Cited responsible party
>          [1]
>           ├─Role…………………………………………… Author
>           └─Individual
>               └─Name…………………………………… Jon Smith
>          [2]
>           ├─Role…………………………………………… Publisher
>           └─Organisation
>               └─Name…………………………………… A company
>
>The tree structure makes it possible to add other information, like
>email addresses and phone numbers, without confusion about whether the
>address applies to the creator or to the publisher. Of course a flat
>structure could prefix property names (e.g. "creator_address",
>"publisher_address", etc.), but this would result in a lot of keys. For
>example ISO 19115 defines 20 standard roles (resourceProvider,
>custodian, owner, user, distributor, originator, pointOfContact,
>principalInvestigator, processor, publisher, author, sponsor, coAuthor,
>collaborator, editor, mediator, rightsHolder, contributor, funder,
>stakeholder) and each of them can be associated with about 30
>properties under the "Cited responsible party" node (name,
>positionName, phone, city, administrativeArea, postalCode, country,
>hoursOfService, contactInstruction, onlineResource, etc.).
>Would Tika want to handle that amount of data, and if so, is a flat
>structure really appropriate?
>
>    Martin
>
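
To make the flat-versus-tree question in the quoted message concrete, here is a minimal Java sketch. It deliberately uses neither Tika nor any ISO 19115 library: the `MetadataShapes` and `Node` classes, the helper methods, and the key names are all hypothetical, chosen only to mirror the labels in the quoted example. It is an illustration of the two shapes under those assumptions, not a proposal for an actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Tika.
final class MetadataShapes {

    /** Hypothetical tree node: an optional value plus named, repeatable children. */
    static final class Node {
        final String value;
        final Map<String, List<Node>> children = new LinkedHashMap<>();

        Node(String value) { this.value = value; }

        /** Appends a child under the given name and returns the new child. */
        Node add(String name, String childValue) {
            Node child = new Node(childValue);
            children.computeIfAbsent(name, k -> new ArrayList<>()).add(child);
            return child;
        }

        /** Follows the first child at each step; returns null if a step is missing. */
        Node first(String... path) {
            Node current = this;
            for (String step : path) {
                List<Node> list = current.children.get(step);
                if (list == null || list.isEmpty()) return null;
                current = list.get(0);
            }
            return current;
        }
    }

    /** Flat shape, in the spirit of Map<String,String[]>. */
    static Map<String, String[]> flatExample() {
        Map<String, String[]> flat = new LinkedHashMap<>();
        flat.put("Creator", new String[] {"Jon Smith"});
        flat.put("Publisher", new String[] {"A company"});
        flat.put("Title", new String[] {"Anything"});
        return flat;
    }

    /** Tree shape mirroring the "Citation" example from the message. */
    static Node treeExample() {
        Node citation = new Node(null);
        citation.add("title", "Anything");
        Node author = citation.add("citedResponsibleParty", null);
        author.add("role", "Author");
        author.add("individual", null).add("name", "Jon Smith");
        Node publisher = citation.add("citedResponsibleParty", null);
        publisher.add("role", "Publisher");
        publisher.add("organisation", null).add("name", "A company");
        return citation;
    }

    /** Returns the name of the first responsible party with the given role, or null. */
    static String nameForRole(Node citation, String role) {
        List<Node> parties = citation.children
                .getOrDefault("citedResponsibleParty", Collections.<Node>emptyList());
        for (Node party : parties) {
            Node r = party.first("role");
            if (r == null || !role.equals(r.value)) continue;
            for (String kind : new String[] {"individual", "organisation"}) {
                Node name = party.first(kind, "name");
                if (name != null) return name.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(flatExample().get("Creator")[0]);          // Jon Smith
        System.out.println(nameForRole(treeExample(), "Publisher"));  // A company
        // Rough size of a prefixed flat key space: 20 roles x about 30 properties.
        System.out.println(20 * 30);                                   // 600
    }
}
```

In the flat shape, each new role/property pair needs its own key (on the order of 600 with the figures quoted above), whereas in the tree shape an email address or phone number would simply be one more child of the relevant "responsible party" node, so it stays unambiguous which party it belongs to.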
