On 15/10/15 13:21, Nick Burch wrote:
> Tika doesn't only use Dublin Core. Tika uses about half a dozen
> well-known externally defined metadata models (meta-metadata?). Dublin
> core is one of those, but certainly not the only one.

Yes, but in my understanding this is a juxtaposition of many models.
Some bigger (but admittedly more complex) standards like ISO 19115
provide a single consistent model for what is currently split across many
models in Tika. I'm not saying that Tika should change (it would be a
never-ending story, since we could always find yet bigger models, and it
may not be possible to find a model that pleases every community).
I'm just trying to see how those bigger models could fit into Tika's picture.


> We rely on external definitions to explain what a metadata key
> represents, and the better known that definition the easier it is for
> our users. We then have the parsers map from their format-specific
> metadata onto the most appropriate well-known key.

Yes, the sis-metadata module works in the same way, except that it maps
only to OGC/ISO 191xx keys.
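To make that mapping step concrete (format-specific names translated onto well-known keys), here is a minimal Java sketch. The class name FormatKeyMapping, the mapping table and the key strings are hypothetical illustrations only, not Tika's actual Property constants or parser API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormatKeyMapping {

    // Hypothetical table: format-specific XMP names -> well-known keys.
    // Key strings are illustrative, not Tika's real Property constants.
    static final Map<String, String> XMP_TO_WELL_KNOWN = Map.of(
            "xmp:CreateDate", "dcterms:created",
            "xmp:ModifyDate", "dcterms:modified");

    /** Rewrites raw entries onto well-known keys; unmapped names are kept unchanged. */
    public static Map<String, String> normalize(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String key = XMP_TO_WELL_KNOWN.getOrDefault(e.getKey(), e.getKey());
            out.put(key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("xmp:CreateDate", "2015-10-15T13:21:00Z");
        raw.put("xmp:Label", "draft"); // no well-known equivalent in the table
        System.out.println(normalize(raw));
        // prints {dcterms:created=2015-10-15T13:21:00Z, xmp:Label=draft}
    }
}
```

The sis-metadata module would differ only in the target column of such a table (OGC/ISO 191xx keys instead of Dublin Core ones).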


> Whatever we do, it needs to be easy for people to work out what they
> want, and what something means. If they have to read a many hundred
> page ISO standard to figure it out, we've failed!

Understood; this is where the question about multiple models comes in. In my
understanding, Tika currently provides, in some sense, a single model even
if it comes from multiple external definitions. For example, if someone
wants a date from an XMP file, he needs to use the Dublin Core key rather
than the XMP key. But if someone is more familiar with ISO 19115 than
with Dublin Core, this approach could increase the complexity for
him, because that user needs to know two models instead of one, and to
remember which ISO 19115 properties must be accessed through the Dublin Core
key rather than the ISO key.

An alternative could be to allow the same property to be accessed by two
(or more) keys. Those keys would be defined by different standards
co-existing in Tika. Tika would not provide a model for each data
format, but only for a very small set of well-recognized standards (e.g.
2 or 3). The Tika parsers would map their metadata to the keys of the
standard model most appropriate to them, and Tika would take care of the
equivalence between, e.g., Dublin Core and ISO 19115.
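A minimal Java sketch of this alternative, assuming a simple equivalence table: each value is stored once under an internal property id, and any key declared equivalent resolves to that id. The class AliasedMetadata, its methods and the ISO 19115 key string are hypothetical; neither Tika nor Apache SIS is claimed to expose such an API.

```java
import java.util.HashMap;
import java.util.Map;

public class AliasedMetadata {

    // Equivalence table: every known key -> one internal property id.
    private final Map<String, String> keyToProperty = new HashMap<>();
    // Values are stored once, under the internal property id.
    private final Map<String, String> values = new HashMap<>();

    /** Declares that all given keys (from different standards) denote the same property. */
    public void declareEquivalent(String propertyId, String... keys) {
        for (String key : keys) {
            keyToProperty.put(key, propertyId);
        }
    }

    /** Stores a value under whichever standard's key the parser chose. */
    public void set(String key, String value) {
        values.put(resolve(key), value);
    }

    /** Reads a value through any equivalent key; returns null if absent. */
    public String get(String key) {
        return values.get(resolve(key));
    }

    // Keys without a declared equivalence act as their own property id.
    private String resolve(String key) {
        return keyToProperty.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        AliasedMetadata md = new AliasedMetadata();
        md.declareEquivalent("creation-date",
                "dcterms:created",                 // Dublin Core term
                "iso19115:citation/date/creation"  // hypothetical ISO 19115 key
        );
        // A parser writes through the Dublin Core key...
        md.set("dcterms:created", "2015-10-15");
        // ...and an ISO 19115 user reads the same value through the ISO key.
        System.out.println(md.get("iso19115:citation/date/creation")); // prints 2015-10-15
    }
}
```

Storing each value only once keeps the keys consistent by construction, and the equivalence tables would only be needed for the 2 or 3 standards Tika chooses to support.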

Would it make sense?

    Martin

