On Thu, 15 Oct 2015, Martin Desruisseaux wrote:
> So I'm not looking for a solution to a technical problem; I'm trying to learn more about the strategic direction Tika wishes to take. Would Tika consider moving to a richer metadata model than Dublin Core?
Tika doesn't only use Dublin Core. Tika uses about half a dozen well-known, externally defined metadata models (meta-metadata?). Dublin Core is one of those, but certainly not the only one.
We rely on external definitions to explain what a metadata key represents, and the better known that definition is, the easier it is for our users. We then have the parsers map from their format-specific metadata onto the most appropriate well-known key.
(As an example of "most appropriate", take some of the XMP bits. XMP has its own date metadata, but we don't use it. We instead use the better-known Dublin Core properties for dates, and only the media-specific parts of XMP.)
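To make the mapping idea concrete, here is a minimal sketch of a parser normalising format-specific keys onto well-known ones. It is illustrative only: the class name `MetadataKeyMapper` and its mapping table are hypothetical, and this is not Tika's own `Metadata`/`Property` API or any real parser's code. It assumes XMP creation and modification dates are mapped onto the Dublin Core terms `dcterms:created` and `dcterms:modified`, while a media-specific XMP key such as `xmpDM:duration` passes through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: rename format-specific metadata keys to well-known keys. */
public class MetadataKeyMapper {
    // Illustrative mapping table (an assumption, not Tika's actual parser mappings).
    private static final Map<String, String> WELL_KNOWN = Map.of(
        "xmp:CreateDate", "dcterms:created",
        "xmp:ModifyDate", "dcterms:modified"
    );

    /** Uses the well-known key where a mapping exists; other keys pass through unchanged.
     *  If two source keys map to the same target, the first value seen is kept. */
    public static Map<String, String> normalize(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String key = WELL_KNOWN.getOrDefault(e.getKey(), e.getKey());
            out.putIfAbsent(key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("xmp:CreateDate", "2015-10-15T09:00:00Z"); // date: mapped to Dublin Core
        raw.put("xmpDM:duration", "183.5");                // media-specific: kept as XMP
        System.out.println(normalize(raw));
    }
}
```

The point of the design is that users only ever need to look up `dcterms:created` for a creation date, whatever format the document came from.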
> Would ISO 19115 be considered too geospatial-centric (which I could understand)? Would Tika support more than one "universal model" if it wants to preserve Dublin Core's simplicity alongside the richness of other international standards?
As mentioned above, Tika already has multiple external definitions in use, but only one for each area.
Whatever we do, it needs to be easy for people to work out what they want, and what something means. If they have to read a many-hundred-page ISO standard to figure it out, we've failed! Ditto if it becomes an epic battle to work out what a value is or how to decode it.
Nick
