On Mon, 12 Oct 2015, Martin Desruisseaux wrote:
> At the last ApacheCon in Budapest, we had some discussion about
> geospatial metadata in Tika. Currently Tika has 3 properties (latitude,
> longitude, altitude) in its org.apache.tika.metadata.Geographic
> interface, also reproduced in the TikaCoreProperties interface.
> Geospatial metadata can be more complex, but does Tika wish to support
> more geospatial metadata structures or to keep that model simple?

Both!

Currently, it's very easy for a new user of Tika to get the metadata they want out: they can just fetch a simple string value to get started. When you learn more, you can start getting more richly typed values out, but the quickstart is simple. Some libraries make you learn the full rich metadata structure right from the get-go, which causes problems for new users. Whatever we do to help the power users, we need to not ruin it for the beginners!
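To make the "simple strings first, typed values later" idea concrete, here is a minimal, self-contained sketch. It is NOT Tika's Metadata API. The class `SimpleMetadata` and its `getDouble` method are hypothetical, made up for illustration. Only the key names `geo:lat`, `geo:long` and `geo:alt` mirror the names Tika's Geographic properties use, and the coordinate values are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch (not Tika's API): beginners read plain strings,
 * power users opt in to typed values backed by the same stored string.
 */
public class SimpleMetadata {
    private final Map<String, String> values = new HashMap<>();

    public void set(String key, String value) {
        values.put(key, value);
    }

    /** Beginner path: the stored value as a plain string, or null if absent. */
    public String get(String key) {
        return values.get(key);
    }

    /** Power-user path: the same value parsed as a double, empty if absent or unparseable. */
    public Optional<Double> getDouble(String key) {
        String raw = values.get(key);
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Double.parseDouble(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        SimpleMetadata md = new SimpleMetadata();
        // Key names mirror Tika's Geographic properties; values are illustrative only
        md.set("geo:lat", "47.4979");
        md.set("geo:long", "19.0402");

        System.out.println(md.get("geo:lat"));                          // 47.4979
        System.out.println(md.getDouble("geo:long").orElse(Double.NaN)); // 19.0402
        System.out.println(md.getDouble("geo:alt").isPresent());         // false
    }
}
```

The design point is that the typed accessor is layered on top of the string one, so a beginner never has to touch it, while richer structures could later be added behind further opt-in accessors.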

> If Tika wishes to support geospatial metadata more extensively, would Tika consider using the ISO 19115 metadata model? This international standard is the official metadata model of the Open Geospatial Consortium (OGC) and is used by various organisations (some parts of NASA, the European Space Agency, the Food and Agriculture Organisation, etc.). The ISO 19115 standard is quite big, with about 500 properties.

For the discussion on "what should a richer Tika metadata system be based on", I think TIKA-1607 is where that is taking place, plus some related threads on-list. If you have ideas/experiences/alternatives, especially ones which keep things beginner-friendly, please share them!

In the short term, if there are some key parts of that standard for geospatial metadata that we don't currently handle, and that we could support easily with the current setup, then we should raise a JIRA, get a sample file, and add the support.

Nick
