On Mon, 12 Oct 2015, Martin Desruisseaux wrote:
At the last ApacheCon in Budapest, we had some discussion about
geospatial metadata in Tika. Currently Tika has 3 properties (latitude,
longitude, altitude) in its org.apache.tika.metadata.Geographic
interface, also reproduced in the TikaCoreProperties interface.
Geospatial metadata can be more complex, but does Tika wish to support
more geospatial metadata structures, or to keep that model simple?
Both!
Currently, it's very easy for a new user of Tika to get the metadata they
want out: they can just fetch a simple string value to get started.
You can, when you learn more, start getting more richly typed values out,
but the quickstart is simple. Some libraries make it so that you have to
learn the full rich metadata structure right from the get-go, which causes
problems for new users. Whatever we do to help the power users, we need to
not ruin it for the beginners!
If Tika wishes to support geospatial metadata more extensively, would
Tika consider using the ISO 19115 metadata model? This international
standard is the official metadata model of the Open Geospatial
Consortium (OGC) and is in use in various organisations (some parts of
NASA, European Space Agency, Food and Agriculture Organisation, etc.).
The ISO 19115 standard is quite big, with about 500 properties.
For the discussion on "what should a richer Tika metadata system be based
on", I think TIKA-1607 is where that is taking place, plus some related
threads on-list. If you have ideas/experiences/alternatives, especially
ones which keep things beginner-friendly, please share them!
In the short term, if there are some key parts of that standard for
geospatial metadata that we don't currently handle, and could easily handle
with the current setup, then we should raise a JIRA, get a sample file, and
add the support.
Nick