On Thu, 26 Oct 2017, Chris Mattmann wrote:
> My general approach to conflicting metadata is simply to define
> precedence orders.
> For example here is one documented from OODT:
> https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
> We can do similar things with Tika, e.g.,
> [CoreMetadata.PROPERTIES]
> [ImageParser.METADATA]
> [TikaOCR.METADATA]
What happens if two different parsers both output the same bit of metadata,
though? E.g., Tim's example of one giving a dc:creator of Tim and the second
giving a dc:creator of Chris?
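
To make the question concrete, here is a rough sketch of what a precedence
order would do with that case. It is plain Python rather than anything from
Tika or OODT: the function name merge_by_precedence is hypothetical, the
source labels are simply the three names from the list quoted above, and
treating the first entry as the highest precedence is an assumption.

```python
from typing import Dict, List, Tuple

# Assumption: source labels copied from the list quoted above,
# ordered from highest to lowest precedence.
PRECEDENCE: List[str] = [
    "CoreMetadata.PROPERTIES",
    "ImageParser.METADATA",
    "TikaOCR.METADATA",
]


def merge_by_precedence(
    outputs: Dict[str, Dict[str, str]],
    precedence: List[str] = PRECEDENCE,
) -> Dict[str, Tuple[str, str]]:
    """Merge per-source metadata into one map of key -> (value, source).

    For each key, the value from the highest-precedence source wins.
    Sources that are not named in `precedence` are ignored.
    """
    merged: Dict[str, Tuple[str, str]] = {}
    for source in precedence:
        for key, value in outputs.get(source, {}).items():
            # setdefault keeps the first value seen, i.e. the one from
            # the highest-precedence source that supplied this key.
            merged.setdefault(key, (value, source))
    return merged


# Hypothetical parser outputs reproducing the dc:creator conflict.
outputs = {
    "ImageParser.METADATA": {"dc:creator": "Tim"},
    "TikaOCR.METADATA": {"dc:creator": "Chris", "dc:language": "en"},
}
merged = merge_by_precedence(outputs)
```

Under those assumptions dc:creator resolves to Tim, because ImageParser.METADATA
is listed ahead of TikaOCR.METADATA, and Chris's value is silently dropped.
Keeping the winning source next to each value at least shows where it came
from, but whether dropping the losing value is acceptable is exactly the open
question.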
Secondly, what about the XHTML SAX event stream? I think that's probably
the harder case...
Nick