On Thu, 26 Oct 2017, Chris Mattmann wrote:
My general approach to conflicting metadata is simply to define precedence orders.

For example, here is one documented for OODT:

https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence

We can do similar things with Tika, e.g.,

[CoreMetadata.PROPERTIES]
[ImageParser.METADATA]
[TikaOCR.METADATA]
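A minimal sketch of that precedence idea, assuming each source simply yields a plain mapping of metadata key to a list of values. The function and variable names here are hypothetical, and this is not Tika's actual Metadata API. Sources are listed highest precedence first, and the first source that defines a key wins. Applied to the dc:creator clash raised in this thread, the higher-precedence value is kept and the other is dropped.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: metadata key -> list of values.
Metadata = Dict[str, List[str]]


def merge_by_precedence(sources: Sequence[Metadata]) -> Metadata:
    """Merge metadata from several sources, highest precedence first.

    For each key, the values from the first source that defines it win;
    lower-precedence sources only fill in keys that are still unset.
    """
    merged: Metadata = {}
    for source in sources:
        for key, values in source.items():
            if key not in merged:
                merged[key] = list(values)  # copy so callers can't mutate inputs
    return merged


# Illustrative data only, loosely mirroring the precedence list above.
core_properties = {"dc:creator": ["Tim"], "dc:title": ["Report"]}
image_parser = {"dc:creator": ["Chris"], "tiff:ImageWidth": ["640"]}
tika_ocr = {"dc:language": ["en"]}

result = merge_by_precedence([core_properties, image_parser, tika_ocr])
print(result)
```

"First definition wins" is only one possible policy; appending all values for multi-valued keys, or recording which source supplied each value, are equally plausible alternatives.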

What happens if two different parsers both output the same bit of metadata, though? E.g., Tim's example of one giving dc:creator of Tim and the second giving dc:creator of Chris?


Secondly, what about the XHTML SAX event stream? I think that's probably the harder case...

Nick
