On collision, the precedence order defines which key takes precedence and _overwrites_ the other. Overwrite is but one option (you could save *all* the values, since it's a multi-valued key structure, so…)
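
To make that concrete, here is a rough Python sketch of the idea. It is not Tika or OODT code: `merge_metadata`, the `keep_all` flag, and the convention that later entries in the precedence list win are all assumptions made up for illustration.

```python
def merge_metadata(sources, precedence, keep_all=False):
    """Merge per-source metadata according to a precedence order.

    sources:    {source_name: {key: [values, ...]}}  (hypothetical layout)
    precedence: list of source names; ASSUMPTION: later entries win.
    keep_all:   if True, keep every distinct value instead of overwriting.
    """
    merged = {}
    for name in precedence:
        for key, values in sources.get(name, {}).items():
            if keep_all:
                # Multi-valued: append values we have not seen yet.
                bucket = merged.setdefault(key, [])
                for value in values:
                    if value not in bucket:
                        bucket.append(value)
            else:
                # Overwrite: the higher-precedence source replaces the key.
                merged[key] = list(values)
    return merged


# The dc:creator collision from the thread, with made-up source contents.
sources = {
    "ImageParser.METADATA": {"dc:creator": ["Tim"], "tiff:ImageWidth": ["640"]},
    "TikaOCR.METADATA": {"dc:creator": ["Chris"]},
}
precedence = ["ImageParser.METADATA", "TikaOCR.METADATA"]

overwritten = merge_metadata(sources, precedence)
everything = merge_metadata(sources, precedence, keep_all=True)
```

With overwrite, `dc:creator` ends up as just the value from the last source in the list; with `keep_all=True` both values survive, in precedence order. Keys that do not collide pass through unchanged either way.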

Cheers,
Chris

On 10/26/17, 9:43 AM, "Nick Burch" <[email protected]> wrote:

    On Thu, 26 Oct 2017, Chris Mattmann wrote:
    > My general approach to conflicting metadata is simply to define
    > precedence orders.
    >
    > For example here is one documented from OODT:
    >
    > https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
    >
    > We can do similar things with Tika, e.g.,
    >
    > [CoreMetadata.PROPERTIES]
    > [ImageParser.METADATA]
    > [TikaOCR.METADATA]

    What happens if two different parsers both output the same bit of
    metadata though? eg Tim's example of one giving dc:creator of Tim and the
    second giving dc:creator of Chris?

    Secondly, what about the XHTML sax events stream? I think that's probably
    the harder case...

    Nick
