The "CompositeParserDiscussion" page has been changed by NickBurch:
https://wiki.apache.org/tika/CompositeParserDiscussion?action=diff&rev1=4&rev2=5

Comment:
Hierarchy question

  == Fastest ==
  If there are two parsers, use the faster one even if it might mean lower quality (e.g. avoid OCR)
+ 
+ = Mime type hierarchies =
+ Consider a case like:
+ 
+  * application/vnd.ms-excel
+  * application/x-tika-msoffice
+ 
+ Or
+ 
+  * application/dita+xml;format=concept
+  * application/dita+xml;format=topic
+  * application/dita+xml
+ 
+ If there were two parsers available for application/vnd.ms-excel, and another for application/x-tika-msoffice, should it be possible to specify in a strategy that a parser for the parent type also be used? Should it be possible to set a strategy like "use the dita concept, then the general dita, then the dita topic", hopping around up and down the hierarchy?
+ 
+ Or do we keep the current behaviour where, once a point in the hierarchy with a parser is found, the file is parsed at that point?
  
  = Allowing the User to select a strategy =
  The right strategy for one user may not be the right one for another. The right strategy for one file may not be the right one for another. We therefore need to allow users to pick their strategy, both on an overall basis and on a per-file basis.
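
As a rough illustration of the hierarchy question raised in this change, the two options could be sketched as below. This is only a sketch under stated assumptions: the class, the method names, and the parser names are all hypothetical and are not part of Tika's API, and the parent/child relationship between the two Excel-related types is taken from the example in the change. `firstLevelWithParser` models the behaviour described as current (stop at the first point in the hierarchy that has a parser), while `wholeChain` models a strategy that also considers parsers registered for parent types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Tika.
final class HierarchySketch {
    // child type -> parent type (assumed from the example in the change)
    static final Map<String, String> PARENT = Map.of(
        "application/vnd.ms-excel", "application/x-tika-msoffice");

    // type -> parsers registered for exactly that type (made-up parser names)
    static final Map<String, List<String>> PARSERS = Map.of(
        "application/vnd.ms-excel", List.of("ExcelParserA", "ExcelParserB"),
        "application/x-tika-msoffice", List.of("GenericOfficeParser"));

    // Walk up from the given type and stop at the first level that has
    // any parser; only that level's parsers are returned.
    static List<String> firstLevelWithParser(String type) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            List<String> found = PARSERS.get(t);
            if (found != null && !found.isEmpty()) {
                return found;
            }
        }
        return List.of();
    }

    // Walk up the whole chain and collect every parser found on the way,
    // most specific type first, so a strategy could also pick a parent's parser.
    static List<String> wholeChain(String type) {
        List<String> all = new ArrayList<>();
        for (String t = type; t != null; t = PARENT.get(t)) {
            all.addAll(PARSERS.getOrDefault(t, List.of()));
        }
        return all;
    }
}
```

With these made-up registrations, `firstLevelWithParser("application/vnd.ms-excel")` yields only the two Excel parsers, whereas `wholeChain` for the same type also includes the parser registered for application/x-tika-msoffice. A strategy that hops up and down the hierarchy in an arbitrary order would need an explicit ordering on top of this, which the sketch does not attempt.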
