On Sep 13, 2010, at 2:42am, Nick Burch wrote:

On Sat, 11 Sep 2010, Jukka Zitting wrote:
The reason I originally didn't want to simply catch and ignore the potential exceptions in the TikaConfig constructor was the lack of a good error reporting mechanism. The trick of isolating the external library dependencies in separate extractor classes nicely solved that problem by deferring the exceptions until the actual parse() method calls on specific document types, which obviously gives the end user a much better idea of what's wrong.

My thinking on exceptions while creating the parser is:
* ClassNotFound for parser class - throw the exception, as the user has
 specified a parser that isn't there

* Any other ClassNotFound - warning, as this means that a dependency is
 missing, but that may be what the user wanted

If you use this approach, then you'd also want to do this special handling for the NoSuchMethodError, as that was getting thrown by Tika 0.7-SNAPSHOT when POI support was excluded.

* Any other problem - throw the exception, as there is a fault with the
 parser, and there's a fair chance that this is a custom parser
 that has broken. (The standard Tika parsers shouldn't do this!)

Interesting idea, but I'm worried that new exception types will show up as future versions of Tika change their parser implementations.
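
For what it's worth, here's a rough Java sketch of the policy as I read it. ParserLoadPolicy, Action and classify() are names I made up for illustration, not anything in Tika, and matching the ClassNotFoundException message against the parser class name is an assumption about how the class loader reports the missing class.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical helper, not part of Tika: decides what to do with a failure
// seen while instantiating a configured parser class.
class ParserLoadPolicy {

    enum Action { FAIL, WARN_AND_SKIP }

    static Action classify(Throwable t, String parserClassName) {
        // Reflection wraps anything thrown by a constructor; look at the real cause.
        if (t instanceof InvocationTargetException && t.getCause() != null) {
            t = t.getCause();
        }
        if (t instanceof ClassNotFoundException) {
            // Assumption: the exception message is the name of the missing class.
            boolean parserItselfMissing = parserClassName.equals(t.getMessage());
            return parserItselfMissing ? Action.FAIL : Action.WARN_AND_SKIP;
        }
        if (t instanceof NoClassDefFoundError || t instanceof NoSuchMethodError) {
            // A dependency is missing or too old, possibly excluded on purpose.
            return Action.WARN_AND_SKIP;
        }
        // Anything else points at a fault in the parser itself.
        return Action.FAIL;
    }

    public static void main(String[] args) {
        String name = "com.example.MissingParser"; // made-up class name
        try {
            Class.forName(name).getDeclaredConstructor().newInstance();
            System.out.println("loaded " + name);
        } catch (Throwable t) {
            System.out.println(classify(t, name) + ": " + t);
        }
    }
}
```

The final "return Action.FAIL" is the part that concerns me: any new error type a future parser throws for a missing dependency would land there until someone adds it to the warning list.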

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




