On Sep 13, 2010, at 2:42am, Nick Burch wrote:
On Sat, 11 Sep 2010, Jukka Zitting wrote:
The reason why I originally didn't want to simply catch and ignore
the potential exceptions in the TikaConfig constructor was the lack
of a good error reporting mechanism. The trick of insulating the
external library dependencies to separate extractor classes nicely
solved that problem by delaying the exceptions to the actual
parse() method calls on specific document types, which obviously
would then give the end user a much better idea of what's wrong.
My thinking on exceptions thrown while creating the parser is:
* ClassNotFound for parser class - throw the exception, as the user has
  specified a parser that isn't there
* Any other ClassNotFound - warning, as this means that a dependency is
  missing, but that may be what the user wanted
If you use this approach, then you'd also want to do this special
handling for the NoSuchMethodError, as that was getting thrown by Tika
0.7-SNAPSHOT when POI support was excluded.
* Any other problem - throw the exception, as there is a fault with the
  parser, and there's a fair chance that this is a custom parser
  that has broken. (The standard Tika parsers shouldn't do this!)
Interesting idea - I'm worried that there will be new exceptions when
future versions of Tika change their parser implementations.
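
To make the three cases above concrete, here is a rough sketch of how the policy might look. It is not the actual TikaConfig code: the class ParserLoadingSketch, the load() method, the WARNINGS list and the two demo parser classes are all made-up names for illustration. It also assumes that a missing dependency shows up as NoClassDefFoundError or NoSuchMethodError when the parser is instantiated, which is how the JVM usually reports it.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

public class ParserLoadingSketch {

    /** Warnings collected while loading; a real system would use a logger. */
    static final List<String> WARNINGS = new ArrayList<>();

    /** Hypothetical parser whose constructor hits a missing dependency. */
    public static class MissingDependencyParser {
        public MissingDependencyParser() {
            throw new NoSuchMethodError("simulated missing library method");
        }
    }

    /** Hypothetical custom parser that is simply broken. */
    public static class BrokenParser {
        public BrokenParser() {
            throw new IllegalStateException("simulated bug in a custom parser");
        }
    }

    /** Assumption: these two errors are what a missing dependency looks like. */
    static boolean isMissingDependency(Throwable t) {
        return t instanceof NoClassDefFoundError || t instanceof NoSuchMethodError;
    }

    /**
     * Instantiates the named class via its no-arg constructor.
     * Returns null (and records a warning) if a dependency appears to be missing.
     */
    static Object load(String className) throws Exception {
        // Case 1: the parser class itself is missing. The resulting
        // ClassNotFoundException is deliberately left to propagate.
        Class<?> clazz = Class.forName(className, false,
                ParserLoadingSketch.class.getClassLoader());
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (NoClassDefFoundError | NoSuchMethodError e) {
            // Case 2: linkage problem thrown directly, e.g. during class init.
            WARNINGS.add("Skipping " + className + ": " + e);
            return null;
        } catch (InvocationTargetException e) {
            // Anything thrown inside the constructor arrives wrapped.
            if (isMissingDependency(e.getCause())) {
                // Case 2 again: warn and carry on without this parser.
                WARNINGS.add("Skipping " + className + ": " + e.getCause());
                return null;
            }
            // Case 3: any other problem is a fault in the parser, so rethrow.
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(load("java.lang.StringBuilder") != null);
        System.out.println(load(MissingDependencyParser.class.getName()));
        System.out.println(WARNINGS);
    }
}
```

The weak spot is the isMissingDependency() check: it only knows about the error types seen so far, so any new failure mode from a future parser implementation would land in the "throw" branch until the list is extended.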
-- Ken
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g