Hi Jukka,

On Sep 10, 2010, at 5:35am, Jukka Zitting wrote:

Hi,

On Fri, Sep 10, 2010 at 5:22 AM, Ken Krugler
<[email protected]> wrote:
With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor now finds and
instantiates all Parser-based classes found on the classpath, which, as
expected, triggers a storm of Exceptions and Errors.

Which errors are you seeing? In TIKA-378 [1] I tried to make the
TikaConfig(Classpath) behave better in such situations by making many
of our Parser classes loadable even when the respective parser library
is not available (I usually moved the direct class dependencies to a
separate Extractor class). I'm not sure how well that work has
survived recent changes in trunk.

Here's the stack trace.

<error>java.lang.NoClassDefFoundError: org/apache/poi/poifs/filesystem/DirectoryEntry
        at org.apache.tika.parser.microsoft.OfficeParser.<clinit>(OfficeParser.java:55)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at sun.misc.Service$LazyIterator.next(Service.java:271)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:170)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:189)
        at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:268)
        at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)

[1] https://issues.apache.org/jira/browse/TIKA-378

The issue is that the definitions of the types that are supported come from POI:

Collections.unmodifiableSet(new HashSet<MediaType>(Arrays.asList(
                POIFSDocumentType.WORKBOOK.type,
                POIFSDocumentType.OLE10_NATIVE.type,
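
One possible way to keep a parser class loadable in this situation is to move the type set out of the parser's own static initializer and into a nested holder class, which the JVM initializes only on first use. The sketch below is illustrative only and is not the actual Tika code: LazyTypesParser, SupportedTypesHolder and the holderInitialized flag are hypothetical names, and plain strings stand in for the POI-backed MediaType constants.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a parser such as OfficeParser; not Tika's code.
class LazyTypesParser {

    // Illustration only: records whether the holder's initializer has run.
    static boolean holderInitialized = false;

    // The nested class is initialized on first access to TYPES, so creating
    // a LazyTypesParser does not run this static initializer.
    private static final class SupportedTypesHolder {
        static final Set<String> TYPES;
        static {
            holderInitialized = true;
            // Assumption: a real parser would reference the library-backed
            // constants here instead of these placeholder strings.
            TYPES = Collections.unmodifiableSet(new HashSet<String>(Arrays.asList(
                    "application/vnd.ms-excel",
                    "application/msword")));
        }
    }

    // First call triggers initialization of the holder class.
    Set<String> getSupportedTypes() {
        return SupportedTypesHolder.TYPES;
    }
}
```

With this arrangement, a missing library would presumably surface only when getSupportedTypes() is first called, rather than when the service iterator loads the class via Class.forName.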

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




