Re: Tika 0.7 And Solr

Ken Krugler Wed, 07 Jul 2010 10:05:01 -0700

Hi Rohan,

On Jul 7, 2010, at 4:01am, rohanpatil wrote:

I am using Solr provided by lucidimagination and it has tika 0.5 and uses
pdfbox 0.8.
And it has problems extracting content from large(>200kb) v1.5 PDFs.

I saw that pdfbox 1.x resolves this issue.
When i upgraded the extraction jars i got the following errors.

Jul 7, 2010 2:38:56 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider


Back in January I'd run into the same issue:

I believe the issue is that the PDFBox pom.xml declares the dependency on the missing BouncyCastleProvider jar as "optional":
   <dependency>
     <groupId>bouncycastle</groupId>
     <artifactId>bcprov-jdk14</artifactId>
     <version>136</version>
     <optional>true</optional>
   </dependency>
As explained in the Maven documentation, this means that Tika needs to explicitly include the jar:
http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html
I see a few other optional dependencies in the PDFBox pom.xml, but perhaps the only one that's really critical is the above.
Let me know if anybody else has input on this, otherwise I'll file an issue and fix it.


To fix it, you could manually install the bcprov-jdk14 jar alongside the extraction jars you upgraded, so that it ends up on Solr's classpath.
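
If you build with Maven rather than copying jars by hand, the explicit declaration described above might look roughly like this in your own pom.xml. This is only a sketch: the coordinates are copied from the PDFBox pom.xml excerpt above, and I'm assuming that version is still the one your PDFBox release expects, so check its pom.xml first.

```xml
<!-- Sketch: coordinates copied from the PDFBox pom.xml excerpt above;
     verify the version against the pom.xml of your PDFBox release. -->
<dependency>
  <groupId>bouncycastle</groupId>
  <artifactId>bcprov-jdk14</artifactId>
  <version>136</version>
  <!-- No <optional> element here, so Maven treats this as a normal
       dependency and puts the jar on the classpath. -->
</dependency>
```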

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
