Then you should open a bug report with TIKA and attach the files that do
not parse. Often the problem lies in one of TIKA's underlying parser libraries,
such as Apache POI, and in that case there is nothing the TIKA developers can do.
Another TIKA issue may already cover the same problem, so search the issue tracker first!
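
To decide which tracker a report belongs in, it can help to check which library the "Caused by:" lines of the stack trace point to. The following is only a rough sketch: the function name `guess_library` and the prefix table are made up for illustration, and the table covers only the packages that appear in the traces quoted in this thread.

```python
# Package prefixes mapped to the library that owns them.
# Illustrative only: limited to packages seen in the traces in this thread.
LIBRARY_PREFIXES = {
    "org.apache.poi.": "Apache POI",
    "org.pdfbox.": "PDFBox",
}


def guess_library(trace):
    """Return the library named by the first 'Caused by:' line that
    mentions a known package prefix, or 'unknown' if none does."""
    for line in trace.splitlines():
        if not line.strip().startswith("Caused by:"):
            continue
        for prefix, library in LIBRARY_PREFIXES.items():
            # The prefix may appear in the exception class or in its message.
            if prefix in line:
                return library
    return "unknown"


if __name__ == "__main__":
    sample = (
        "Unexpected RuntimeException from OfficeParser\n"
        "Caused by: org.apache.poi.hpsf.IllegalPropertySetDataException: ..."
    )
    print(guess_library(sample))
```

A result of "unknown" only means the heuristic found no known package name; the full stack trace and the failing file should still go into the bug report.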

 

Uwe

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: Deepak Singh [mailto:[email protected]] 
Sent: Wednesday, March 09, 2011 2:09 PM
To: [email protected]
Subject: Re: Solr Exception

 


I downloaded apache-solr-3.1, but it is still giving the TIKA exception.

On Wed, Mar 9, 2011 at 5:11 PM, Deepak Singh <[email protected]> wrote:

Oh, thanks for the better solution.

 

On Wed, Mar 9, 2011 at 4:36 PM, Uwe Schindler <[email protected]> wrote:

Hi,

 

These are all bugs in Apache TIKA, not Solr. Some of them are already fixed
in later TIKA versions, so you may want to try the soon-to-be-released Solr 3.1,
which bundles a newer TIKA.

 

Uwe

 


 

From: Deepak Singh [mailto:[email protected]] 
Sent: Wednesday, March 09, 2011 12:03 PM
To: [email protected]
Subject: Re: Solr Exception

 


HTTP ERROR: 500 (INTERNAL SERVER ERROR)

For DOC files:
org.apache.tika.exception.TikaException :
-Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: org.apache.poi.hpsf.IllegalPropertySetDataException: The property
set claims to have a size of 16 bytes. However, it exceeds 16 bytes.

-TIKA-198: Illegal IOException from
org.apache.tika.parser.microsoft.OfficeParser@1248f2
Caused by: java.io.IOException: block[ 0 ] already removed - does your POIFS
have circular or duplicate block references?


For PDF files:
org.apache.tika.exception.TikaException : 
-Unexpected RuntimeException from org.apache.tika.parser.Pdfparser@1b4cd65
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be
cast to org.pdfbox.cos.COSDictionar
Caused by: java.lang.NullPointerException

 

-Unable to extract PDF content

HTTP ERROR: 400 (BAD REQUEST)
-This error comes when some fields are missing
ERROR: unknown field 'language' (other examples: content_status, description, version)
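
If the 400 error comes from document metadata that TIKA extracts and Solr's extracting request handler then tries to index, one possible workaround is the handler's `uprefix` parameter, which prefixes every field name not defined in the schema. The sketch below only builds such a request URL. The base URL, the document id, the helper name `extract_url` and the existence of an `ignored_*` dynamic field in your schema are all assumptions, so check them against your own setup.

```python
from urllib.parse import urlencode


def extract_url(base_url, doc_id, unknown_prefix="ignored_"):
    """Build an /update/extract URL that maps unknown fields to a prefix.

    Assumes the schema defines a matching dynamic field
    (for example ignored_*); otherwise the request still fails.
    """
    params = {
        "literal.id": doc_id,       # unique key for the uploaded document
        "uprefix": unknown_prefix,  # prefix for fields missing from the schema
        "commit": "true",
    }
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and id, for illustration only.
    print(extract_url("http://localhost:8983/solr", "doc1"))
```

The alternative is to declare the missing fields (language, content_status, description, version) in the schema if you actually want to search on them.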

 

On Wed, Mar 9, 2011 at 4:19 PM, Gora Mohanty <[email protected]> wrote:

Hi,

This is probably better directed to the user list. Also, please provide
details of the exceptions from your log files.

Regards,
Gora

 

 

 
