Thanks for your question, Mohamed. Feel free to send these
types of questions to [email protected]; it's a great
place to ask them, so please tell your classmates too.

I'm copying the list on this message.

(BTW, once it's on the list you can find the mail in
Google and other mail archives.)

Sometimes the MIME type is incorrectly detected, and
the best bet is to file a JIRA issue here in Tika:

https://issues.apache.org/jira/browse/TIKA

and then attach the sample PDF file for testing.
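
Before filing, it can help to confirm that the download really
is a PDF, since PDF files begin with the bytes "%PDF-". Here is a
minimal sketch using only the Java standard library. The class
and method names are made up for illustration, and this is not
how Tika itself does detection. A file that fails this check
could still be a PDF with junk bytes in front of the header, so
treat the result as a hint only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika.
public class PdfHeaderCheck {
    // The marker a PDF file normally starts with.
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the first bytes of the file are exactly "%PDF-".
    public static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            // read() may return fewer bytes than requested, so loop.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than the marker
                }
                read += n;
            }
        }
        return Arrays.equals(head, PDF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + (looksLikePdf(file)
                ? " starts with %PDF-"
                : " does not start with %PDF-"));
    }
}
```

If the check fails for a file from the vault, re-downloading it
is a reasonable first step before attaching it to the issue.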

If you have to preprocess a file for your specific
assignment in CS572, that's fine too. You can skip
the automatic detection step and force the PDF
parser by calling it directly from your Java code.
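
Calling the PDF parser directly could look roughly like the
sketch below. It assumes the Tika parsers are on your classpath.
The class name ForcePdfParse and the method extractText are
made up for illustration. PDFParser, BodyContentHandler,
Metadata and ParseContext are the regular Tika classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class, not part of Tika.
public class ForcePdfParse {

    // Parses the file with PDFParser regardless of its detected MIME type.
    public static String extractText(String path)
            throws IOException, SAXException, TikaException {
        // -1 disables the default limit on extracted characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            // No detection happens here: the PDF parser is used as-is.
            new PDFParser().parse(in, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText(args[0]));
    }
}
```

If the file is not really a PDF, expect the parser to throw an
exception or return little or no text, which is itself a useful
data point for the JIRA issue.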

HTH!

Cheers,
Chris


------------------------
Chris Mattmann
[email protected]




-----Original Message-----
From: Mohamed Mustafa Rafik Khimani <[email protected]>
Date: Wednesday, February 19, 2014 12:56 PM
To: Chris Mattmann <[email protected]>
Subject: CSCI ASSIGNMENT QUESTION

>Hello Professor Mattmann,
>I have a doubt regarding the Tika assignment. I was trying to read one of
>the pdf files downloaded from the vault. I was unable to read the file
>using Tika class and the parse method, which was returning null for each
>line.
>
>When I tried to use the detect method, to check the Mime type of the
>file, it returns audio/mpeg.
>
>I tried using one of the known pdf files, which returned the correct mime
>type and was parsed correctly.
>
>I wanted to confirm if I need to pre-process the file in any way before I
>can extract the contents, or if there might be a potential issue with the
>pdf files that I have downloaded, in which case I should maybe consider
>re-downloading them?
>
>I am following the Tika in Action book. I have read the first 4 chapters
>and will be reading the content extraction chapter next. I was trying a
>few things while reading the text, so thought of asking you if this is
>expected or if I am going wrong somewhere.
>
>Thank you for your time.
>
>Sincerely,
>
>Mohamed Mustafa Khimani
>

