Got it. Thanks!

On Wed, Feb 18, 2015 at 3:07 PM, Tyler Palsulich <[email protected]>
wrote:

> Please see NUTCH-1925 for the current status of upgrading Tika to version
> 1.7. The current released version of Nutch uses Tika 1.6.
>
> You can try applying the patch there (v2 for 1.x versions) or checking out
> trunk.
>
> Tyler
>
> On Wed, Feb 18, 2015 at 6:00 PM, Jiaxin Ye <[email protected]> wrote:
>
>> Hi Tyler,
>>
>> Is there any way to test whether the newest version of Tika works with
>> Nutch?
>>
>>
>> On Wednesday, February 18, 2015, Tyler Palsulich <[email protected]>
>> wrote:
>>
>>> If you have GDAL and Tesseract installed locally, Tika will run them
>>> against eligible files as it parses them. No configuration should be
>>> required on the Nutch side.
>>>
>>> Please see http://wiki.apache.org/tika/TikaOCR and
>>> http://wiki.apache.org/tika/TikaGDAL for instructions on installing and
>>> running them.
>>>
>>> Hope that helps,
>>> Tyler
>>>
>>> On Wed, Feb 18, 2015 at 5:24 PM, Nikunj Gala <[email protected]> wrote:
>>>
>>>> According to the repository on GitHub, the current Nutch source uses
>>>> Tika 1.7:
>>>> https://github.com/apache/nutch/commit/3e2e688bd097727f457f1aa882c74a128f0a53da
>>>> According to the Apache Tika 1.7 web page, Tika 1.7 includes GDAL and
>>>> Tesseract OCR support (separate installation required).
>>>> However, the Nutch source does not include GDAL or Tesseract OCR in the
>>>> parse-tika plugin.
>>>>
>>>> How can I include GDAL and Tesseract OCR support in the parse-tika
>>>> plugin for Nutch?
>>>>
>>>
>>>
>
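Because Tika only uses GDAL and Tesseract when they are installed locally, a quick first check is whether the JVM running Nutch can launch them at all. The class below is a minimal, hypothetical sketch (`OcrToolCheck` and `canRun` are illustrative names, not part of Tika or Nutch). It assumes the executables are named `tesseract` and `gdalinfo` and are expected on the PATH; adjust the commands if your installation differs.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class OcrToolCheck {

    // Hypothetical helper: returns true if the command can be started and
    // terminates within the timeout. An IOException from start() usually
    // means the executable is not on the PATH visible to this JVM.
    static boolean canRun(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // ignore all output (Java 9+)
                    .start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly(); // hung process: treat the tool as unusable
                return false;
            }
            // The exit code is deliberately not checked, in case a tool
            // returns non-zero for the version flag even when installed.
            return true;
        } catch (IOException e) {
            return false; // could not launch: most likely not installed or not on PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("tesseract runnable: " + canRun("tesseract", "--version"));
        System.out.println("gdalinfo runnable:  " + canRun("gdalinfo", "--version"));
    }
}
```

Running this under the same user and environment as the Nutch crawl matters, since a tool that works in an interactive shell may still be missing from the PATH of a daemon or Hadoop task.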
