Please see NUTCH-1925 for the current status of upgrading Tika to version
1.7. The current released version of Nutch uses Tika 1.6.

You can try applying the patch there (v2 for 1.x versions) or checking out
trunk.
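
As a rough local sanity check for the external tools discussed further down
the thread, something like the sketch below can confirm that the executables
are visible on the PATH before you run a crawl. The executable names
`tesseract` and `gdalinfo` are my assumption about what Tika's OCR and GDAL
parsers invoke, and `find_tools` is just a hypothetical helper name, so adjust
both for your setup. It does not exercise Nutch or Tika itself.

```python
import shutil

# Assumed executable names: Tika's OCR parser is understood to shell out to
# `tesseract` and its GDAL parser to `gdalinfo`; adjust if your install differs.
TOOLS = ("tesseract", "gdalinfo")


def find_tools(names=TOOLS):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If either tool is reported as not found, Tika would presumably skip that
parser, so fix the install first and then re-parse a sample image or raster
file.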

Tyler

On Wed, Feb 18, 2015 at 6:00 PM, Jiaxin Ye <[email protected]> wrote:

> Hi Tyler,
>
> Is there any way to test whether the newest version of Tika is working in
> Nutch?
>
>
> On Wednesday, February 18, 2015, Tyler Palsulich <[email protected]>
> wrote:
>
>> If you have GDAL and Tesseract installed locally, Tika will run them
>> against any eligible files it parses. There shouldn't be any required
>> configuration on the Nutch side.
>>
>> Please see http://wiki.apache.org/tika/TikaOCR and
>> http://wiki.apache.org/tika/TikaGDAL for how to install/run them.
>>
>> Hope that helps,
>> Tyler
>>
>> On Wed, Feb 18, 2015 at 5:24 PM, Nikunj Gala <[email protected]> wrote:
>>
>>> The current Nutch source uses Tika 1.7, according to the GitHub repository (
>>> https://github.com/apache/nutch/commit/3e2e688bd097727f457f1aa882c74a128f0a53da
>>> ).
>>> According to the Apache Tika 1.7 web page, Tika 1.7 includes GDAL and
>>> Tesseract OCR support (installation required).
>>> However, the Nutch source does not include GDAL and Tesseract OCR in the
>>> parse-tika plugin.
>>>
>>> How can I include the GDAL and Tesseract OCR sources in the Tika plugin
>>> for Nutch?
>>>
>>
>>
