Got it. Thanks!

On Wed, Feb 18, 2015 at 3:07 PM, Tyler Palsulich <[email protected]> wrote:
> Please see NUTCH-1925 for the current status of upgrading Tika to version
> 1.7. The current released version of Nutch uses Tika 1.6.
>
> You can try applying the patch there (v2 for 1.x versions) or checking out
> trunk.
>
> Tyler
>
> On Wed, Feb 18, 2015 at 6:00 PM, Jiaxin Ye <[email protected]> wrote:
>
>> Hi Tyler,
>>
>> Is there any way to test whether the newest version of Tika is working
>> with Nutch?
>>
>>
>> On Wednesday, February 18, 2015, Tyler Palsulich <[email protected]>
>> wrote:
>>
>>> If you have GDAL and Tesseract installed locally, Tika will run them
>>> against (eligible) parsed files. There shouldn't be any required
>>> configuration on the Nutch side.
>>>
>>> Please see http://wiki.apache.org/tika/TikaOCR and
>>> http://wiki.apache.org/tika/TikaGDAL for how to install/run them.
>>>
>>> Hope that helps,
>>> Tyler
>>>
>>> On Wed, Feb 18, 2015 at 5:24 PM, Nikunj Gala <[email protected]> wrote:
>>>
>>>> The current source of Nutch uses Tika 1.7, according to the GitHub
>>>> repository (
>>>> https://github.com/apache/nutch/commit/3e2e688bd097727f457f1aa882c74a128f0a53da
>>>> ).
>>>> According to the Apache Tika 1.7 web page, Tika 1.7 includes GDAL and
>>>> Tesseract OCR support (installation required).
>>>> However, the parse-tika plugin in the Nutch source does not include
>>>> GDAL or Tesseract OCR.
>>>>
>>>> How can I include GDAL and Tesseract OCR in the Tika plugin for Nutch?
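Since the only precondition Tyler mentions is having GDAL and Tesseract installed locally, a quick sanity check is to confirm that their executables are visible on the PATH of the user that runs Nutch. This is a minimal sketch, assuming Tika looks for the default command names `tesseract` and `gdalinfo` (custom install locations would need their own Tika configuration). It only checks that the tools are installed; it does not prove that Tika actually invoked them on a given document.

```python
import shutil

# Assumed default command names for the external tools Tika shells out to.
# If your binaries live elsewhere or have other names, adjust this mapping.
EXTERNAL_TOOLS = {
    "Tesseract OCR": "tesseract",
    "GDAL": "gdalinfo",
}


def find_tools(tools=EXTERNAL_TOOLS):
    """Map each tool label to its resolved path on PATH, or None if missing."""
    return {label: shutil.which(cmd) for label, cmd in tools.items()}


if __name__ == "__main__":
    # Run this as the same user (and with the same environment) as Nutch,
    # since PATH can differ between shells, services, and cron jobs.
    for label, path in find_tools().items():
        print(f"{label}: {path if path else 'NOT FOUND on PATH'}")
```

If either tool reports NOT FOUND, the corresponding Tika parser has nothing to call, which would explain OCR or GDAL metadata silently missing from parse output.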

