Yes, there is a difference. In Nutch there is an ICU4J library in the lib
directory, but there is no ICU4J library or class file in the single Tika jar
file. For example, the pdfbox jar file contains the path com.ibm.icu, but
there is no com.ibm path in the Tika jar file.
How can I add the ICU4J library to the Tika jar file?
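
To compare the two setups directly, a small check like the one below could be
run with the same classpath as each of them. This is only a sketch: the class
name IcuCheck and the helper isClassAvailable are made up for illustration,
and it assumes that com.ibm.icu.text.Bidi (ICU4J's bidirectional text class)
is the class that matters for the right-to-left output.

```java
public class IcuCheck {

    // Hypothetical helper: returns true if the named class can be loaded
    // from the current classpath, without initializing it.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, IcuCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Bidi is the ICU4J class relevant to right-to-left text.
        String icuClass = "com.ibm.icu.text.Bidi";
        System.out.println("ICU4J on classpath: " + isClassAvailable(icuClass));
        // Print the effective classpath so both runs can be compared.
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
    }
}
```

If the check reports false when run with only the Tika jar on the classpath
and true with the Nutch lib directory, that would support the missing ICU4J
jar as the cause of the difference.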

On Mon, Oct 31, 2011 at 10:49 PM, Robert Muir <[email protected]> wrote:

> Do you have ICU4J jar in your classpath in both situations?
>
> On Mon, Oct 31, 2011 at 1:35 PM, ahmad ajiloo <[email protected]>
> wrote:
> > Hello
> > When I use Tika to extract text from my Persian PDF files, all the
> > characters come out in reversed order. I mean that the characters are
> > shown from the beginning of the line to the end, but from left to right.
> > However, when I use the Tika GUI via Nutch there is no mistake and the
> > output text is right-to-left!
> >
> > The following text is the first line of the attached file in the first
> > mode (running Tika independently):
> >    ﻲﻠﻋ ﺎﻳ ﻮﺗ ﻝﻼﺟ ﺯﺍ ﻢﻧﺯ ﻡﺩ ﻪﻜﻧﺁ ﺕﺭﺪﻗ ﺖﺳﺍﺮﻣ ﻪﻧ ﻲﻣﺮﻜﻣ ﺩﻮﺟ ﺩﻮﺟﻭ ﻪﺑ ﺖﻤﻳﻮﮔ ﻪﻛ
> ﺖﺳﺍ
> > ﺲﺑ ﻦﻴﻤﻫ ﻪﻧ ﻱﺪﺑﻮﻣ ﺖﺨﺗ ﻪﺑ ﻱﺍ ﻩﺩﺯ ﺖﻨﻄﻠﺳ ﻪﻴﻜﺗ ﻪﻜﻧﺁ ﻲﺋﻮﺗ
> >
> > And this is the same line in the second mode (running the Tika GUI via
> > Nutch), which is clear Persian text:
> > نه مراست قدرت آنكه دم زنم از جلال تو يا علي      نه همين بس است كه گويمت
> به
> > وجود جود مكرمي توئي آنكه تكيه سلطنت زده اي به تخت موبدي
> >
> > Thanks for your attention
> >
> >
> >
> >
> >
>
>
>
> --
> lucidimagination.com
>
