On Tue, Nov 1, 2011 at 6:24 AM, Ahmad Ajiloo <[email protected]> wrote:
> Yes, there is a difference. In Nutch we have an ICU4J library in the lib
> directory, but there is no ICU4J lib or class file in the single Tika jar
> file. For example, the pdfbox jar file contains the path com.ibm.icu, but
> there is no com.ibm path in the Tika jar file.
> How can I add the ICU4J library to the Tika jar file?
>

I really think Tika should include the parts of ICU4J it depends on.
Open source projects are often hesitant to include the ICU jar because of
its size, but that's silly, since the jar is only that large because it is
a catch-all. We can use the webapp to build a smaller one that includes the
minimum Tika needs: http://apps.icu-project.org/datacustom/

Maybe we should open a JIRA issue to fix this? I think it's a bug that
Arabic and Persian text silently comes out corrupted if you don't have
ICU4J in your classpath.
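
Until something like that is fixed, a caller could at least make the
failure visible instead of silent. The following is only a rough sketch,
not anything Tika or PDFBox does today: the class name IcuCheck and the
helper isOnClasspath are made up for illustration, and probing for
com.ibm.icu.text.Bidi is my assumption about which ICU4J class matters for
right-to-left text.

```java
// Hypothetical helper (not part of Tika or PDFBox): warns when ICU4J is absent.
class IcuCheck {

    // Returns true if the named class can be found on the current classpath.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, IcuCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked; treat it as unusable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Bidi is a reasonable ICU4J class to probe for.
        String icuClass = "com.ibm.icu.text.Bidi";
        if (isOnClasspath(icuClass)) {
            System.out.println("ICU4J found on classpath.");
        } else {
            System.err.println("WARNING: ICU4J not found on classpath; "
                    + "Arabic and Persian text may come out corrupted.");
        }
    }
}
```

Running this with the same classpath used for parsing shows whether the
ICU4J jar is actually being picked up.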


-- 
lucidimagination.com
