Hi,

On Tue, Nov 1, 2011 at 1:48 PM, Robert Muir <[email protected]> wrote:
> I really think tika should include the parts of icu4j it depends on.
> Often open source projects are hesitant to include icu jar because of
> its size, but thats silly since the size is just a catch-all.
> We can use the webapp to make a smaller one that includes the minimum
> of stuff Tika needs. http://apps.icu-project.org/datacustom/

We need a version that's available on the central Maven repository.

> Maybe we should open a JIRA issue to fix this? I think its a bug that
> Arabic and Persian text silently come out corrupted if you don't have
> this in your classpath.

+1

BR,

Jukka Zitting
