Thanks for the answer. PDFBOX-586 was filed by me :-)
So, as I expect to have customers in Asian and right-to-left-language countries, I will keep those files :-( (I sometimes get an out-of-memory error, which I should catch since my app runs on mobile devices/phones.) I will optimize elsewhere.

Quoting Jukka Zitting <[email protected]>:

> Hi,
>
> On Mon, Aug 9, 2010 at 3:41 PM, Bernard Segonnes <[email protected]> wrote:
> > I have ported PDFBox 1.1.0 to Android (text extraction only). The binary is
> > too big and too slow (probably due to memory constraints...): around 5 MB
> > (9 MB once installed on a mobile device, which is too much).
>
> See PDFBOX-586 [1] for some related progress.
>
> > Are the files in:
> > 1) cmap required? (78-EUC_H Adobe-CNS-5 GBK-EUC-V UniKS-UTF8-H
> > ...) I would be pleased to remove all those files :-)
>
> These are only needed for processing PDF documents that use CJK
> (Chinese, Japanese, Korean) fonts. These CMaps are needed to translate
> from the internal font-specific character identification codes to
> Unicode.
>
> > 2) pdf_*.xml: are they required for text extraction? (pdf_he_IL.xml
> > pdf_zh_Hant.xml ....)
>
> These are part of the ICU4J library. You only need ICU4J for handling
> Arabic and other right-to-left languages.
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-586
>
> BR,
>
> Jukka Zitting

Bernard SEGONNES
-------------------------------------
[email protected]
http://bsegonnes.free.fr
