Hi, I have ported PDFBox 1.1.0 to Android (text extraction only). The binary is too big and too slow (probably due to memory constraints): around 5 MB, and 9 MB once installed on a mobile device, which is too much.
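To see where the size actually goes, the contents of the jar can be totalled per directory. Below is a minimal sketch meant to run on a desktop JVM, not on the device. The class name JarSizeReport and its two helper methods are hypothetical names of my own, not part of PDFBox, and the jar path is whatever is passed on the command line.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarSizeReport {

    // Grouping key: the entry's parent directory, or "(root)" for top-level files.
    static String parentDir(String entryName) {
        int slash = entryName.lastIndexOf('/');
        return slash < 0 ? "(root)" : entryName.substring(0, slash);
    }

    // Sums the compressed size (in bytes) of every file entry, grouped by directory.
    static Map<String, Long> compressedSizeByDir(String jarPath) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                totals.merge(parentDir(entry.getName()), entry.getCompressedSize(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarSizeReport <path-to-jar>");
            return;
        }
        // Print one line per directory: size in KB, then the directory name.
        for (Map.Entry<String, Long> e : compressedSizeByDir(args[0]).entrySet()) {
            System.out.printf("%8d KB  %s%n", e.getValue() / 1024, e.getKey());
        }
    }
}
```

This only reports sizes; it says nothing about which files the text extraction code loads at runtime, which is the real question.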
So I'm looking for files I can delete, since I only need to extract text.
1) Are the files in cmap required? (78-EUC_H, Adobe-CNS-5, GBK-EUC-V, UniKS-UTF8-H, ...) I would be pleased to remove all of them :-)
2) Are the pdf_*.xml files required for text extraction? (pdf_he_IL.xml, pdf_zh_Hant.xml, ...)
3) Are there other resource files I can remove?
Thanks for the help.
