Eddie, I'm also on the Bouncy Castle mailing list and saw that they have a version made specifically for embedded devices. A different build is required because there are core Java files missing from the JVM on those devices (this was done intentionally to reduce overhead). If you use that jar file instead of the normal Bouncy Castle one, it may resolve your issues.
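Before swapping jars, it may be worth confirming which ciphers the runtime actually exposes. Below is a minimal sketch that uses only the standard javax.crypto API. The class name CipherCheck and the list of transformation strings are my own choices for illustration; I have not checked which transformation strings PDFBox itself requests, so treat the list as an assumption and adjust it as needed.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;

// Hypothetical helper (not part of PDFBox or Bouncy Castle): reports whether
// any installed security provider can supply a given cipher transformation.
class CipherCheck {

    /** Returns true if Cipher.getInstance succeeds for the transformation. */
    public static boolean isAvailable(String transformation) {
        try {
            Cipher.getInstance(transformation);
            return true;
        } catch (GeneralSecurityException e) {
            // Covers NoSuchAlgorithmException and NoSuchPaddingException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed list of transformations worth probing; edit to taste.
        String[] candidates = {
            "AES/CBC/PKCS5Padding", "AES/CBC/NoPadding", "RC4", "ARCFOUR"
        };
        for (String t : candidates) {
            System.out.println(t + " available: " + isAvailable(t));
        }
    }
}
```

If a transformation reports false on the device but true on a desktop JVM, that would suggest the problem lies with the available providers rather than with the PDF parsing code.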
----
Thanks,
Adam

From: "Eddie B (JIRA)" <[email protected]>
To: [email protected]
Date: 03/05/2011 08:57
Subject: [jira] Commented: (PDFBOX-586) Text Extraction on Android

[ https://issues.apache.org/jira/browse/PDFBOX-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002998#comment-13002998 ]

Eddie B commented on PDFBOX-586:
--------------------------------

I have modified the open source PDFRenderer code to do much the same as this code: text extraction, specifically on Android devices. I have run into a limitation, though, and PDFBox seems to have the same one: no support for encrypted documents. PDFs encrypted with either AES or RC4 cannot be parsed. It appears to be a limitation of the ciphers that are available in the Android OS. The encryption is added when password security is applied, for example to prevent editing (in Acrobat: File - Properties - Security).

Has anyone had any luck opening PDFs with AES or RC4 encryption on an Android device? I will try to post some small PDFs here for testing.

> Text Extraction on Android
> --------------------------
>
> Key: PDFBOX-586
> URL: https://issues.apache.org/jira/browse/PDFBOX-586
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 1.1.0
> Environment: Windows XP + Eclipse + PDFBox sources
> Reporter: Bernard
> Attachments: ASEB-Camping_Car_ou_Bateau.pdf, EncryptedFileTest_AES.pdf, EncryptedFileTest_RC4.pdf, Eval.pdf, PDFBOX586-ASEB-Camping_Car_ou_Bateau.txt, PDFBOX586-Eval.txt, PDFBOX586-internals.txt, TestPDFBox.zip, internals.pdf
>
>
> Hi,
> I have noticed that I can extract text from some PDF files in PDFBox 0.7.4, but for the same file and the same page, PDFBox 1.1.0 doesn't retrieve any text, or the extraction is worse.
> Am I the only one who thinks there is a regression in text extraction?
> My code is like this:
> PDDocument document = PDDocument.load("/sdcard/internals.pdf");
> int numberOfPages = document.getNumberOfPages();
> resources = this.getResources();
>
> android.util.Log.d(TEST_PDFBOX, "readerPDF() resources : "+resources); // ANDROID code here to get file
> resourceGlyphList = R.raw.glyphlist;
> InputStream rawResource = resources.openRawResource(R.raw.pdftextstripper); // PDFBOX property file
> android.util.Log.d(TEST_PDFBOX, "readerPDF() rawResource : "+rawResource);
> Properties properties = new Properties();
> properties.load(rawResource);
>
> PDFTextStripper stripper = new PDFTextStripper(properties);
>
> stripper.setStartPage(pageNumber); // 1 or any other page
> stripper.setEndPage(pageNumber); // same page as above
> String s = "Page : "+pageNumber+"<br><br>"+stripper.getText(document);
> android.util.Log.d(TEST_PDFBOX, "readerPDF() stripper extract pages text : "+s);
> Maybe I should use page.getContents().getStream() or stripper.getTextForRegion( "class1" ) or stripper.writeText(doc, outputStream)
> I want the text as a String, not as a newly created file....
