[ 
https://issues.apache.org/jira/browse/PDFBOX-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922439#action_12922439
 ] 

Mark Looi commented on PDFBOX-846:
----------------------------------

Thanks Andreas. Hey, do you think you'll be able to post the .NET version
soon? That's the one we use. Much appreciated.

Mark.
Phone: (425) 941 2378 | twitter.com/marklooi | www.looiconsulting.com


On Sat, Oct 16, 2010 at 10:46 AM, Andreas Lehmkühler (JIRA) <[email protected]> wrote:

> TextExtraction mixes case of text
> ---------------------------------
>
>                 Key: PDFBOX-846
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-846
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.2.1
>         Environment: Windows server, .NET
>            Reporter: Mark Looi
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.3.0
>
>         Attachments: PDFBOX846-Menu_WA_032509.pdf, 
> PDFBOX846-Menu_WA_032509.txt
>
>
> Using text extraction on a file like this,
> http://www.organictogo.com/pdf/catering/Menu_WA_032509.pdf, the text (in all
> caps) "THAI VEGGIE WRAP" is extracted as "ThAI VeGGIe wRAP". However,
> examining the PDF shows that it looks like this: "Thai V eggi e Wrap". The
> related text on the next lines, such as "Crisp red cabbage, cucumbers,
> carrots and lettuce with Thai", parses just fine.
> We are using this code to get the text in C#:
>
>     byte[] pdfData = myWebClient.DownloadData(pdfUrl);
>     string text = string.Empty;
>     ByteArrayInputStream stream = new ByteArrayInputStream(pdfData);
>     PDDocument doc = PDDocument.load(stream);
>     PDFTextStripper stripper = new PDFTextStripper();
>     text = stripper.getText(doc);
>     doc.close();

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
