[
https://issues.apache.org/jira/browse/PDFBOX-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Hewson closed PDFBOX-316.
------------------------------
Resolution: Cannot Reproduce
> Extracting number show empty string
> -----------------------------------
>
> Key: PDFBOX-316
> URL: https://issues.apache.org/jira/browse/PDFBOX-316
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1818588
> Originally submitted by astonishing1 on 2007-10-23 07:20.
> Hi,
> I want to extract a 10-digit number that appears at a fixed position
> on each page of a PDF file.
> I used PrintTextLocations and PDFTextStripper to extract that ID number
> from the PDFs.
> The PDF is in Arabic, but I only want to extract the number.
> The problem is that when PrintTextLocations prints the number, it always
> drops one or two digits and inserts empty space in their place.
> Example:
> String[730.10004,116.75003 ft=Times-New-Roman+2 fs=200.0 xscale=0.05
> height=5.000001 width=911.2002] text: 16/10/2007
> String[775.7,32.75 ft=Times-New-Roman-Bold+1 fs=200.0 xscale=0.05
> height=5.000001 width=933.4004] text: RBKPI011
> String[786.15,116.75003 ft=Times-New-Roman-Bold+1 fs=200.0 xscale=0.05
> height=5.000001 width=739.0] text:????? ?????
> String[375.85,89.10004 ft=Times-New-Roman-Bold+1 fs=240.0 xscale=0.05
> height=6.000001 width=1057.6797] text:?????? - 004
> String[330.9,101.10004 ft=Times-New-Roman-Bold+1 fs=240.0 xscale=0.05
> height=6.000001 width=3023.04] text:?????? ??? ???? ???? - 1 4 58
> (the number is 194758; 9 and 7 are missing)
> The last string is some Arabic words followed by a dash and the number
> 194758, but the 9 and 7 are missing.
> The large PDF file is generated daily, so I also parsed a newer one, with
> the following result:
> String[329.75,101.10004 ft=Times-New-Roman-Bold+1 fs=240.0 xscale=0.05
> height=6.000001 width=3068.6406]?????? ???? ?? ???? - 1 06 14
> (the number is 1906914; 9 is missing)
> So the set of missing digits is not the same from one file to the next.
> Can anyone help? Thanks in advance.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1818588&file_id=251007
> 194758.pdf (application/pdf), 103183 bytes
> pdf file to extract data using PrintTextLocations utility
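
The report above shows ID numbers coming out with gaps ("1 4 58" instead of 194758). As a stopgap on the caller's side, a check along the following lines could flag pages where digits were dropped. This is only a sketch under assumptions: it treats the ID as the last field on the line, after a "-" separator, as in the sample output above, and the names IdNumberCheck, tailAfterDash and completeId are hypothetical helpers, not part of PDFBox. It does not repair the extraction itself.

```java
import java.util.Optional;

// Hypothetical helper, not part of PDFBox: validates an ID number taken
// from one line of extracted text.
final class IdNumberCheck {
    private IdNumberCheck() {}

    // Returns the trimmed text after the last '-' in the line,
    // or the whole trimmed line if it contains no '-'.
    static String tailAfterDash(String line) {
        int dash = line.lastIndexOf('-');
        return (dash >= 0 ? line.substring(dash + 1) : line).trim();
    }

    // Returns the ID only if the tail is exactly expectedDigits ASCII digits.
    // A tail with embedded spaces (dropped glyphs) yields Optional.empty().
    static Optional<String> completeId(String line, int expectedDigits) {
        String tail = tailAfterDash(line);
        boolean allDigits = !tail.isEmpty()
                && tail.chars().allMatch(c -> c >= '0' && c <= '9');
        if (allDigits && tail.length() == expectedDigits) {
            return Optional.of(tail);
        }
        return Optional.empty();
    }
}
```

With the sample line from the report, `IdNumberCheck.completeId("?????? ??? ???? ???? - 1 4 58", 6)` returns an empty Optional, so the page can be logged for manual review rather than silently storing a wrong ID.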