[ https://issues.apache.org/jira/browse/PDFBOX-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980882#comment-14980882 ]
Joel Hirsh commented on PDFBOX-3067:
------------------------------------
OK, I think we have been confusing two different things, and I believe
there is still a problem.
I am overriding writeString, not processTextPosition. PrintTextLocations
in 2.0 overrides writeString, in 1.8 it overrides processTextPosition, and
neither seems to print the strings, only the individual positions. I have
modified the PrintTextLocations code slightly to print strings rather than
each text position, and I also added widthofspace, which the trunk version
does not print.
So my modified code for PrintTextLocations.writeString is as follows
(original lines commented out):
TextPosition text = textPositions.get(0);
// for (TextPosition text : textPositions)
// {
    System.out.println(string + " @[" + text.getXDirAdj() + "," + text.getYDirAdj()
            + " type=" + text.getFont().getSubType()
            + " fs=" + text.getFontSize()
            + " xscale=" + text.getXScale()
            + " height=" + text.getHeightDir()
            + " space=" + text.getWidthOfSpace()
            + " width=" + text.getWidthDirAdj()
            + " widthofspace=" + text.getWidthOfSpace() + "]");
// }
I also ported that code back into PrintTextLocations in 1.8.9.
In 1.8 I get:
-200,000.00 @[363.94,491.34 type=TrueType fs=-12.0 xscale=7.44 height=4.6537204 space=4.464 width=4.4639893 widthofspace=4.464]
-200,000.00 @[363.94,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=4.6537204 space=4.464 width=4.4639893 widthofspace=4.464]
In the latest 2.0 build I get:
- @[363.94,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
2 @[368.404,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[372.86798,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[377.33197,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
, @[381.79596,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[386.25995,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[390.72394,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[395.18793,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
. @[399.65192,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[404.1159,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[408.5799,491.34 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
- @[363.94,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
2 @[368.404,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[372.86798,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[377.33197,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
, @[381.79596,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[386.25995,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[390.72394,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[395.18793,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
. @[399.65192,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[404.1159,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
0 @[408.5799,522.33997 type=TrueType fs=-12.0 xscale=7.44 height=6.3265433 space=-4.464 width=4.4639893 widthofspace=-4.464]
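In the 2.0 output each x coordinate advances by almost exactly the reported glyph width (4.4639893), which suggests the characters abut and the split is not caused by extra space between them in the PDF. The small sketch below recomputes those gaps from the logged values only; it does not use PDFBox, and the class name GlyphGaps is hypothetical.

```java
// Recomputes start(next glyph) - end(previous glyph) for "-200,000.00",
// using x positions (getXDirAdj) and width (getWidthDirAdj) copied from the 2.0 log.
// GlyphGaps is a hypothetical name; this is plain Java, not PDFBox code.
public class GlyphGaps {
    static final double[] X = {363.94, 368.404, 372.86798, 377.33197, 381.79596,
            386.25995, 390.72394, 395.18793, 399.65192, 404.1159, 408.5799};
    static final double WIDTH = 4.4639893;

    // One entry per pair of neighbouring glyphs.
    static double[] gaps() {
        double[] result = new double[X.length - 1];
        for (int i = 0; i + 1 < X.length; i++) {
            result[i] = X[i + 1] - (X[i] + WIDTH);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] g = gaps();
        for (int i = 0; i < g.length; i++) {
            System.out.printf("gap %d->%d = %.6f%n", i, i + 1, g[i]);
        }
    }
}
```

All ten gaps come out smaller than 0.0001 in magnitude, i.e. the glyphs sit edge to edge.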
I have also verified that the only build I have is today's, and I have
rebuilt the project, etc.
On Thu, Oct 29, 2015 at 10:03 AM, Tilman Hausherr (JIRA) <[email protected]>
> Text strings being returned as single characters, regression from version 1.8
> -----------------------------------------------------------------------------
>
> Key: PDFBOX-3067
> URL: https://issues.apache.org/jira/browse/PDFBOX-3067
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Joel Hirsh
> Assignee: Tilman Hausherr
> Labels: regression
> Fix For: 2.0.0
>
> Attachments: singlecharacters.pdf
>
>
> PrintTextLocations writeString() is returning individual characters on this
> file rather than a complete string. It returned strings such as '-200,000'
> in version 1.8.
> Also note that TextPosition.getWidthOfSpace() is returning a negative value
> (-4.464) for each character. I don't know whether that is a symptom or a cause.
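One way a negative widthOfSpace could produce single-character strings: a common word-splitting rule starts a new word when the next glyph begins further right than the end of the previous glyph plus some fraction of the space width. The sketch below uses a deliberately simplified rule of that kind; the class SpaceHeuristic and the 0.5 tolerance are assumptions for illustration, not PDFBox's actual PDFTextStripper logic. It only shows that, under such a rule, a negative space width would break the string after every glyph; it does not prove that this is the cause here.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL, simplified word splitter; NOT PDFBox's PDFTextStripper algorithm.
// A word break is assumed whenever the next glyph starts beyond
// end(previous glyph) + widthOfSpace * TOLERANCE.
public class SpaceHeuristic {
    static final double TOLERANCE = 0.5; // assumed tolerance factor
    // x positions and glyph width copied from the 2.0 log for "-200,000.00"
    static final double[] X = {363.94, 368.404, 372.86798, 377.33197, 381.79596,
            386.25995, 390.72394, 395.18793, 399.65192, 404.1159, 408.5799};
    static final double WIDTH = 4.4639893;

    static List<String> split(String chars, double[] x, double width, double widthOfSpace) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < chars.length(); i++) {
            if (i > 0) {
                double expectedNextWordStart = x[i - 1] + width + widthOfSpace * TOLERANCE;
                if (x[i] > expectedNextWordStart) {
                    // next glyph lies past the threshold: close the current word
                    words.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(chars.charAt(i));
        }
        words.add(current.toString());
        return words;
    }

    public static void main(String[] args) {
        String s = "-200,000.00";
        System.out.println("widthOfSpace=+4.464 -> " + split(s, X, WIDTH, 4.464));
        System.out.println("widthOfSpace=-4.464 -> " + split(s, X, WIDTH, -4.464));
    }
}
```

With +4.464 the threshold lies about 2.2 units to the right of each next glyph, so the whole string stays together (as in the 1.8 output); with -4.464 it lies about 2.2 units to the left, so every glyph becomes its own word (as in the 2.0 output).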
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)