[
https://issues.apache.org/jira/browse/PDFBOX-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved PDFBOX-30.
---------------------------------
Resolution: Duplicate
As mentioned above, most of these issues seem to already have been fixed.
> some code bugs
> ---------------
>
> Key: PDFBOX-30
> URL: https://issues.apache.org/jira/browse/PDFBOX-30
> Project: PDFBox
> Issue Type: Bug
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1084937
> Originally submitted by jinfeng_wang on 2004-12-13 22:50.
> Hi Ben, I have recently been reading your code. After some testing,
> I found the following bugs in the PDFBox 0.6.7 release
> version.
> 1) Code in org.pdfbox.cmaptypes.CMap:
> The "lookup" and "addMapping" methods compute the "key"
> inconsistently when the length is 2.
> The code in "lookup":
> int intKey = (code[offset]+256)%256;
> intKey <<= 8;
> intKey += (code[offset+1]+256)%256;
> key = new Integer( intKey );
> The code in "addMapping":
> int intSrc = src[0];
> intSrc <<= 8;
> intSrc |= (src[1]&0xFF);
> doubleByteMappings.put( new Integer( intSrc ),
> dest );
>
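The mismatch in item 1 can be reproduced outside PDFBox. The sketch below copies the two quoted computations into standalone methods: "lookup" treats both bytes as unsigned, while "addMapping" lets the first byte sign-extend, so the keys differ whenever the high byte is 0x80 or above. The class name CMapKeySketch and the helper unsignedKey are hypothetical names for illustration only; unsignedKey is one possible shared computation, not necessarily the fix that went into PDFBox.

```java
final class CMapKeySketch {
    // Key as computed in the quoted "lookup": both bytes are treated as unsigned.
    static int lookupKey(byte[] code, int offset) {
        int key = (code[offset] + 256) % 256;
        key <<= 8;
        key += (code[offset + 1] + 256) % 256;
        return key;
    }

    // Key as computed in the quoted "addMapping": src[0] is sign-extended to int.
    static int addMappingKey(byte[] src) {
        int key = src[0];
        key <<= 8;
        key |= (src[1] & 0xFF);
        return key;
    }

    // Hypothetical shared helper: mask both bytes so the two call sites agree.
    static int unsignedKey(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xFF) << 8) | (bytes[offset + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] code = {(byte) 0x81, (byte) 0x40};
        System.out.println("lookup key:     " + lookupKey(code, 0));
        System.out.println("addMapping key: " + addMappingKey(code));
        System.out.println("unsigned key:   " + unsignedKey(code, 0));
    }
}
```

For codes whose first byte is below 0x80 the two quoted computations agree, which may be why the problem only shows up with some double-byte encodings.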
> 2) Code in org.pdfbox.pdmodel.font.PDFont.encode():
> When the PDF file contains a "ToUnicode" CMap, encode()
> parses it again for every character. The parsed CMap is
> not stored for later use.
> The code is:
> COSStream toUnicode =
> (COSStream) font.getDictionaryObject(
> COSName.getPDFName("ToUnicode"));
> if (toUnicode != null) {
> parseCmap(toUnicode.getUnfilteredStream(),
> null);
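The repeated parsing described in item 2 is the kind of cost that a parse-once cache removes. The sketch below is a generic holder that runs an expensive loader on first use and returns the stored result afterwards. CachedValue and its loader are hypothetical names, and the string returned in main merely stands in for a parsed CMap; this is not the PDFBox API and not necessarily how PDFBox addressed the issue.

```java
import java.util.function.Supplier;

// Hypothetical parse-once holder: the loader runs at most once per instance.
final class CachedValue<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    CachedValue(Supplier<T> loader) {
        this.loader = loader;
    }

    // Not thread-safe; a font object shared across threads would need synchronization.
    T get() {
        if (!loaded) {
            value = loader.get(); // expensive step, e.g. parsing a CMap stream
            loaded = true;        // a null result (no ToUnicode entry) is cached too
        }
        return value;
    }

    public static void main(String[] args) {
        int[] parses = {0};
        CachedValue<String> toUnicode = new CachedValue<>(() -> {
            parses[0]++;
            return "parsed CMap placeholder";
        });
        for (int i = 0; i < 1000; i++) {
            toUnicode.get(); // one call per character in the original scenario
        }
        System.out.println("loader invocations: " + parses[0]);
    }
}
```

Holding one such cache per font object would keep the parse cost proportional to the number of fonts rather than the number of characters.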
> 3) Code in org.pdfbox.pdmodel.font.PDFont.encode():
> When the current font is Type0, it should parse TWO
> CMap files according to the PDF Reference, but this is
> neglected in the release version.
> 4) Code in org.pdfbox.cmapparser.CMapParser.parse():
> The code neglects the case
> "if (op.getOperation().equals(BEGIN_CID_RANGE))", which is
> very important for "Type0" fonts.
> 5) Code in org.pdfbox.cmapparser.CMapParser.equal():
> I have downloaded the CVS code from SourceForge, where
> this function has been renamed to "lessThanOrEqual".
> However, I found a bug in this "lessThanOrEqual" member
> function. When I try to parse the "UniCNS-UCS2-H" CMap
> with "lessThanOrEqual", the return value is always
> TRUE, so the "while (!equals(startBytes, endBytes))" loop
> never terminates.
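The report does not quote the body of "lessThanOrEqual", so the following is only a sketch of how a byte-range walk can be written so that it always terminates. It compares codes as unsigned big-endian values and stops if the increment wraps around. The names CMapRangeSketch, compareUnsigned, increment and countCodes are hypothetical and do not come from PDFBox.

```java
final class CMapRangeSketch {
    // Compares two equal-length byte arrays as unsigned big-endian numbers.
    static int compareUnsigned(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("codes must have the same length");
        }
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xFF; // mask so 0x80..0xFF are not treated as negative
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }

    // Adds one in place; returns false if the value wrapped around to all zeros.
    static boolean increment(byte[] bytes) {
        for (int i = bytes.length - 1; i >= 0; i--) {
            bytes[i]++;
            if (bytes[i] != 0) {
                return true; // no carry into the next byte
            }
        }
        return false;
    }

    // Walks every code from start to end inclusive and counts the steps.
    static int countCodes(byte[] start, byte[] end) {
        byte[] current = start.clone();
        int count = 0;
        while (compareUnsigned(current, end) <= 0) {
            count++; // a real parser would add a mapping for 'current' here
            if (!increment(current)) {
                break; // wrapped past the largest code; stop instead of looping forever
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0x00, (byte) 0x7F};
        byte[] end = {(byte) 0x01, (byte) 0x02};
        System.out.println("codes in range: " + countCodes(start, end));
    }
}
```

The explicit wrap-around check matters for ranges that end at the largest possible code, where a "less than or equal" test alone can never become false.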
> 6) The text of the PDF file in the attachment cannot be
> extracted correctly in either the latest release version or
> the CVS development version.
> Could you please explain the algorithm for
> the "blank space" in more detail?
> I have noticed that the "TextPosition" class has
> changed in the development version compared to the
> release version. For now I have commented out the "for loop"
> in "PDFTextStripper.flush()" in the release version,
> and the running speed is OK when extracting the "J2EE
> tutorial". :-)
> Could you please tell me more about the algorithm?
> Thanks.
> By the way, if you like, I will email you my code for
> extracting "Type0" fonts.
>
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> Thanks for the report. I will take a look at these. Many bugs
> have been fixed since the 0.6.7 release; please try the
> nightly release.
> Ben
> [comment on SourceForge]
> Originally sent by jinfeng_wang.
> Logged In: YES
> user_id=1145721
> Uploaded the error PDF file.
> [comment on SourceForge]
> Originally sent by jinfeng_wang.
> Logged In: YES
> user_id=1145721
> Sorry, I have not uploaded the "Error" PDF file.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.