[ 
https://issues.apache.org/jira/browse/PDFBOX-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024757#comment-15024757
 ] 

Guillaume Monteils edited comment on PDFBOX-3024 at 11/24/15 4:51 PM:
----------------------------------------------------------------------

Sorry about the "missing glyph" GID. I am not really a glyph expert, and even 
Google did not help me :(

In the validation process, the error occurs while validating content, more 
precisely in validateText in PreflightContentStream.

While reading the stream, we check the glyph width of the character "cid=0". 
Perhaps we should not check the glyph width for cid=0.

{code}
        InputStream in = new ByteArrayInputStream(string);
        while (in.available() > 0)
        {
            try
            {
                int code = font.readCode(in);
                fontContainer.checkGlyphWidth(code);
                ...
{code}
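
To make the suggestion above concrete, here is a minimal sketch of filtering out cid=0 before the width check. It does not use the Preflight API: the class NotdefAwareWidthCheck, the constant NOTDEF_CID and the method codesToCheck are hypothetical names for illustration only, and whether PDF/A validation may actually skip this glyph is an open question, not something this sketch settles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox or Preflight.
final class NotdefAwareWidthCheck {

    // Assumption: CID 0 is the ".notdef" (missing glyph) entry of the font.
    static final int NOTDEF_CID = 0;

    // Returns the character codes whose glyph width would still be checked,
    // leaving out every occurrence of CID 0.
    static List<Integer> codesToCheck(int[] codes) {
        List<Integer> result = new ArrayList<>();
        for (int code : codes) {
            if (code == NOTDEF_CID) {
                continue; // skip the width check for the missing glyph
            }
            result.add(code);
        }
        return result;
    }
}
```

With an Identity-H encoding and an Identity CIDToGIDMap, as in this font, code, CID and GID are the same number, so comparing the code against 0 is enough here.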

I also want to add that when I remove this HasGlyph test for the character 
"code=0, cid=0, gid=0", a width is present for that character in the font, and 
it is different from 0.

{code}
            // check widths
            float expectedWidth = font.getWidth(code); // returns 778
            float foundWidth = font.getWidthFromFont(code); // returns 777
{code}
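
The two values differ by only one unit. As an illustration, a comparison with a tolerance could look like the sketch below. The class WidthTolerance, the method widthsMatch and the tolerance of 1 are assumptions made for this example; they are not taken from the Preflight source.

```java
// Hypothetical helper, not part of PDFBox or Preflight.
final class WidthTolerance {

    // Returns true when the width declared in the PDF and the width read
    // from the embedded font differ by no more than the given tolerance.
    static boolean widthsMatch(float expectedWidth, float foundWidth, float tolerance) {
        return Math.abs(expectedWidth - foundWidth) <= tolerance;
    }
}
```

With the values above, widthsMatch(778f, 777f, 1f) accepts the glyph, while a strict comparison with a tolerance of 0 rejects it.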

Here is the COS dictionary of the font:

COSDictionary{(COSName{BaseFont}:COSName{XYNCVU+TimesNewRomanPSMT}) 
(COSName{CIDSystemInfo}:COSDictionary{(COSName{Ordering}:COSString{Identity}) 
(COSName{Registry}:COSString{Adobe}) (COSName{Supplement}:COSInt{0}) }) 
(COSName{CIDToGIDMap}:COSName{Identity}) (COSName{DW}:COSInt{1000}) 
(COSName{FontDescriptor}:COSDictionary{(COSName{Ascent}:COSInt{1007}) 
(COSName{CIDSet}:COSStream{(COSName{Filter}:COSName{FlateDecode}) 
(COSName{Length}:COSInt{11}) }) (COSName{CapHeight}:COSInt{663}) 
(COSName{Descent}:COSInt{-307}) (COSName{Flags}:COSInt{6}) 
(COSName{FontBBox}:COSArray{[COSInt{-568}, COSInt{-307}, COSInt{2000}, 
COSInt{1007}]}) (COSName{FontFamily}:COSString{Times New Roman}) 
(COSName{FontFile2}:COSStream{(COSName{Filter}:COSName{FlateDecode}) 
(COSName{Length}:COSInt{8506}) (COSName{Length1}:COSInt{23085}) }) 
(COSName{FontName}:COSName{XYNCVU+TimesNewRomanPSMT}) 
(COSName{FontStretch}:COSName{Normal}) (COSName{FontWeight}:COSInt{400}) 
(COSName{ItalicAngle}:COSInt{0}) (COSName{StemV}:COSInt{80}) 
(COSName{Type}:COSName{FontDescriptor}) (COSName{XHeight}:COSInt{448}) }) 
(COSName{Subtype}:COSName{CIDFontType2}) (COSName{Type}:COSName{Font}) 
(COSName{W}:COSArray{[COSInt{0}, COSArray{[COSInt{778}]}]}) }


> Preflight validation call PDType0Font.clear at the wrong time
> -------------------------------------------------------------
>
>                 Key: PDFBOX-3024
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3024
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Preflight
>    Affects Versions: 1.8.10
>            Reporter: Guillaume Monteils
>         Attachments: 004973.pdf, PDF-Tools.png, PDFBox.png, eclipse-1.jpg, 
> eclipse-2.jpg
>
>
> I used the algorithm here to test PDF/A compliance:
> https://pdfbox.apache.org/1.8/cookbook/pdfavalidation.html
> With one PDF document (which I can't give you due to confidentiality), a 
> NullPointerException occurs here:
> {code}
> java.lang.NullPointerException
>       at 
> org.apache.pdfbox.pdmodel.font.PDType0Font.getFontWidth(PDType0Font.java:188)
>       at 
> org.apache.pdfbox.preflight.font.container.FontContainer.checkGlyphWith(FontContainer.java:114)
>       at 
> org.apache.pdfbox.preflight.content.ContentStreamWrapper.validText(ContentStreamWrapper.java:372)...
> {code}
> As I dug deeper, I found that Preflight loads a font context in which it 
> puts all PDF fonts. The PDType0Font is also created and put in this context.
> {code}
> (CSObject : 
> COSDictionary{(COSName{BaseFont}:COSName{INWHIX+TimesNewRomanPSMT})       
> (COSName{DescendantFonts}:COSArray{[COSObject{349, 0}]}) 
> (COSName{Encoding}:COSName{Identity-H})       
> (COSName{Subtype}:COSName{Type0}) 
> (COSName{ToUnicode}:COSDictionary{(COSName{Filter}:COSName{FlateDecode})      
> (COSName{Length}:COSInt{260}) }) (COSName{Type}:COSName{Font}) })
> {code}
> The problem is that at the end of one step of the analysis, the clear method 
> is called on the PDType0Font (see eclipse-1.jpg), but the font is still 
> present in the context. In a second step, the same font is retrieved from the 
> context with no data in it, and the NullPointerException occurs (see 
> eclipse-2.jpg).
> I tried the validation after removing the clear method from PDType0Font and 
> it works just fine.
> I think the problem comes from this context, and a clear on a font should 
> also trigger a deletion in this map.


