[jira] Created: (PDFBOX-612) Unknown encoding for 'GBK-EUC-H'

Gang Luo (JIRA) Sun, 07 Feb 2010 16:35:52 -0800

Unknown encoding for 'GBK-EUC-H'
--------------------------------

                 Key: PDFBOX-612
                 URL: https://issues.apache.org/jira/browse/PDFBOX-612
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 0.8.0-incubator
         Environment: Windows
            Reporter: Gang Luo



Processing a Chinese PDF document produces "Unknown encoding for 'GBK-EUC-H'". To fix it:

1. Add the following method to org.apache.pdfbox.pdmodel.font.PDFont.java:

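public String getEncodingName() {
    // Return the font's /Encoding entry if it is a name (e.g. /GBK-EUC-H);
    // return null if the entry is missing or is not a name.
    // (instanceof is false for null, so no separate null check is needed.)
    COSBase encoding = font.getDictionaryObject(COSName.ENCODING);
    if (encoding instanceof COSName) {
        return ((COSName) encoding).getName();
    }
    return null;
}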
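To illustrate what getEncodingName() returns, here is a minimal self-contained sketch. It models the font dictionary as a plain Map and uses a hypothetical PdfName class in place of COSName; neither is PDFBox API. For a Type0 font the /Encoding entry is either the name of a predefined CMap such as /GBK-EUC-H or an embedded CMap stream, and only the name case yields a non-null result.

```java
import java.util.Map;

public class EncodingNameSketch {

    /** Hypothetical stand-in for COSName: a PDF name object such as /GBK-EUC-H. */
    public static final class PdfName {
        public final String name;

        public PdfName(String name) {
            this.name = name;
        }
    }

    /** Mirrors getEncodingName(): returns the /Encoding value only when it is a name. */
    public static String getEncodingName(Map<String, Object> fontDict) {
        Object encoding = fontDict.get("Encoding");
        if (encoding instanceof PdfName) {
            return ((PdfName) encoding).name;
        }
        // Missing entry, or a non-name value (e.g. an embedded CMap stream).
        return null;
    }
}
```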

2. Modify the encode method, changing this code:
        if( retval == null && cmap != null )
        {
            retval = cmap.lookup( c, offset, length );
        }
        //if we haven't found a value yet and
        //we are still on the first byte and
        //there is no cmap or the cmap does not have 2 byte mappings then try to encode
        //using fallback methods.

to

        if( retval == null && cmap != null )
        {
            String encodingStr = getEncodingName();
            if (encodingStr != null) {
                // Look for a converter registered for this encoding name (e.g. GBK-EUC-H).
                EncodingConverter converter =
                    EncodingConversionManager.getConverter(encodingStr);
                if (converter != null) {
                    // Return null for a single byte so that the caller can retry
                    // with a two-byte code, which the converter then decodes.
                    if (length == 1) return null;
                    retval = converter.convertBytes(c, offset, length, cmap);
                } else {
                    retval = cmap.lookup( c, offset, length );
                }
            } else {
                retval = cmap.lookup( c, offset, length );
            }
        }
        //if we haven't found a value yet and
        //we are still on the first byte and
        //there is no cmap or the cmap does not have 2 byte mappings then try to encode
        //using fallback methods.
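The control flow of the modified encode method can be sketched outside PDFBox as follows. This is a hypothetical, self-contained illustration, not PDFBox code: the CMAP_TO_CHARSET table (mapping the CMap name GBK-EUC-H to the Java charset "GBK") and the decodeAll caller loop are assumptions standing in for EncodingConversionManager, EncodingConverter and the text-processing loop that calls encode with one byte and then two bytes.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the patched encode() flow; not PDFBox code. */
public class GbkEncodeSketch {

    // Assumed mapping from predefined CMap names to Java charset names.
    private static final Map<String, String> CMAP_TO_CHARSET = new HashMap<>();

    static {
        CMAP_TO_CHARSET.put("GBK-EUC-H", "GBK");
    }

    /** Returns decoded text, or null if no converter applies or more bytes are needed. */
    public static String encode(byte[] c, int offset, int length, String encodingName) {
        String charsetName = (encodingName == null) ? null : CMAP_TO_CHARSET.get(encodingName);
        if (charsetName == null) {
            return null; // the patch falls back to cmap.lookup(...) in this case
        }
        if (length == 1) {
            return null; // mirrors "if (length == 1) return null;" in the patch
        }
        return new String(c, offset, length, Charset.forName(charsetName));
    }

    /** Mimics a caller that tries a one-byte code first, then a two-byte code. */
    public static String decodeAll(byte[] data, String encodingName) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < data.length) {
            int used = 1;
            String s = encode(data, i, 1, encodingName);
            if (s == null && i + 1 < data.length) {
                used = 2;
                s = encode(data, i, 2, encodingName);
            }
            if (s != null) {
                out.append(s);
            }
            // Codes that still cannot be decoded are skipped in this sketch.
            i += used;
        }
        return out.toString();
    }
}
```

Because every one-byte request returns null, a single-byte code such as ASCII 'A' (0x41) is always paired with the byte that follows it in this sketch; the patch as written appears to behave the same way, so text mixing ASCII and Chinese characters would likely need a codespace-range check first.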



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
