Unknown encoding for 'GBK-EUC-H'
--------------------------------
Key: PDFBOX-612
URL: https://issues.apache.org/jira/browse/PDFBOX-612
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 0.8.0-incubator
Environment: Windows
Reporter: Gang Luo
Parsing a Chinese PDF document fails with "Unknown encoding for 'GBK-EUC-H'". The two changes below fix it.
1. Add this method to org.apache.pdfbox.pdmodel.font.PDFont (PDFont.java):
public String getEncodingName() {
    COSBase encoding = font.getDictionaryObject(COSName.ENCODING);
    if (encoding != null) {
        if (encoding instanceof COSName) {
            return ((COSName) encoding).getName();
        }
    }
    return null;
}
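The method returns a name only when the font's /Encoding entry is a name object. For composite fonts that entry can also be a stream holding an embedded CMap, in which case there is no name to report and the method returns null. The sketch below shows that behaviour in isolation. It is not PDFBox code: `Name`, `Stream`, and the map-based font dictionary are hypothetical stand-ins for `COSName`, `COSStream`, and the font's `COSDictionary`.

```java
import java.util.HashMap;
import java.util.Map;

public class EncodingNameSketch {
    // Hypothetical stand-in for COSName: a PDF name object such as /GBK-EUC-H.
    static final class Name {
        final String value;
        Name(String value) { this.value = value; }
    }

    // Hypothetical stand-in for COSStream: an embedded CMap has no simple name.
    static final class Stream { }

    // Mirrors the proposed getEncodingName(): report the name if /Encoding
    // is a name object, otherwise (missing entry or stream) return null.
    static String getEncodingName(Map<String, Object> fontDictionary) {
        Object encoding = fontDictionary.get("Encoding");
        if (encoding instanceof Name) {
            return ((Name) encoding).value;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> named = new HashMap<>();
        named.put("Encoding", new Name("GBK-EUC-H"));
        Map<String, Object> embedded = new HashMap<>();
        embedded.put("Encoding", new Stream());

        System.out.println(getEncodingName(named));
        System.out.println(getEncodingName(embedded));
    }
}
```

Callers therefore have to treat null as "no predefined encoding name available" and keep using the CMap lookup in that case.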
2. Modify the encode method in the same class.
From:
if( retval == null && cmap != null )
{
    retval = cmap.lookup( c, offset, length );
}
//if we haven't found a value yet and
//we are still on the first byte and
//there is no cmap or the cmap does not have 2 byte mappings then try to encode
//using fallback methods.
To:
if( retval == null && cmap != null )
{
    // Prefer a registered converter for the font's named encoding.
    String encodingStr = getEncodingName();
    if (encodingStr != null) {
        EncodingConverter converter =
            EncodingConversionManager.getConverter(encodingStr);
        if (converter != null) {
            // A single byte is not converted here; returning null
            // presumably lets the caller retry with two bytes.
            if (length == 1) return null;
            retval = converter.convertBytes(c, offset, length, cmap);
        } else {
            // No converter for this encoding: use the CMap as before.
            retval = cmap.lookup( c, offset, length );
        }
    } else {
        // /Encoding is not a name: use the CMap as before.
        retval = cmap.lookup( c, offset, length );
    }
}
//if we haven't found a value yet and
//we are still on the first byte and
//there is no cmap or the cmap does not have 2 byte mappings then try to encode
//using fallback methods.
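
The control flow of the modified branch can be sketched outside PDFBox as follows. This is only an illustration under stated assumptions: the `CHARSETS` table and the `decode` helper are hypothetical and stand in for `EncodingConversionManager.getConverter` and `converter.convertBytes`, and decoding is delegated to the JDK's "GBK" charset, which is assumed to be installed (it ships with standard JDK builds but may be missing from trimmed runtimes). A null result means "no text produced here", which in the patch corresponds either to falling back to cmap.lookup or to waiting for a two-byte call.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class GbkFallbackSketch {
    // Hypothetical table from predefined CMap name to JDK charset name.
    private static final Map<String, String> CHARSETS = new HashMap<>();
    static {
        CHARSETS.put("GBK-EUC-H", "GBK"); // horizontal writing mode
        CHARSETS.put("GBK-EUC-V", "GBK"); // vertical writing mode
    }

    // Returns the decoded text, or null when no converter applies or when
    // only one byte was supplied (mirroring the length == 1 check above).
    static String decode(String encodingName, byte[] c, int offset, int length) {
        String charsetName = encodingName == null ? null : CHARSETS.get(encodingName);
        if (charsetName == null) {
            return null; // the patch would call cmap.lookup here instead
        }
        if (length == 1) {
            return null; // wait for the caller to retry with two bytes
        }
        return new String(c, offset, length, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xD6D0 and 0xCEC4 are the GBK codes for U+4E2D and U+6587.
        byte[] data = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        System.out.println(decode("GBK-EUC-H", data, 0, 1) == null);
        System.out.println("\u4e2d".equals(decode("GBK-EUC-H", data, 0, 2)));
        System.out.println("\u6587".equals(decode("GBK-EUC-H", data, 2, 2)));
    }
}
```

One consequence of the length check, both here and in the patch, is that single-byte codes (for example ASCII characters in a GBK string) are never handled by the converter and must be resolved by the later fallback logic.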