PDColorSpaceFactory does not recognize colorspace DeviceGray (patch included)
------------------------------------------------------------------------------------

                 Key: PDFBOX-981
                 URL: https://issues.apache.org/jira/browse/PDFBOX-981
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.5.0
            Reporter: Matt England


I was trying to use PDFTextStripper to extract text from a large corpus of PDF 
files. In some of them, the method:

org.apache.pdfbox.pdmodel.graphics.color.PDColorSpaceFactory.createColorSpace( 
COSBase colorSpace, Map colorSpaces )

fails to handle the case where the colorSpace argument is a COSArray whose 
first element is the name DeviceGray (COSName.DEVICEGRAY). With that case 
added, the files that failed under stock pdfbox-1.5.0 parse successfully. 
Below is a diff of my patched PDColorSpaceFactory, which handles an array 
whose colorspace name is DeviceGray. Incidentally, another (possibly better) 
approach would be to fall through to createColorSpace(String) when no other 
case matches.

% diff PDColorSpaceFactory.java.orig PDColorSpaceFactory.java
94a95,97
> else if ( type.getName().equals( PDDeviceGray.NAME) ) {
> retval = new PDDeviceGray();
> }
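
For illustration, the alternative mentioned above (falling through to the 
name-based lookup when no array-specific case matches) could be sketched 
roughly as follows. This is not PDFBox code: plain Java lists and strings 
stand in for COSArray and COSName, the returned strings stand in for 
PDColorSpace instances, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained sketch. Nothing here is PDFBox API; simple
// Java types replace COSArray/COSName and the PDColorSpace subclasses.
public class ColorSpaceLookupSketch {

    // Stand-in for createColorSpace(String): resolve a color space by name.
    public static String createByName(String name) {
        switch (name) {
            case "DeviceGray": return "gray";
            case "DeviceRGB":  return "rgb";
            case "DeviceCMYK": return "cmyk";
            default:
                throw new IllegalArgumentException("Unknown color space: " + name);
        }
    }

    // Stand-in for the COSArray branch of createColorSpace(COSBase, Map).
    public static String createFromArray(List<String> array) {
        String type = array.get(0);
        if (type.equals("ICCBased")) {
            // Color spaces that need the remaining array elements keep
            // their own explicit branch.
            return "icc";
        } else if (type.equals("Indexed")) {
            return "indexed";
        } else {
            // Fallback: a one-element array such as [DeviceGray] carries no
            // extra parameters, so defer to the name-based lookup.
            return createByName(type);
        }
    }

    public static void main(String[] args) {
        System.out.println(createFromArray(Arrays.asList("DeviceGray")));
    }
}
```

With a fallback like this, any device color space wrapped in an array would 
be resolved without a dedicated branch per name, while unknown names would 
still fail loudly instead of being silently ignored.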

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
