[ 
https://issues.apache.org/jira/browse/TIKA-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352634#comment-17352634
 ] 

Tilman Hausherr commented on TIKA-2689:
---------------------------------------

Here's some code that reports whether a PDF is an AI (Adobe Illustrator) file. 
Surprisingly, I found many .pdf files that qualify, so this should be checked 
by somebody who has an Illustrator installation:
{code}
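import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

void checkIllustratorDoc(final PDDocument doc, String name) throws IOException
{
        // Only the first page is inspected
        PDPage page = doc.getPage(0);
        COSDictionary pieceInfoDict = page.getCOSObject().getCOSDictionary(COSName.PIECE_INFO);
        if (pieceInfoDict == null)
        {
                return;
        }

        COSDictionary illustratorDict = pieceInfoDict.getCOSDictionary(COSName.ILLUSTRATOR);
        if (illustratorDict == null)
        {
                return;
        }

        COSDictionary privateDict = illustratorDict.getCOSDictionary(COSName.PRIVATE);
        if (privateDict == null)
        {
                return;
        }

        // Reached only when PieceInfo, Illustrator and Private all exist
        COSStream aiMetaData = privateDict.getCOSStream(COSName.AI_META_DATA);
        System.out.println("yes: " + name);
        if (aiMetaData == null)
        {
                return;
        }

        try (BufferedReader bfr = new BufferedReader(new InputStreamReader(aiMetaData.createInputStream())))
        {
                String line1 = bfr.readLine();
                String line2 = bfr.readLine();
                // readLine() returns null at end of stream, so guard before trimming
                if (line1 != null && line2 != null)
                {
                        System.out.println(line1.trim() + " " + line2.trim());
                }
        }
}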
{code}
I'm adding the constants to PDFBox; until then, use 
{{COSName.getPDFName("...")}}. The exact spelling is shown in the screenshot.
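
The last step of the method above, reading the first two lines of the metadata 
stream, can be tried without PDFBox or a sample file. The following is only a 
minimal sketch, not part of the planned PDFBox change: the class name 
AiHeaderLines and the method firstTwoLines are hypothetical, and decoding as 
ISO-8859-1 is an assumption (the code above uses the platform default charset).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class AiHeaderLines
{
    private AiHeaderLines()
    {
    }

    // Returns the first two lines of the stream, each trimmed and joined by a
    // single space. A missing line is treated as empty, so a short or empty
    // stream cannot trigger a NullPointerException.
    static String firstTwoLines(InputStream in) throws IOException
    {
        // ISO-8859-1 is an assumption; it maps every byte to one character.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1)))
        {
            String line1 = reader.readLine();
            String line2 = reader.readLine();
            String first = line1 == null ? "" : line1.trim();
            String second = line2 == null ? "" : line2.trim();
            return (first + " " + second).trim();
        }
    }
}
```

With PDFBox, it could be called as 
{{AiHeaderLines.firstTwoLines(aiMetaData.createInputStream())}}.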


> *.ai type (Adobe illustrator ) files are not detected correctly.
> ----------------------------------------------------------------
>
>                 Key: TIKA-2689
>                 URL: https://issues.apache.org/jira/browse/TIKA-2689
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.16, 1.17, 1.18
>            Reporter: Amit Pandey
>            Priority: Major
>         Attachments: example.ai, screenshot-1.png
>
>
> There is an inconsistency in detecting *.ai files across the overloaded 
> detect methods. _detect(String filename)_ returns the correct file type, 
> "*application/illustrator*", whereas _detect(InputStream is, String 
> filename)_ and _detect(File fileObj)_ return the file type 
> "*application/pdf*".
> Here is the sample code I used.
>   
> [https://stackoverflow.com/questions/51359351/tika-detect-method-not-giving-same-exact-file-type]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
