[
https://issues.apache.org/jira/browse/TIKA-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352634#comment-17352634
]
Tilman Hausherr commented on TIKA-2689:
---------------------------------------
Here's some code that reports whether a file is an AI file. Surprisingly, I
found many .pdf files that qualify, so this should be checked by somebody who
has an Adobe Illustrator installation:
{code}
void checkIllustratorDoc(final PDDocument doc, String name) throws IOException
{
    // Look up the PieceInfo dictionary on the first page
    PDPage page = doc.getPage(0);
    COSDictionary pieceInfoDict =
            page.getCOSObject().getCOSDictionary(COSName.PIECE_INFO);
    if (pieceInfoDict == null)
    {
        return;
    }
    COSDictionary illustratorDict =
            pieceInfoDict.getCOSDictionary(COSName.ILLUSTRATOR);
    if (illustratorDict == null)
    {
        return;
    }
    COSDictionary privateDict =
            illustratorDict.getCOSDictionary(COSName.PRIVATE);
    if (privateDict == null)
    {
        return;
    }
    COSStream aiMetaData = privateDict.getCOSStream(COSName.AI_META_DATA);
    // The Illustrator private dictionary exists, so report the file
    System.out.println("yes: " + name);
    if (aiMetaData == null)
    {
        return;
    }
    try (BufferedReader bfr = new BufferedReader(new
            InputStreamReader(aiMetaData.createInputStream())))
    {
        String line1 = bfr.readLine();
        String line2 = bfr.readLine();
        // readLine() returns null at end of stream, so guard before trim()
        if (line1 != null && line2 != null)
        {
            System.out.println(line1.trim() + " " + line2.trim());
        }
    }
}
{code}
I'm adding the constants to PDFBox; until then, use
{{COSName.getPDFName("...")}}. The exact spelling is shown in the screenshot.
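The last step of the method above reads the first two lines of the metadata stream. As a minimal sketch of a null-safe version of that step, using only the JDK: the class {{AiMetaDataPeek}}, the method {{firstTwoLines}}, the ISO-8859-1 decoding and the sample bytes are all assumptions made for illustration, not part of PDFBox or Tika.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class AiMetaDataPeek
{
    // Hypothetical helper (not a PDFBox or Tika API): returns the first two
    // lines of the stream, each trimmed, joined by a single space.
    // A missing line is treated as an empty string.
    static String firstTwoLines(InputStream in) throws IOException
    {
        // Assumption: ISO-8859-1 is used because it maps every byte to a
        // character, so decoding cannot fail on arbitrary stream content.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1)))
        {
            String line1 = reader.readLine();
            String line2 = reader.readLine();
            String first = line1 == null ? "" : line1.trim();
            String second = line2 == null ? "" : line2.trim();
            return (first + " " + second).trim();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Made-up sample bytes, not taken from a real AI file
        byte[] sample = "  first line \nsecond line\nthird line"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstTwoLines(new ByteArrayInputStream(sample)));
    }
}
```

In the method above, the body of the try block could then become {{System.out.println(AiMetaDataPeek.firstTwoLines(aiMetaData.createInputStream()));}}, which avoids a NullPointerException when the stream has fewer than two lines.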
> *.ai type (Adobe illustrator ) files are not detected correctly.
> ----------------------------------------------------------------
>
> Key: TIKA-2689
> URL: https://issues.apache.org/jira/browse/TIKA-2689
> Project: Tika
> Issue Type: Bug
> Components: core
> Affects Versions: 1.16, 1.17, 1.18
> Reporter: Amit Pandey
> Priority: Major
> Attachments: example.ai, screenshot-1.png
>
>
> There is an inconsistency in detecting *.ai files across the different
> overloaded detect methods. When I use _detect(String filename)_, it gives
> the correct file type, "*application/illustrator*". If I use
> _detect(InputStream is, String filename)_ or _detect(File fileObj)_, it gives
> the file type "*application/pdf*".
> Here is the sample code I used.
>
> [https://stackoverflow.com/questions/51359351/tika-detect-method-not-giving-same-exact-file-type|http://example.com/]
>