[
https://issues.apache.org/jira/browse/TIKA-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771889#comment-17771889
]
Alexey Pismenskiy edited comment on TIKA-4148 at 10/4/23 3:09 PM:
------------------------------------------------------------------
Yes, they are OLE2 compound files.
So, for MIME type detection, I don't really know if we can narrow down by magic
bytes that they are Autodesk Inventor specific. Maybe use that "D0 CF 11 E0 A1
B1 1A E1" from above to detect OLE2, and then use file extensions?
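A rough sketch of that two-step check, in plain Java with no Tika or POI dependency. The class name {{InventorSniffer}} and its methods are hypothetical, and treating the extensions from the issue title as Inventor-specific is an assumption, not something Tika does today:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: OLE2 magic check first, then a file-extension check.
final class InventorSniffer {
    // OLE2 compound file signature quoted above: D0 CF 11 E0 A1 B1 1A E1.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumption: the extensions listed in the issue title.
    private static final Set<String> INVENTOR_EXTENSIONS =
        Set.of("ipt", "iam", "ipn", "idw");

    // True if the buffer starts with the 8-byte OLE2 signature.
    static boolean hasOle2Magic(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // True if the file name ends in one of the assumed extensions (case-insensitive).
    static boolean hasInventorExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return INVENTOR_EXTENSIONS.contains(ext);
    }

    // Combines both checks; reads only the first 8 bytes of the file.
    static boolean looksLikeInventor(Path file) throws IOException {
        if (!hasInventorExtension(file.getFileName().toString())) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            return hasOle2Magic(in.readNBytes(OLE2_MAGIC.length));
        }
    }
}
```

This only says "OLE2 container with an Inventor-looking name"; any other OLE2 file renamed to .ipt would also pass.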
I was also thinking we can leverage one of the parsers from
[https://poi.apache.org/text-extraction.html]:
{{File iptFile = new File("MyBoxCenterB.ipt");}}
{{try (POIFSFileSystem fs = new POIFSFileSystem(iptFile)) {}}
{{ DirectoryEntry root = fs.getRoot();}}
{{ // get entities metadata}}
{{ HPSFPropertiesExtractor extractor = new HPSFPropertiesExtractor(fs);}}
{{ // extract properties}}
{{} }}
Maybe that is sufficient for metadata?
I think the zip is just their way of packaging the project, and zip extraction is
trivial. I'm looking specifically for the embedded files (ipj, ipt, iam).
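For the packaging side, something like the following could list the candidate entries inside a project zip using only {{java.util.zip}}. The class name {{InventorZipScanner}} and its methods are made up for illustration; this is a sketch of the idea, not how Tika's own package parsing works:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: walks a zip stream and collects Inventor-looking entries.
final class InventorZipScanner {
    // The embedded file types mentioned above.
    private static final Set<String> WANTED = Set.of("ipj", "ipt", "iam");

    // True if the entry name ends in one of the wanted extensions (case-insensitive).
    static boolean isWanted(String entryName) {
        int dot = entryName.lastIndexOf('.');
        if (dot < 0 || dot == entryName.length() - 1) {
            return false;
        }
        return WANTED.contains(entryName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Returns the names of all non-directory entries that look like Inventor files.
    static List<String> findEmbedded(InputStream zipStream) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory() && isWanted(entry.getName())) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Each matching entry could then be handed to the OLE2/POI code path shown earlier in this comment.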
> Support Autodesk Inventor files (.ipt) (.iam) (.ipn) (.idw)
> -----------------------------------------------------------
>
> Key: TIKA-4148
> URL: https://issues.apache.org/jira/browse/TIKA-4148
> Project: Tika
> Issue Type: Improvement
> Reporter: Alexey Pismenskiy
> Priority: Major
>
> Add support for Autodesk Inventor files in Tika.
> Examples of the files can be downloaded from
> [https://www.autodesk.com/support/technical/article/caas/tsarticles/ts/3gnm93P9sPAWE6vndk7fjq.html]
> It would be great to start at least at the metadata level and then add
> content parsing later.
> I suspect it would be something similar to
> [DWGParser|https://tika.apache.org/0.9/api/org/apache/tika/parser/dwg/DWGParser.html].
>
> Any suggestions on where to start looking are appreciated.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)