[ https://issues.apache.org/jira/browse/TIKA-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787608#comment-17787608 ]
Nick Burch commented on TIKA-4148:
----------------------------------
For detection of the OLE2-based files, we don't need to find unique byte
combinations; we only need to find unique OLE2 entry names or sets of names.
See
[https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/POIFSContainerDetector.java#L362]
for an example of "must have this, then one of those".
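
To illustrate the "must have this, then one of those" idea, here is a minimal sketch in plain Java. It is not the code in POIFSContainerDetector; the class name, method name, and all entry names are hypothetical placeholders, since the real Inventor entry names are not known yet.

```java
import java.util.Set;

// Sketch only: every entry name below is a made-up placeholder,
// not a real Autodesk Inventor (or other OLE2) entry name.
final class EntryNameRule {
    // The entry that must always be present.
    static final String REQUIRED = "HypotheticalRequiredEntry";
    // At least one of these must be present as well.
    static final Set<String> ONE_OF =
            Set.of("HypotheticalOptionA", "HypotheticalOptionB");

    // Returns true when the root entry names contain REQUIRED
    // plus at least one name from ONE_OF.
    static boolean matches(Set<String> rootEntryNames) {
        if (rootEntryNames == null || !rootEntryNames.contains(REQUIRED)) {
            return false;
        }
        for (String name : ONE_OF) {
            if (rootEntryNames.contains(name)) {
                return true;
            }
        }
        return false;
    }
}
```

In a real detector the set of names would come from the root directory of the OLE2 container rather than from a hand-built set.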
If you can run POIFSLister (and/or POIFSDumper) on a bunch of files and spot
the entry names that are common (and ideally not already used in
POIFSContainerDetector for other formats), that's what we need.
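
Once the entry names have been collected per file, picking out the candidates could look roughly like the sketch below. It assumes the names were already gathered into one set per sample file (for example, copied from the POIFSLister output); the class and method names are hypothetical, and the "already used" set would have to be filled in by hand from the existing detector.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: works on plain sets of names, does not read OLE2 files itself.
final class CommonEntryNames {
    // Returns the names that occur in every sample file,
    // minus the names already used to detect other formats.
    static Set<String> candidates(List<Set<String>> namesPerFile,
                                  Set<String> alreadyUsed) {
        if (namesPerFile.isEmpty()) {
            return new TreeSet<>();
        }
        // Start from the first file and keep only names seen in all others.
        Set<String> common = new TreeSet<>(namesPerFile.get(0));
        for (Set<String> names : namesPerFile) {
            common.retainAll(names);
        }
        common.removeAll(alreadyUsed);
        return common;
    }
}
```

Whatever survives this filter across a reasonable number of .ipt, .iam, .ipn and .idw samples would be a starting point for a detection rule.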
> Support Autodesk Inventor files (.ipt) (.iam) (.ipn) (.idw)
> -----------------------------------------------------------
>
> Key: TIKA-4148
> URL: https://issues.apache.org/jira/browse/TIKA-4148
> Project: Tika
> Issue Type: Improvement
> Reporter: Alexey Pismenskiy
> Priority: Major
>
> Add support for Autodesk Inventor files in Tika.
> Examples of the files can be downloaded from
> [https://www.autodesk.com/support/technical/article/caas/tsarticles/ts/3gnm93P9sPAWE6vndk7fjq.html]
> It would be great to start at least at the metadata level and then add
> content parsing later.
> I suspect it would be something similar to
> [DWGParser|https://tika.apache.org/0.9/api/org/apache/tika/parser/dwg/DWGParser.html].
>
> Any suggestions on where to start looking are appreciated.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)