Thanks, will check it out.

On Fri, Jan 2, 2015 at 5:07 PM, Jukka Zitting <[email protected]> wrote:
> Hi,
>
> 2015-01-02 16:37 GMT-05:00 Grant Ingersoll <[email protected]>:
> > I think the problem is that the file types in question are not discernible
> > by anything other than the actual content, with the big problem being this
> > is an expensive operation.
>
> Right, then approach 2 might work better, or Tyler's suggestion to
> just modify the existing parser.
>
> > I'll poke around here a bit and see if anything stands out.
>
> A related point is the way the POI container detector uses the
> TikaInputStream.get/setOpenContainer() mechanism [1] to pass the
> results of any early heavy lifting from type detection to the parsing
> phase [2].
>
> [1] https://tika.apache.org/1.6/api/org/apache/tika/io/TikaInputStream.html#getOpenContainer()
> [2] https://github.com/apache/tika/blob/1.6/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java#L385
>
> BR,
>
> Jukka Zitting
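
For reference, the hand-off described in the quoted message can be sketched without any Tika dependency. Everything below is a hypothetical stand-in for illustration: `CachingStream` only mimics the get/setOpenContainer() idea named above, and `ParsedContainer`, `openContainer`, `detect`, `parse`, and the made-up media types are assumptions, not Tika or POI code. The real signatures should be checked against link [1].

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OpenContainerSketch {

    // Hypothetical stand-in for a stream that can carry an already-opened container.
    static final class CachingStream {
        private final byte[] data;
        private Object openContainer; // stays null until a detector stores something

        CachingStream(byte[] data) { this.data = data; }
        byte[] data() { return data; }
        Object getOpenContainer() { return openContainer; }
        void setOpenContainer(Object container) { this.openContainer = container; }
    }

    // Hypothetical result of the expensive content inspection.
    static final class ParsedContainer {
        final int checksum;
        ParsedContainer(int checksum) { this.checksum = checksum; }
    }

    // Counts how often the expensive step actually runs.
    static final AtomicInteger EXPENSIVE_CALLS = new AtomicInteger();

    // Placeholder for the "heavy lifting": here it just sums the bytes.
    static ParsedContainer openContainer(byte[] data) {
        EXPENSIVE_CALLS.incrementAndGet();
        int sum = 0;
        for (byte b : data) {
            sum += b;
        }
        return new ParsedContainer(sum);
    }

    // Detection phase: does the expensive work once and leaves the result on the stream.
    static String detect(CachingStream stream) {
        ParsedContainer container = openContainer(stream.data());
        stream.setOpenContainer(container);
        return container.checksum % 2 == 0 ? "application/x-even" : "application/x-odd";
    }

    // Parsing phase: reuses the cached container if present, otherwise opens it itself.
    static int parse(CachingStream stream) {
        Object cached = stream.getOpenContainer();
        ParsedContainer container = (cached instanceof ParsedContainer)
                ? (ParsedContainer) cached
                : openContainer(stream.data());
        return container.checksum;
    }

    public static void main(String[] args) {
        CachingStream stream = new CachingStream(new byte[] {1, 2, 3});
        String type = detect(stream);
        int result = parse(stream);
        System.out.println(type + " " + result + " expensive calls: " + EXPENSIVE_CALLS.get());
    }
}
```

The point of the pattern is that the expensive content inspection runs once during detection and the parser picks up the result from the stream. The parser still falls back to doing the work itself when nothing was cached.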
