Thanks, will check it out.

On Fri, Jan 2, 2015 at 5:07 PM, Jukka Zitting <[email protected]>
wrote:

> Hi,
>
> 2015-01-02 16:37 GMT-05:00 Grant Ingersoll <[email protected]>:
> > I think the problem is that the file types in question are not discernible
> > by anything other than the actual content, with the big problem being this
> > is an expensive operation.
>
> Right, then approach 2 might work better, or Tyler's suggestion to
> just modify the existing parser.
>
> > I'll poke around here a bit and see if anything stands out.
>
> A related point is the way the POI container detector uses the
> TikaInputStream.get/setOpenContainer() mechanism [1] to pass the
> results of any early heavy lifting from type detection to the parsing
> phase [2].
>
> [1]
> https://tika.apache.org/1.6/api/org/apache/tika/io/TikaInputStream.html#getOpenContainer()
> [2]
> https://github.com/apache/tika/blob/1.6/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java#L385
>
> BR,
>
> Jukka Zitting
>
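
For anyone reading along, the hand-off described above (detection does the expensive work once, then leaves the result on the stream for the parser) might look roughly like the sketch below. This is not Tika code: CachingStream is a hypothetical stand-in for TikaInputStream that borrows only the getOpenContainer/setOpenContainer method names mentioned in the thread, and openExpensively, detect, parse and the media type strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class OpenContainerSketch {

    /** Hypothetical stand-in for a stream that can carry an already-opened container. */
    public static final class CachingStream {
        private final byte[] data;
        private Object openContainer; // result of earlier heavy lifting, if any

        public CachingStream(byte[] data) { this.data = data; }
        public byte[] getData() { return data; }
        public Object getOpenContainer() { return openContainer; }
        public void setOpenContainer(Object container) { this.openContainer = container; }
    }

    /** Counts how often the expensive step actually runs. */
    public static int expensiveOpens = 0;

    /** Placeholder for the costly content inspection (assumed, not a real Tika call). */
    static String openExpensively(CachingStream stream) {
        expensiveOpens++;
        return new String(stream.getData(), StandardCharsets.UTF_8).trim();
    }

    /** Detection phase: inspect the content once and leave the result on the stream. */
    public static String detect(CachingStream stream) {
        String container = openExpensively(stream);
        stream.setOpenContainer(container);
        return container.startsWith("DOC")
                ? "application/x-sketch-doc"   // made-up media type
                : "application/octet-stream";
    }

    /** Parsing phase: reuse the container from detection, reopening only if absent. */
    public static String parse(CachingStream stream) {
        Object cached = stream.getOpenContainer();
        String container = (cached instanceof String)
                ? (String) cached
                : openExpensively(stream);
        return container.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        CachingStream stream =
                new CachingStream("DOC hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(detect(stream));
        System.out.println(parse(stream));
        System.out.println("expensive opens: " + expensiveOpens);
    }
}
```

The point of the pattern is that parse() only falls back to the expensive step when no detector has run first, so the cost is paid at most once per stream.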
