Tika's IOUtils appears to be missing the readFully method. Should that
be added?
- Bob
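For context, a readFully helper usually has to loop, because InputStream.read may return fewer bytes than were asked for. Below is a minimal sketch, assuming "fill the buffer or stop at end of stream and report how many bytes were read" semantics. The class name ReadFullySketch is hypothetical, and this is not the exact signature or behaviour of POI's or Tika's IOUtils. For comparison, commons-io's IOUtils.readFully(InputStream, byte[]) throws an EOFException when the stream ends early instead of returning a short count.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not Tika's or POI's actual IOUtils API. */
public final class ReadFullySketch {
    private ReadFullySketch() {}

    /**
     * Reads from {@code in} until {@code buffer} is full or the stream ends.
     * A single InputStream.read call may return fewer bytes than requested, so we loop.
     *
     * @return the number of bytes read (less than buffer.length only at end of stream)
     */
    public static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream reached before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        int n = readFully(new ByteArrayInputStream(new byte[] {1, 2, 3}), buf);
        System.out.println(n); // prints 3: the stream held only three bytes
    }
}
```

Returning a count rather than throwing leaves it to the caller to decide whether a short read is an error, which suits parsers that tolerate truncated files.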
On 3/27/2016 6:52 PM, Nick Burch wrote:
On Sun, 27 Mar 2016, Bob Paulin wrote:
Currently the Apache POI dependency is in several modules and it's
sort of a beast (> 2 MB in size).
You should've seen it before Jukka and Yegor spent a crazy ApacheCon
hacking up the ooxml-lite support... ;-)
It appears many of the modules are only using the IOUtils library.
I suspect a strong overlap with the parser classes I've helped write...
Any concerns with replacing this POI stuff with commons-io? Does POI
offer anything above the commons-io functionality in IOUtils? If not,
I think it would be great to isolate the POI dependency to the office
module only.
A lot of the use is for endian-specific reading of numbers and
strings. Might be a bit of stream stuff, but mostly that can be passed
off to the Tika IO utils classes.
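To illustrate what "endian-specific reading of numbers" means here: java.io.DataInputStream reads multi-byte values in big-endian order only, so little-endian file formats need the bytes assembled by hand. A minimal sketch, assuming unsigned 16-bit and signed 32-bit little-endian fields; the class and method names are hypothetical and do not mirror POI's or Tika's API.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical little-endian stream helpers (not POI's or Tika's API). */
public final class LittleEndianSketch {
    private LittleEndianSketch() {}

    /** Reads one byte as 0..255, failing if the stream has ended. */
    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("Unexpected end of stream");
        }
        return b;
    }

    /** Reads an unsigned 16-bit little-endian value (low byte first). */
    public static int readUShortLE(InputStream in) throws IOException {
        int b0 = readByte(in);
        int b1 = readByte(in);
        return b0 | (b1 << 8);
    }

    /** Reads a signed 32-bit little-endian value (low byte first). */
    public static int readIntLE(InputStream in) throws IOException {
        int b0 = readByte(in);
        int b1 = readByte(in);
        int b2 = readByte(in);
        int b3 = readByte(in);
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                new byte[] {0x34, 0x12, 0x78, 0x56, 0x34, 0x12});
        System.out.printf("0x%04X%n", readUShortLE(in)); // prints 0x1234
        System.out.printf("0x%08X%n", readIntLE(in));    // prints 0x12345678
    }
}
```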
From a quick check, I can't see any endian number stuff in commons IO,
but I might have missed it, or it might be in a different commons module.
If not, there might be something to be said for popping that POI logic,
along with some of the Ogg-Vorbis utils stuff (another one with my
grubby mitts all over it), into a more helpful general utils grouping.
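One data point for that question: commons-io does ship org.apache.commons.io.EndianUtils (for example readSwappedInteger), though whether it covers everything the POI helpers are used for here has not been checked. For decoding fields out of a byte array at an offset, the JDK alone also works via java.nio.ByteBuffer with ByteOrder.LITTLE_ENDIAN. A minimal sketch; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: little-endian decoding from a byte array using only the JDK. */
public final class ByteBufferEndianSketch {
    private ByteBufferEndianSketch() {}

    /** Reads a signed 32-bit little-endian int starting at the given offset. */
    public static int getIntLE(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Reads an unsigned 16-bit little-endian value starting at the given offset. */
    public static int getUShortLE(byte[] data, int offset) {
        short s = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getShort(offset);
        return s & 0xFFFF; // widen to int without sign extension
    }

    public static void main(String[] args) {
        byte[] header = {0x01, 0x02, (byte) 0xFF, (byte) 0xFF, 0x78, 0x56, 0x34, 0x12};
        System.out.println(getUShortLE(header, 2));                   // prints 65535
        System.out.println(Integer.toHexString(getIntLE(header, 4))); // prints 12345678
    }
}
```

Wrapping the array does not copy it, so these absolute-index reads are cheap; a shared utils class could keep one wrapped buffer around if the same array is read many times.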
Nick