All,
Over on Tika, we're switching to SAX-based parsing for docx and pptx
in 4.x/main. There's still a lot of work to do, but over the last few
weeks we reached parity with, and even surpassed, our DOM-based parsers.
I found that, with a few shims, I can remove our dependency on
poi-ooxml-lite. I'd like to make a few small changes to POI to make
our customizations a bit easier. I'll open PRs and wait for
approval to make sure I'm not overstepping.
Longer term, it might be useful to have a poi-opc module that offers
just the OPC basics (essentially openxml4j) without poi-ooxml-lite or
XMLBeans. And even longer term, perhaps we could contribute some
thoroughly cleaned-up streaming read code to POI.
WDYT?
Best,
Tim