Hi all,

I've been using POI for text extraction and have a few design
questions.  And, yes, I am volunteering to fix these if they turn out
to be real problems.

1) ExtractorFactory uses ExcelExtractor rather than
EventBasedExcelExtractor, which makes it run out of memory (OOM) on
very large .xls workbooks.  Is there a reason for this, and would it
be reasonable to change it?

2) Without an event-based extractor for OOXML workbooks, there is no
way to extract text from very large .xlsx workbooks.  I implemented a
hacky workaround that reads only the shared strings XML part, but I'd
like to know whether there is a better way to do this, or whether
there is any interest in polishing it into something that could
become part of POI.
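A minimal sketch of this kind of workaround (not the actual patch):
it streams the shared strings part out of the .xlsx zip with the JDK's
StAX parser instead of POI, so memory is bounded by the largest single
string rather than the whole workbook.  It assumes the part lives at
the conventional path xl/sharedStrings.xml (a robust version would
resolve it through xl/_rels/workbook.xml.rels), and the class and
method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams the shared string table of an .xlsx without POI.
final class SharedStringsText {

    // Assumed part name; a robust version would look it up via xl/_rels/workbook.xml.rels.
    static final String PART = "xl/sharedStrings.xml";

    // Scans the .xlsx zip stream and passes each shared string to sink.
    // Returns false if the part is absent. The caller owns (and closes) the stream.
    static boolean extract(InputStream xlsx, Consumer<String> sink)
            throws IOException, XMLStreamException {
        ZipInputStream zip = new ZipInputStream(xlsx);
        ZipEntry e;
        while ((e = zip.getNextEntry()) != null) {
            if (e.getName().equals(PART)) {
                readSharedStrings(zip, sink); // zip reads return -1 at the end of this entry
                return true;
            }
        }
        return false;
    }

    // Emits one string per <si>, concatenating its <t> runs and skipping phonetic <rPh> hints.
    static void readSharedStrings(InputStream in, Consumer<String> sink)
            throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs or external entities
        XMLStreamReader r = f.createXMLStreamReader(in);
        StringBuilder current = null; // non-null while inside an <si>
        boolean inText = false;       // inside a <t> that is part of the visible string
        int phoneticDepth = 0;        // > 0 while inside <rPh>
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("si")) current = new StringBuilder();
                else if (name.equals("rPh")) phoneticDepth++;
                else if (name.equals("t") && current != null && phoneticDepth == 0) inText = true;
            } else if (ev == XMLStreamConstants.CHARACTERS
                    || ev == XMLStreamConstants.CDATA
                    || ev == XMLStreamConstants.SPACE) {
                if (inText) current.append(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("t")) inText = false;
                else if (name.equals("rPh")) phoneticDepth--;
                else if (name.equals("si")) {
                    sink.accept(current.toString()); // hand off one string at a time
                    current = null;
                }
            }
        }
        r.close(); // does not close the underlying stream
    }
}
```

This only recovers text stored in the shared string table: numeric
cells, cached formula results and inline strings live in the
individual sheet parts, and the strings come out in table order rather
than sheet/cell order.  A fuller event-based extractor would
presumably stream each sheet part in the same way.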

3) QuickButCruddyTextExtractor doesn't extend POIOLE2TextExtractor,
and I was wondering if there was a reason why.

Thanks,

--Phil

-- 

Machines might be interesting, but people are fascinating. -- K.P.