Hi all, I've been using POI for text extraction and have a few design questions. And yes, I am volunteering to fix these if they are indeed problems.

1) ExtractorFactory uses ExcelExtractor rather than EventBasedExcelExtractor, which causes it to run out of memory on very large workbooks. Is there a reason for this, and would it be reasonable to change it?

2) Without an event-based extractor for OOXML workbooks, it isn't possible to extract text from very large workbooks. I implemented a hacky workaround that reads only the shared strings XML document, but I was wondering whether there is a better way to do this, or whether there is any interest in polishing it into something that could be part of POI.

3) QuickButCruddyTextExtractor doesn't extend POIOLE2TextExtractor. Is there a reason why?

Thanks,
--Phil

--
Machines might be interesting, but people are fascinating. -- K.P.
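
To make question (2) concrete, here is a rough sketch of what a shared-strings-only workaround could look like. It is not the actual workaround code and it does not use any POI classes: it relies only on the JDK's zip and StAX APIs. The class name SharedStringsTextSketch is made up for illustration, and the part path xl/sharedStrings.xml is assumed to be the conventional location (strictly, the workbook's relationships decide where that part lives).

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: stream the text of the shared strings part of an .xlsx file. */
public class SharedStringsTextSketch {

    // Assumption: the shared strings part sits at its conventional path.
    private static final String SHARED_STRINGS = "xl/sharedStrings.xml";

    public static String extract(File xlsx) throws IOException, XMLStreamException {
        StringBuilder out = new StringBuilder();
        try (ZipFile zip = new ZipFile(xlsx)) {
            ZipEntry entry = zip.getEntry(SHARED_STRINGS);
            if (entry == null) {
                return ""; // workbook has no shared strings part
            }
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Disable DTDs and external entities for untrusted input.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                boolean inText = false;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // <t> elements hold the text, directly or inside rich-text runs.
                        // Note: this also picks up <t> inside phonetic runs (<rPh>).
                        if ("t".equals(reader.getLocalName())) {
                            inText = true;
                        }
                    } else if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        if (inText) {
                            out.append(reader.getText());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        String name = reader.getLocalName();
                        if ("t".equals(name)) {
                            inText = false;
                        } else if ("si".equals(name)) {
                            out.append('\n'); // one line per shared string item
                        }
                    }
                }
                reader.close();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(extract(new File(args[0])));
    }
}
```

The trade-off is that the document is never held in memory as a whole, but the output contains only string cell values: each distinct string appears once, in shared-string-table order rather than cell order, and numbers, formulas and inline strings are missed entirely. That is why a proper event-based extractor would still be preferable.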
