Hi -
I’ve come up with a plan for my POI talk next weekend. I need to finalize my
slides tomorrow so that the Chinese translation can be done. I have some
questions, which I’ve marked with “—>”. If you can answer them, you’ll save me
some research.
I plan to tell the story of POI, including its interactions with Tika and
Common Crawl. At the end, I want to give people two places to contribute,
along with the motivation to do so.
(1) Title
Name of presentation
About Dave
(2) POI
Its start in Jakarta: the simple use case
End of Jakarta
(3) OOXML and the Microsoft Open Specification Promise
The OSP
The flame war
OpenXML4J - http://incubator.apache.org/ip-clearance/openxml4j.html
XSSF, XSLF, and SS
(4) Tika and OOXML lite
ApacheCon Oakland 2009 - during BarCamp, Jukka asked Nick, Yegor, and me
if we could do something about the 13MB ooxml jar. Yegor came up with a
solution in a day.
Write a unit test and your beans are included
—> Anyone: anything to add? XMLBeans impacts?
(5) Graphics2D
Discuss the output techniques that were developed.
—> Yegor: is there some sample code you might share?
(6) Tika Text Extraction
—> I could use pointers to the basic tutorial.
(7) Common Crawl - 1TB of samples
Common Crawl - commoncrawl.org
Common Crawl document downloader - centic9
Regression sets for POI, Tika and PDFBox
—> Are there other Apache projects that use these documents?
(8) The POI Toolbox
A table of the various formats with input, output, and remarks.
(9) XMLBeans 3
Bringing the project back out of the Apache Attic.
—> Any reasons besides better control of entity expansion attacks?
(10) Contributing to POI and Tika Will Improve Your Solr Search Results
How Solr and similar architectures depend on Tika, and how Tika depends
on POI
Example: the discussion on the Tika list this past week about the choices
for handling headers and footers in Word documents.
Thanks for your help and feedback!
Regards,
Dave