Hi -

I’ve come up with the plan for my POI talk next weekend. I need to finalize my
slides tomorrow so that they can be translated into Chinese. I have some
questions, which I’ll mark with “—>”. If you can answer them, you’ll save me
some research.

I plan to tell the story of POI, including its interactions with Tika and
Common Crawl. At the end I want to give people two places to contribute, along
with the motivation to do so.

(1) Title
        Name of presentation
        About Dave
(2) POI
        When it started in Jakarta: the simple use case.
        End of Jakarta
(3) OOXML and the Microsoft Open Specification Promise
        The OSP
        The flame war
        OpenXML4J - http://incubator.apache.org/ip-clearance/openxml4j.html 
        XSSF, XSLF, and SS
(4) Tika and OOXML Lite
        ApacheCon Oakland 2009: during BarCamp, Jukka asked Nick, Yegor and me
        if we could do something about the 13MB ooxml jar. Yegor came up with
        a solution in a day.
        Run the unit tests, and the beans they use are included.
        —> Anyone: anything to add? XMLBeans impacts?
(5) Graphics2D
        Discuss the output techniques that were developed.
        —> Yegor: is there some sample code you might share?
(6) Tika Text Extraction
        —> I could use pointers to the basic tutorial.
(7) Common Crawl: 1TB of samples
        Common Crawl - commoncrawl.org
        Common Crawl Download - centic9
        Regression sets for POI, Tika and PDFBox
        —> Are there other Apache projects that use these documents?
(8) The POI Toolbox
        A table of the various formats with input, output, and remarks.
(9) XMLBeans 3
        Bringing the product out of the attic.
        —> Any reasons besides better control of Entity Expansion attacks?
(10) Contributing to POI and Tika Will Improve Your Solr Search Results
        How Solr and similar architectures depend on Tika, and Tika depends
        on POI
        Example: the discussion of header and footer choices for Word
        documents on the Tika list this past week.

Thanks for your help and feedback!

Regards,
Dave

