This is tremendous. Great work, Nick.

Nick Burch wrote:
If you've been watching the commit messages over the last few days, you'll have seen that I've made a quick stab at some OOXML support. The code I've committed is powered by two other projects:
* XMLBeans - http://xmlbeans.apache.org/
* OpenXML4J - http://www.openxml4j.org/

OpenXML4J provides a nice library to get at the underlying zip file format, grab the relationships between different bits of the file, etc. In many ways, it's the OOXML equivalent of POIFS.

Then, I'm using XMLBeans plus the Microsoft-supplied XSDs to build up the low-level objects to work with the different streams. These objects are much like our record and record factory stuff.

On top of all this, I've written some classes to handle getting at the interesting low-level parts of the files (HSSFXML, HWPFXML and HSLFXML). I've stubbed out some usermodel equivalents, but not done anything else with them.

Finally, I've written some text extractors, which use the low-level beans to get the text out into a format you can stuff into Lucene.

Couple of snags to be aware of:
* OpenXML4J is Java 1.5, so we're going to want to keep this all separate whatever happens, so people who don't want OOXML can continue to use POI with Java 1.3 / 1.4
* OpenXML4J haven't done even an alpha release yet, so we're working off a jar I tested and built, hosted off people.apache.org/~nick/
* the OOXML XSDs haven't been confirmed to be under an ASL-compatible licence, so ant just downloads everyone their own copy of them, and they're not in svn
* everything OOXML-related has its own ant tasks: compile-ooxml, test-ooxml and jar-ooxml. The existing ant tasks (e.g. compile, jar, dist) will all ignore the OOXML stuff, so you'll have to take positive steps to get it
* to confirm, if you do a dist or a jar, you won't get it, so it won't interfere with the 3.0.2 release process
* there's no formal documentation for it, just unit tests and javadocs.
  I'm holding off writing any until other people have sanity-checked the API structure :)
* you're going to need to read the ECMA specs if you want to make much use of it as it stands, unless you know the OLE2 equivalents really well and can spot how they've stuffed it all into XML...

Next up is probably write support. This may require some tweaking and thought, as there are three objects relating to each stream:
* the PackagePart (XML file in the zip)
* the Document (bean for the root of the XML file), e.g. WorkbookDocument
* the main bean, e.g. CTWorkbook

As someone using the API, you'll want the CTWorkbook, as that's the thing with the actual data on it. However, to save the changes, you need to get the document bean the CT bean came from, trigger the write from there, and stuff the resulting bytes back into the PackagePart. So, we'll need to track all these bits internally, so we can give the user the bean they want, but still have everything available to write it out.

(One option might be to nobble XMLBeans so that we can attach the PackagePart onto the Document, and get back at the Document from the main bean, but that might prove to be far too much work, so we'll have to see.)

If anyone has any good ideas for how to do the writing stuff, do pipe up. It looks like it's going to be a little while before I get a copy of Office 2007, so there's no point in me trying to knock up write support before I have something to test opening with, which gives us a gap to figure it out in :)

Oh, and I've put all the code in src/scratchpad/ooxml-src/ and src/scratchpad/ooxml-testcases/, to indicate it's of scratchpad completion levels, but in different directories as it needs Java 1.5. Once it's a bit more stable, we'll probably want to move it to its own top-level area under src.

Nick
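To make the package structure from the OpenXML4J paragraph a bit more concrete: an OOXML file is a zip of XML parts, and the package-level `_rels/.rels` part says which part is the main document. The sketch here walks that chain using only `java.util.zip` and the JDK's DOM parser. It is not OpenXML4J's API (which handles this properly, including content types and part-level relationships); the class and method names are hypothetical, and it assumes a modern JDK (9+, for `readAllBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not OpenXML4J: reads parts with java.util.zip, follows _rels/.rels.
public class OoxmlPackageSketch {
    static final String PKG_RELS_NS =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    static final String OFFICE_DOC_REL =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    /** Reads every zip entry (each one is a "part") into memory, keyed by entry name. */
    public static Map<String, byte[]> readParts(InputStream in) throws IOException {
        Map<String, byte[]> parts = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // ZipInputStream reads only the current entry's data here
                parts.put(entry.getName(), zip.readAllBytes());
            }
        }
        return parts;
    }

    /** Follows the package-level relationships to the main document part, or null. */
    public static String findMainPart(Map<String, byte[]> parts) throws Exception {
        byte[] rels = parts.get("_rels/.rels");
        if (rels == null) {
            throw new IllegalArgumentException("no _rels/.rels part");
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(rels));
        NodeList list = doc.getElementsByTagNameNS(PKG_RELS_NS, "Relationship");
        for (int i = 0; i < list.getLength(); i++) {
            Element rel = (Element) list.item(i);
            if (OFFICE_DOC_REL.equals(rel.getAttribute("Type"))) {
                String target = rel.getAttribute("Target");
                // Zip entry names have no leading slash, so strip one if present
                return target.startsWith("/") ? target.substring(1) : target;
            }
        }
        return null;
    }

    /** Builds a tiny, incomplete .xlsx-like package in memory, for illustration only. */
    public static byte[] samplePackage() throws IOException {
        String rels = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<Relationships xmlns=\"" + PKG_RELS_NS + "\">"
            + "<Relationship Id=\"rId1\" Type=\"" + OFFICE_DOC_REL + "\""
            + " Target=\"xl/workbook.xml\"/>"
            + "</Relationships>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            put(zip, "[Content_Types].xml",
                "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\"/>");
            put(zip, "_rels/.rels", rels);
            put(zip, "xl/workbook.xml", "<workbook/>");
        }
        return bytes.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, String body) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(body.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> parts = readParts(new ByteArrayInputStream(samplePackage()));
        System.out.println("parts: " + parts.keySet());
        System.out.println("main part: " + findMainPart(parts));
    }
}
```

In a real file the main part is typically `xl/workbook.xml` for spreadsheets or `word/document.xml` for Word documents; from there, each part's own `_rels` folder (e.g. `xl/_rels/workbook.xml.rels`) links on to sheets, styles and so on.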
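For the text extractors, a rough approximation of the idea without XMLBeans is to walk the `w:p` paragraphs in a Word `document.xml` part and concatenate their `w:t` text runs. This is a hypothetical DOM-based sketch, not the extractor that was committed: it does nothing special for tabs, breaks, tables or text boxes, but it shows the kind of plain-text output you could hand to Lucene.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical DOM-based sketch of WordprocessingML text extraction (no XMLBeans).
public class DocxTextSketch {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** A tiny hand-written document.xml body, for illustration only. */
    public static final String SAMPLE =
        "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
        + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        + "</w:body></w:document>";

    /** Concatenates the w:t runs of each w:p paragraph, one paragraph per line. */
    public static String extractText(byte[] documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(documentXml));
        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            NodeList runs = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(extractText(SAMPLE.getBytes(StandardCharsets.UTF_8)));
    }
}
```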
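One possible shape for the write-support bookkeeping is a small holder that keeps the three objects together, hands the user the main bean, and on commit serialises from the document root and puts the bytes back into the part. All the types in this sketch (`FakePart`, `DocumentBean`, `TrackedStream`, `ToyDocument`) are hypothetical stand-ins so that it runs without XMLBeans or OpenXML4J; in practice the part would be an OpenXML4J `PackagePart` and the document a generated bean such as `WorkbookDocument`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these types exist in POI, XMLBeans or OpenXML4J.
public class TrackedStreamSketch {
    /** Stand-in for a PackagePart: just holds the part's bytes. */
    static class FakePart {
        private byte[] data = new byte[0];
        void setData(byte[] data) { this.data = data; }
        byte[] getData() { return data; }
    }

    /** Stand-in for a generated Document bean (the root of the XML file). */
    interface DocumentBean<B> {
        B getMainBean();
        void save(OutputStream out) throws IOException;
    }

    /** Keeps part + document together; callers only ever see the main bean. */
    static final class TrackedStream<B> {
        private final FakePart part;
        private final DocumentBean<B> document;

        TrackedStream(FakePart part, DocumentBean<B> document) {
            this.part = part;
            this.document = document;
        }

        /** The CTWorkbook-style bean that API users work with. */
        B getBean() { return document.getMainBean(); }

        /** Serialises from the document root and stuffs the bytes back into the part. */
        void commit() throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            part.setData(out.toByteArray());
        }
    }

    /** Toy document bean whose "main bean" is a mutable StringBuilder. */
    static final class ToyDocument implements DocumentBean<StringBuilder> {
        private final StringBuilder main = new StringBuilder();
        public StringBuilder getMainBean() { return main; }
        public void save(OutputStream out) throws IOException {
            out.write(("<workbook>" + main + "</workbook>").getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Edits the main bean, commits, and returns what ended up in the part. */
    public static String demo() throws IOException {
        FakePart part = new FakePart();
        TrackedStream<StringBuilder> stream = new TrackedStream<>(part, new ToyDocument());
        stream.getBean().append("<sheet name=\"Sheet1\"/>"); // user edits the main bean
        stream.commit();                                       // write goes via the document
        return new String(part.getData(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The design choice here is that callers never hold the document bean directly, so there is no need to nobble XMLBeans to get from the main bean back to its root; the cost is that every stream has to be wrapped.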
--
Buni Meldware Communication Suite
http://buni.org
Multi-platform and extensible Email, Calendaring (including freebusy), Rich Webmail, Web-calendaring, ease of installation/administration.
