Cool! How about creating this as a separate subproject in svn? (poi.apache.org/ooxml)
Just a thought.

Regards
- Avik

On Sunday 30 December 2007 10:43:06 Nick Burch wrote:
> If you've been watching the commit messages over the last few days, you'll
> have seen that I've made a quick stab at some ooxml support.
>
> The code I've committed is powered by two other projects:
> * xml beans - http://xmlbeans.apache.org/
> * openxml4j - http://www.openxml4j.org/
>
> OpenXML4J provides a nice library to get at the underlying zip file
> format, grab the relationships between different bits of the file etc. In
> many ways, it's the ooxml equivalent of poifs.
>
> Then, I'm using xmlbeans + the microsoft supplied xsds to build up the
> low level objects to work with the different streams. These objects are
> much like our record and record factory stuff.
>
> On top of all this, I've written some classes to handle getting at the
> interesting low level parts of the files (HSSFXML, HWPFXML and HSLFXML).
> I've stubbed out some usermodel equivalents, but not done anything else
> with them. Finally, I've written some text extractors, which use the low
> level beans to get the text out into a format you can stuff into lucene.
>
> Couple of snags to be aware of:
> * openxml4j is java 1.5, so we're going to want to keep this all separate
>   whatever happens, so people who don't want ooxml can continue to use
>   poi with java 1.3 / 1.4
> * openxml4j haven't done even an alpha release yet, so we're working off a
>   jar I tested and built, hosted off people.apache.org/~nick/
> * the ooxml xsds haven't been confirmed to be under an ASL compatible
>   licence, so ant just downloads everyone their own copy of them, and
>   they're not in svn
> * everything ooxml related has its own ant tasks - compile-ooxml,
>   test-ooxml and jar-ooxml.
>   The existing ant tasks (eg compile, jar, dist) will all ignore the
>   ooxml stuff, so you'll have to take positive steps to get it
> * to confirm, if you do a dist or a jar, you won't get it, so it won't
>   interfere with the 3.0.2 release process
> * there's no formal documentation for it, just unit tests and javadocs.
>   I'm holding off writing any until other people have sanity checked the
>   api structure :)
> * you're going to need to read the ECMA specs if you want to make much use
>   of it as it stands, unless you know the ole2 equivalents really well
>   and can spot how they've stuffed it all into xml...
>
> Next up is probably write support. This may require some tweaking and
> thought, as there are three objects relating to each stream:
> * the PackagePart (xml file in the zip)
> * the Document (bean for the root of the xml file) eg WorkbookDocument
> * the main bean, eg CTWorkbook
> As someone using the API, you'll want the CTWorkbook, as that's the thing
> with the actual data on it. However, to save the changes, you need to get
> the document bean the ct bean came from, and trigger the write from there,
> and stuff the resulting bytes back into the PackagePart. So, we'll need to
> track all these bits internally, so we can give the user the bean they
> want, but still have everything available to write it out.
>
> (One option might be to nobble xmlbeans so that we can attach the
> PackagePart onto the Document, and get back at the document from the main
> bean, but that might prove to be far too much work, so we'll have to see)
>
> If anyone has any good ideas for how to do the writing stuff, do pipe up.
> It looks like it's going to be a little while before I get a copy of
> office 2007, so there's no point me trying to knock up write support
> before I have something to test opening with, which gives us a gap to
> figure it out in :)
>
> Oh, and I've put all the code in src/scratchpad/ooxml-src/ and
> src/scratchpad/ooxml-testcases/, to indicate it's of scratchpad completion
> levels, but different directories as it needs java 1.5. Once it's a bit
> more stable, we'll probably want to move it to its own top level area
> under src
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
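
On the write support question in the quoted message (PackagePart, Document
bean and main bean for each stream): one possible way to do the internal
tracking is a small registry keyed on the main bean, so the user only ever
holds the bean they want and a save can still find its way back to the
document and the part. The sketch below is only an illustration of that
bookkeeping. FakePart, FakeDocument, FakeMainBean and PartTracker are
hypothetical stand-ins invented for this example, not openxml4j or xmlbeans
classes, and the part name is just a sample value.

```java
import java.nio.charset.StandardCharsets;
import java.util.IdentityHashMap;
import java.util.Map;

/** Sketch only: all types here are hypothetical stand-ins, not real library classes. */
public class PartTracker {

    /** Stand-in for the PackagePart (the xml file in the zip). */
    static final class FakePart {
        final String name;
        byte[] bytes = new byte[0];
        FakePart(String name) { this.name = name; }
    }

    /** Stand-in for the main bean the API user edits (eg CTWorkbook). */
    static final class FakeMainBean {
        String value = "";
    }

    /** Stand-in for the root document bean (eg WorkbookDocument). */
    static final class FakeDocument {
        final FakeMainBean main = new FakeMainBean();

        /** Pretend serialisation; a real document bean would write its xml here. */
        byte[] serialize() {
            return ("<doc>" + main.value + "</doc>").getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Remembers which document and part a main bean came from. */
    private static final class Entry {
        final FakeDocument document;
        final FakePart part;
        Entry(FakeDocument document, FakePart part) {
            this.document = document;
            this.part = part;
        }
    }

    // Keyed by object identity, so we don't rely on the beans' equals()/hashCode().
    private final Map<FakeMainBean, Entry> entries = new IdentityHashMap<>();

    /** Records the part/document pair and hands back only the main bean. */
    FakeMainBean open(FakePart part, FakeDocument document) {
        entries.put(document.main, new Entry(document, part));
        return document.main;
    }

    /** Triggers the write from the owning document and stuffs the bytes into its part. */
    void save(FakeMainBean bean) {
        Entry entry = entries.get(bean);
        if (entry == null) {
            throw new IllegalArgumentException("bean was not opened through this tracker");
        }
        entry.part.bytes = entry.document.serialize();
    }

    public static void main(String[] args) {
        PartTracker tracker = new PartTracker();
        FakePart part = new FakePart("/xl/workbook.xml");
        FakeMainBean bean = tracker.open(part, new FakeDocument());
        bean.value = "edited";
        tracker.save(bean);
        // prints <doc>edited</doc>
        System.out.println(new String(part.bytes, StandardCharsets.UTF_8));
    }
}
```

The registry avoids having to nobble xmlbeans at all, at the cost of keeping
every opened bean reachable until the tracker itself is dropped. Whether that
trade-off suits the real classes is an open question.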
