Word styles are now generated correctly when a style depends on other styles, even if the style it depends on has a higher index number, unlike my original mechanism which couldn't handle such complexity :-) So the "ISTD out of sequence" message is a thing of the past.

This means that for the abiword people the word style code should be complete: the wvParseStruct contains a STSH which has all the styles in it, with their names and their properties all correct (hopefully). And when receiving a PAP and CHP from the element handler, the dirty flag lets you know whether the paragraph conforms exactly to the style that it is based on. So that should be the end of that.

I finished a test run last night on 4747 documents (556 MB) that have accumulated through the online conversion site, and there were no crashes. This includes stacks of word 97, 95, 6 and 9(!!) documents. So I am now particularly interested in files that can crash wv. There are certainly showstopper bugs in there that haven't been unearthed yet. (This of course was before today's stylesheet change, which might be buggy.)

I was looking into word 5 and 2 support as well recently, and I made a few small mods that will allow for this in the near future.

Non western european word 95 and 6 documents are not stored in unicode, but probably use the windows codepages and some identifiers to specify which is which. I haven't figured out yet what the full story is with them, so the conversion of these documents might be quite wrong in terms of the returned content. Consider this a known bug for the moment.

The new version will often do a lot of complaining about "invalid lists" and "character or paragraph runs not being open". These warnings come from two workarounds which I believe are fully correct, but I'm leaving the warnings in there for now to highlight that this is a possible danger spot.
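As a rough illustration of the out-of-order style dependency problem described at the top of this message, here is a minimal sketch in C. It is not wv's actual code: the struct, the field names, the NO_BASE marker and both functions are hypothetical, made up for the example. The idea is simply to resolve a style's base first, whatever its index, and to remember which styles are already done so each is generated once.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BASE 0xFFFu  /* hypothetical marker: style has no base style */
#define UNSET   (-1)    /* hypothetical marker: property not set here */

enum resolve_state { UNRESOLVED, IN_PROGRESS, RESOLVED };

/* Hypothetical, heavily simplified style record. */
struct style {
    unsigned base;            /* index of the style this one is based on */
    int font_size;            /* UNSET, or this style's own value */
    int bold;                 /* UNSET, 0 or 1 */
    enum resolve_state state; /* bookkeeping for the resolver */
};

/*
 * Resolve style i. Its base is resolved first, even when the base has a
 * higher index than i. Returns 0 on success, -1 on a bad index or when
 * the styles depend on each other in a cycle.
 */
static int resolve_style(struct style *tab, size_t n, size_t i)
{
    struct style *s;

    if (i >= n)
        return -1;
    s = &tab[i];
    if (s->state == RESOLVED)
        return 0;
    if (s->state == IN_PROGRESS)
        return -1;                      /* cycle detected */

    s->state = IN_PROGRESS;
    if (s->base != NO_BASE) {
        const struct style *b;

        if (resolve_style(tab, n, s->base) != 0) {
            s->state = UNRESOLVED;      /* leave it marked as not done */
            return -1;
        }
        b = &tab[s->base];
        /* Inherit only what this style does not set itself. */
        if (s->font_size == UNSET)
            s->font_size = b->font_size;
        if (s->bold == UNSET)
            s->bold = b->bold;
    }
    s->state = RESOLVED;
    return 0;
}

/* Resolve every style in the table; returns -1 if any of them failed. */
static int resolve_all(struct style *tab, size_t n)
{
    size_t i;
    int rc = 0;

    assert(tab != NULL);
    for (i = 0; i < n; i++)
        if (resolve_style(tab, n, i) != 0)
            rc = -1;
    return rc;
}
```

With a table where style 0 is based on style 2, and style 2 on style 1, calling resolve_all() fills in style 0 from style 2 even though style 2 comes later in the table. The IN_PROGRESS state is only there so that a broken file with circular dependencies gives an error instead of endless recursion.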
The next task is to cleanly implement graphic extraction. The current generation of wvHtml puts a picture placeholder where each graphic should be, and it also attempts to set the correct size. All graphics are embedded in another file format known as "escher", so I need to rewrite my escher parser. Even after the graphics are extracted in usable form, some of them are wmf and emf files. I have previously written a library to convert these into gif, but I need to rewrite it to use the new gd library, which only supports png. So there is some work to be done on that front as well.

The abiword people should be aware that adding graphic support to libwv will add a dependency on libz, as the wmf files are stored compressed inside word. I am open to suggestions for any required graphic extraction api. The first couple of attempts will just hand off a FILE * of a temporary file, so that I can examine each stage separately.

This version is in abiword's cvs and at http://www.csn.ul.ie/~caolan/publink/mswordview/development/
Version number is 0.5.39

I have a sneak preview of my new wv site and online converter at http://www.csn.ul.ie/~caolan/wvWare/
Give the new online converter a whirl if you're interested in wvHtml but don't want to go to the hassle of compiling it (or if the damn thing doesn't compile for you, you can see what you're missing). So upload any files that crash on you with wv, and you can submit bug reports as well. Maybe I should use bugzilla, but that seems like a bit too much work :-)

C.

Real Life: Caolan McNamara          *  Doing: MSc in HCI
Work: [EMAIL PROTECTED]             *  Phone: +353-86-8790257
URL: http://www.csn.ul.ie/~caolan   *  Sig: an oblique strategy
Listen to the quiet voice
