Daniel Carrera wrote:



Even if you don't, the file format itself is less prone to damage.

I'm not trying to start an argument, Daniel, but I'm wondering what you're basing that statement on. AFAIK, a file is a file is a file. Bits flip, hard drives fail, crap happens. It's hard to see how one file type would be less vulnerable than any others.



Because XML is well structured and clearly defined, when damage /does/ occur, it is often possible to *guess* what the data should have been. That's part of the beauty of XML.


Example snipped.

You can still reconstruct the original XML. For that matter, so can OOo (to some degree).
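Here is a minimal sketch of that kind of guessing. It assumes the only damage is a garbled end-tag name (for example `</tex:p>` instead of `</text:p>`). Every XML end tag repeats the name of the element it closes, so a program that tracks which elements are open can predict what each end tag should say. The function name `repair_end_tags` and the regex are hypothetical illustrations, not how OOo or XML Tidy actually recover files. A missing or extra tag, comments, CDATA, processing instructions, or a `>` inside an attribute value would all fool it.

```python
import re

# Hypothetical, simplified tag matcher: groups are (closing slash, name, self-closing slash).
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")


def repair_end_tags(text: str) -> str:
    """Rewrite each end tag to match the innermost open element (heuristic sketch)."""
    open_elements = []  # stack of element names currently open
    out = []
    last = 0
    for m in TAG.finditer(text):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        out.append(text[last:m.start()])  # copy text between tags unchanged
        if closing:
            # The redundancy at work: the end tag *should* repeat this name.
            expected = open_elements.pop() if open_elements else name
            out.append(f"</{expected}>")
        else:
            if not self_closing:
                open_elements.append(name)
            out.append(m.group(0))
        last = m.end()
    out.append(text[last:])
    return "".join(out)
```

For example, `repair_end_tags("<text:p>Hello <text:span>world</text:spam></tex:p>")` gives back `<text:p>Hello <text:span>world</text:span></text:p>`.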

Compare this with a binary data structure. It will, in some way or another, have the form of an n-ary tree (they all do). Suppose that a node gets deleted. Now you've lost everything below that node (possibly a few paragraphs). Or worse, it might make the file impossible to parse.

Now look at the XML again. Think of how many bytes you'd have to lose (and lose _sequentially_) for you to lose a "node".
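To make the n-ary tree comparison concrete, here is a toy binary format. It is entirely hypothetical and not modelled on any real word-processor format. Each node is a 4-byte header (tag id, child count, text length), followed by its text and then its children. The parser has to trust those counts and lengths, so damaging a single header byte can make the rest of the file unreadable.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass, field

HEADER = struct.Struct(">BBH")  # tag id, child count, text length (hypothetical layout)


@dataclass
class Node:
    tag: int
    text: bytes = b""
    children: list[Node] = field(default_factory=list)


def encode(node: Node) -> bytes:
    """Serialize a node, then its children, depth-first."""
    head = HEADER.pack(node.tag, len(node.children), len(node.text))
    return head + node.text + b"".join(encode(c) for c in node.children)


def decode(buf: bytes, pos: int = 0) -> tuple[Node, int]:
    """Parse one node starting at pos; return it and the offset just past it."""
    if pos + HEADER.size > len(buf):
        raise ValueError(f"truncated header at offset {pos}")
    tag, n_children, text_len = HEADER.unpack_from(buf, pos)
    pos += HEADER.size
    if pos + text_len > len(buf):
        raise ValueError(f"text length runs past end of file at offset {pos}")
    node = Node(tag, buf[pos:pos + text_len])
    pos += text_len
    for _ in range(n_children):
        # Every later node's position depends on this node's header being right.
        child, pos = decode(buf, pos)
        node.children.append(child)
    return node, pos
```

With a root holding five short paragraphs, setting byte 6 (the high byte of the first child's text length) to `0xFF` makes `decode` raise `ValueError`. One bad byte and nothing in the file can be read. A one-byte hit on the equivalent XML would spoil a single character of a single tag.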

Ain't it cool? :-)

Except for one little problem: the file isn't actually stored as XML. It's stored as a compressed ZIP.


The ability to reconstruct a file like you illustrated depends on the low information density of the XML file. IOW, if a few bytes go missing or get garbled, you can interpolate based on context. It takes several bytes to convey each "concept" and the language has superfluous characters in each "word". For example, the difference between "tough" and "tuf".

In zip files each repeated byte sequence (e.g. a tag like <table:table-cell>) gets replaced by a much shorter "stand-in". IINM, the process is recursive to a degree as well. This dramatically increases the information density, which would seem to me to make a zip every bit as vulnerable to damage as any binary file.
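This can be tested roughly with Python's standard `zlib` module. For reference, zip's usual compression method, DEFLATE, replaces each repeated string with a back-reference (a length and a distance pointing at an earlier copy) and then Huffman-codes the result. `zlib` produces the same DEFLATE data inside a slightly different wrapper. The sketch compresses some made-up table-cell markup, then inverts one byte in the plain copy and one byte in the middle of the compressed copy. The sample data and function names are assumptions for illustration only.

```python
import zlib


def sample_rows(n: int = 2000) -> bytes:
    """Made-up, highly repetitive spreadsheet-style markup (illustration only)."""
    return "".join(
        f'<table:table-cell office:value="{i}">{i * 7 % 13}</table:table-cell>\n'
        for i in range(n)
    ).encode("utf-8")


def flip_byte(data: bytes, pos: int) -> bytes:
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


def damage_report(raw: bytes) -> dict:
    packed = zlib.compress(raw, 9)

    # One bad byte in the plain XML: exactly one byte of content differs.
    plain_bad = flip_byte(raw, len(raw) // 2)
    plain_damaged = sum(a != b for a, b in zip(raw, plain_bad))

    # One bad byte in the middle of the compressed stream.
    packed_bad = flip_byte(packed, len(packed) // 2)
    try:
        packed_survived = zlib.decompress(packed_bad) == raw
    except zlib.error:  # invalid codes, bad distances or checksum mismatch
        packed_survived = False

    return {
        "ratio": len(raw) / len(packed),
        "plain_bytes_damaged": plain_damaged,
        "packed_survived": packed_survived,
    }


if __name__ == "__main__":
    print(damage_report(sample_rows()))
```

The plain copy loses one byte that a human could probably guess back. The compressed copy either fails to decompress or decompresses to something different. The higher the density, the less context is left to interpolate from.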


I've repaired damaged OOo files by hand (not many). One of them was a book by an Italian writer. It took him months to write, and it was a few hundred pages long. One day, while OOo was writing to the disk, the power went out and the file got corrupted.

He sent me the file. I unzipped it and ran it through XML Tidy. Tidy complained about a malformed tag on row x column y. I went there, fixed the tag, and zipped it again. Voilà, the file was fixed.
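The repair described above (unzip, find the malformed tag, fix it, zip it again) can be sketched with Python's standard library. This is a stand-in built on assumptions, not the actual procedure used. It relies on `xml.etree.ElementTree` rather than XML Tidy to report the line and column of the first well-formedness error, and the helper names are hypothetical. When it rebuilds the archive, it keeps the original entry order and each entry's compression method, because OOo packages conventionally begin with an uncompressed `mimetype` entry.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_xml_error(package: bytes, member: str = "content.xml"):
    """Return (line, column) of the first well-formedness error in member, or None."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        raw = zf.read(member)
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        return err.position  # (line, column) as reported by the parser
    return None


def replace_member(package: bytes, member: str, new_data: bytes) -> bytes:
    """Rebuild the archive with one member replaced, keeping order and compression."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = new_data if info.filename == member else src.read(info)
            fresh = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            fresh.compress_type = info.compress_type  # e.g. keep mimetype stored
            dst.writestr(fresh, data)
    return out.getvalue()
```

In practice you would fix the text at the reported position by hand (or in an editor), check that `find_xml_error` now returns `None`, and write the bytes from `replace_member` back out under the original file name.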

This took me five minutes of work, and it saved the writer months of work.

I think your friend was very lucky. Not only to have a knowledgeable friend such as yourself who could help, but also that the file wasn't damaged more extensively or in a different manner.


I'm also very surprised the zip was usable at all if the incident occurred as you said. The file handle would have been open and there likely would have been bytes in the buffer stream that didn't get committed to disk. It's more likely that the power "blipped". We get that a lot here during thunderstorm season. The screen will flash off for a second, but there's enough juice in the capacitors to keep things running through it. Sometimes it reboots and sometimes it doesn't. I REALLY need to get a good UPS. ;)
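A short Python sketch shows the layers bytes pass through before they are really "committed to disk". The function name is hypothetical. `write()` may leave bytes in a buffer inside the process. `flush()` hands them to the operating system. `os.fsync()` asks the OS to push its cached copy to the storage device. A power cut before the last step can lose data the application already believes it wrote, and even after `fsync` durability depends on the drive honoring cache-flush requests.

```python
import os
import tempfile


def write_committed(path: str, data: bytes) -> None:
    """Write data and ask the OS to flush it through to the storage device."""
    with open(path, "wb") as f:
        f.write(data)         # 1. may still be sitting in this process's buffer
        f.flush()             # 2. handed to the operating system's page cache
        os.fsync(f.fileno())  # 3. OS asked to write its cached pages to the device


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "book.sxw")  # hypothetical file name
        write_committed(target, b"example bytes")
```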

In complex systems there are many potential points of failure and you rarely if ever get to pick and choose which point of failure is going to affect you. A usable zip file implies either very minor damage or damage in a non-critical location. A corrupted replacement table will fry the file totally.
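Here is a sketch of how much depends on where the damage lands. It assumes a single deflate-compressed entry and a file whose tail was lost, for instance because a write was cut short. A zip archive keeps its central directory (the index readers look for) at the very end, and writers normally write it last. Losing only the tail therefore makes `zipfile` reject the whole archive. The entry's local header and compressed data near the start may still be intact, though, and can be recovered by scanning for the local-header signature. `salvage_first_entry` is a hypothetical helper, and this approach does nothing for damage inside the compressed data itself.

```python
from __future__ import annotations

import io
import struct
import zipfile
import zlib

# 30-byte local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHLLLHH")


def make_zip(name: str, text: str) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, text)
    return buf.getvalue()


def salvage_first_entry(blob: bytes) -> tuple[str, bytes]:
    """Recover the first entry by reading its local header directly."""
    start = blob.find(b"PK\x03\x04")
    if start < 0:
        raise ValueError("no local file header found")
    fields = LOCAL_HEADER.unpack_from(blob, start)
    method, name_len, extra_len = fields[3], fields[9], fields[10]
    if method != zipfile.ZIP_DEFLATED:
        raise ValueError("only deflate-compressed entries are handled here")
    name_at = start + LOCAL_HEADER.size
    name = blob[name_at:name_at + name_len].decode("utf-8")
    data_at = name_at + name_len + extra_len
    inflater = zlib.decompressobj(-15)  # raw DEFLATE stream, no zlib wrapper
    return name, inflater.decompress(blob[data_at:])
```

If the archive is cut just before its central directory (`blob[:blob.rfind(b"PK\x01\x02")]`), `zipfile.ZipFile` raises `BadZipFile`, but `salvage_first_entry` still returns the full text. Flip a byte inside the compressed data instead and neither approach helps.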

If I understand how all this works, saving a file in OOo is like a pipeline in *nix.

(file construction process) | (zip process) | (file write to disk)

The ability to reconstruct a bad file will depend a lot on where and how the damage occurs. In one regard, the OOo process may actually be *more* dangerous simply because a lot more processing has to happen on the way to the hard drive. It must, otherwise why does it take longer? Hard drives themselves are very reliable nowadays; the most likely points of failure are the transit points (e.g. file downloads and transfers).

Just my $0.02 worth. I would be interested in hearing your take on it, since I recognize that you probably know more about this than me.

Rod

