Hi

When you are not in a perfect world and you need to
work with data supplied by third-party users, you risk running into
incorrect data that produces a lot of errors. To avoid such situations
you can use try / catch blocks.

Hm, there are always several approaches, at least two. One camp says that there is some sort of specification for the file format, so you should expect your data to conform to that specification. The other camp tries to cope with every possible kind of malformed or non-conforming data.

Personally, I prefer to support the first approach as a start. Supporting every file that "looks to some extent like a Word document" is a lot of effort. It can be done as POI evolves, but as long as there are Word features that lack proper support in the POI API, I don't see the point in concentrating development effort on handling non-conforming data. In my opinion, the POI API should be able to read all files created by Word and to write documents that MS Word opens without complaining that they are corrupt. I do not see POI as a repair tool that is supposed to patch up corrupt files, nor as a rescue tool that is expected to extract as much information as possible from a corrupt DOC(X) file - and that is what I understood you to be talking about.

Of course, provoking RuntimeExceptions isn't good style and should not happen. On the other hand, if POI classes encounter non-standard data, the code must decide what to do next. Sometimes there are ways to handle a data flaw. More often, though, a method will simply abort by throwing a POIXMLException and stop processing, as there is no way to make sure that further processing won't fail again and again. Conceivable data faults start with corrupted ZIPs, continue with malformed XML and end with wrong references inside the document itself. Of course we are aware of how to use try-catch blocks. But what would you propose to do inside the catch block? With your task in mind - getting as much textual data as possible from the file - the implementation can just "grit its teeth" and pretend nothing happened. But if you then try to process or modify the corrupt data further, the result will get worse and worse.
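
To make the first two kinds of fault a bit more concrete, here is a minimal sketch that tells a broken ZIP container from a missing main part and from malformed XML. It uses only the JDK and is not how POI does it. The names DocxFaultCheck, Fault and zipOf are made up for this illustration, and the third kind of fault (wrong references inside the document) is not checked at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, JDK only: classifies the first fault found in a DOCX-like byte array.
class DocxFaultCheck {

    enum Fault { NONE, CORRUPT_ZIP, MISSING_MAIN_PART, MALFORMED_XML }

    // Name of the main document part inside a DOCX container.
    static final String MAIN_PART = "word/document.xml";

    static Fault classify(byte[] data) {
        byte[] xml = null;
        int entries = 0;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                entries++;
                if (MAIN_PART.equals(entry.getName())) {
                    xml = zip.readAllBytes(); // reads to the end of the current entry
                }
            }
        } catch (IOException e) { // ZipException and EOFException are IOExceptions
            return Fault.CORRUPT_ZIP;
        }
        if (entries == 0) {
            return Fault.CORRUPT_ZIP; // no readable entry at all: treated as not a ZIP
        }
        if (xml == null) {
            return Fault.MISSING_MAIN_PART;
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parser errors off stderr
            builder.parse(new ByteArrayInputStream(xml));
            return Fault.NONE;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return Fault.MALFORMED_XML;
        }
    }

    // Builds a one-entry ZIP in memory, only so the check can be tried without real files.
    static byte[] zipOf(String name, String content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Even a check like this only tells you where things went wrong. It does not answer the question of what a catch block should do about it.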

Maybe you could provide some sample data that causes the trouble you reported?

Kind regards,
Stefan Stern
