Nick,

See below.
On Mar 31, 2010, at 7:14 AM, [email protected] wrote:

> https://issues.apache.org/bugzilla/show_bug.cgi?id=49020
>
> --- Comment #3 from Nick Burch <[email protected]> 2010-03-31 11:14:53 UTC ---
> The bug is really with Excel here - it has generated a file with invalid XML.
> The xlsx file is defined as being made up of XML subparts, and the XML spec is
> very very strict on matching tags.
>
> For the long term, you should report a bug to Microsoft about this. They either
> need to sanitise the user input and sort out the tags (eg <br> becomes <br />),
> or they need to give up and escape the whole tag contents for the bits where
> iffy data could get added (eg put this textbox within a CDATA section)

I will report the bug to Microsoft, but that does not address existing files.

> Short term, you could just comment out the code that reads in the vmlDrawing
> section of the file, and ensure that you don't touch the drawing records

Please expand on "just comment out the code that reads in the vmlDrawing
section". My application supports many versions of Excel, so I use
WorkbookFactory.create() to read the file rather than calling the
format-specific code directly.

> Medium term, we should get a list of the problem bits that Excel does wrong,
> such as <br> (but perhaps others). Then, we need to write a XML Input Wrapper
> that cleans these up before they get passed to the XML Processor for loading.
> Something like this is quite nasty, though it's possible some other project out
> there has already done it, and we can just re-use what they do.

I like this as a solution.

Paul Spencer
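
To make the "XML Input Wrapper" idea concrete, here is a rough sketch of what such a wrapper might look like for the one known problem, an unclosed <br>. This is not POI code: the class name VmlBrSanitizer and its methods are hypothetical, it assumes the part is UTF-8 encoded, and it assumes the file never contains a matching </br>, which may not hold for every file Excel produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: rewrites unclosed <br> tags as
// self-closing ones before the stream reaches an XML parser.
final class VmlBrSanitizer {
    // Matches <br> or <br   > but not <br/> or <br />, which are already valid.
    private static final Pattern UNCLOSED_BR = Pattern.compile("<(br)\\s*>");

    // Returns the input with every unclosed <br> turned into <br/>.
    static String sanitize(String xml) {
        return UNCLOSED_BR.matcher(xml).replaceAll("<$1/>");
    }

    // Buffers the whole stream in memory (assumption: the part is small and
    // UTF-8 encoded), cleans it, and returns a fresh stream over the result.
    static InputStream wrap(InputStream raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = raw.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        String cleaned = sanitize(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        return new ByteArrayInputStream(cleaned.getBytes(StandardCharsets.UTF_8));
    }
}
```

The stream for the vmlDrawing part would be passed through wrap() before it is handed to the XML processor. A regex only covers the cases we already know about, so a real fix would need the list of other problem tags mentioned above.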
