this sounds like it might be the same platform-difference problem Forrest runs into and that affects the Derby web site:
http://db.apache.org/derby/papers/derby_web.html#odd_diffs

FOR-492 references a workaround, but I haven't looked at it, don't know
if it could apply to Derby.

 -jean

Army wrote:
> As part of my work for DERBY-1758 I'm looking at the XML binding test
> (lang/xmlBinding.java in the old harness, lang/XMLBindingTest.java in
> JUnit) and I noticed that the test, which counts characters as a simple
> sanity check for insertion of docs larger than 32k, returns different
> results on Linux vs Windows. (Actually, Bryan Pendleton was the first
> one to notice this a while back when he was reviewing DERBY-688 changes).
>
> Long story short, Xalan serialization (which is what Derby uses to
> serialize XML documents) inserts platform-specific line-endings (based
> on the "line.separator" System property) into XML documents for every
> newline. This appears to be technically valid, so it is not a bug per
> se [1]. However, from a Derby perspective this means that someone who
> inserts the exact same XML document into an XML column on Windows vs on
> Linux will actually be inserting more characters in the former case than
> in the latter (because the Windows line separator is two characters).
> Or put differently, when inserting an XML document on Windows an extra
> character is written to disk for every line in the XML document. This
> does *not* happen with other character types (e.g. CLOB).
>
> My question, then, is this: Is it considered a "bug" in Derby if
> insertion of the same XML value by the user can lead to different data
> (namely, line ending characters) being written to disk for different
> platforms?
>
> There appear to be two obvious ways to get around this problem: 1) add
> logic in Derby engine to take the result of Xalan serialization and
> replace platform-specific line-endings with "\n", or 2) change the XML
> binding test to always count line-endings as a single "character" for
> the sake of asserting character counts.
>
> I'm leaning toward option 1, but am not particularly driven one way or
> the other. If the answer to my above question is "Yes, it's a bug",
> then option 1 is clearly the only option; otherwise option 2 makes the
> test pass and is easy to implement. It does feel a tad like cheating,
> though...
>
> Comments/feedback are appreciated, if anyone has any.
>
> Thanks,
> Army
>
> ----
>
> [1]
>
> I searched Jira for this and found a couple of relevant Xalan issues,
> especially XALANJ-2093 and XALANJ-1701. There is apparently a new
> property introduced in Xalan 2.7 to allow the user to indicate what
> should happen with newlines, but that property is non-standard and would
> require Derby to use Xalan 2.7 in order to build.
>
> Based on comments in the aforementioned XALANJ issues it looks like it
> is technically valid for Xalan to convert the newlines to
> platform-specific endings. This seems to agree with the following quote
> from the W3C page on serialization:
>
> http://www.w3.org/TR/xslt-xquery-serialization/#serdm:
>
> "When outputting a newline character in the instance of the data model,
> the serializer is free to represent it using any character sequence that
> will be normalized to a newline character by an XML parser, unless a
> specific mapping for the newline character is provided in a character
> map (see 9 Character Maps)."
>
> I don't know what Xalan serialization does with character maps, but
> there is nothing explicit in Derby to specify use of such maps, so my
> (admittedly lacking) understanding is that it's okay for Xalan to return
> platform-specific line-endings when serializing.
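
For what it's worth, here is a rough sketch of what the two options quoted
above might look like in plain Java. It is only an illustration, not Derby
code: the class name LineEndingSketch and both method names are made up,
and it assumes the serialized document is already available as a String.
Where such logic would actually hook into the engine or the test is not
something I have looked at.

```java
// Hypothetical sketch; none of these names exist in Derby or Xalan.
final class LineEndingSketch {

    private LineEndingSketch() {}

    // Option 1: take the serialized text and replace platform-specific
    // line endings ("\r\n", or a lone "\r") with a single "\n".
    public static String normalizeLineEndings(String serialized) {
        return serialized.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Option 2: leave the data alone, but count every line ending as one
    // "character" when asserting character counts in a test.
    public static int countWithSingleLineEndings(String serialized) {
        int count = 0;
        for (int i = 0; i < serialized.length(); i++) {
            boolean crBeforeLf = serialized.charAt(i) == '\r'
                    && i + 1 < serialized.length()
                    && serialized.charAt(i + 1) == '\n';
            if (!crBeforeLf) {
                count++; // the "\r" of a "\r\n" pair is skipped; the "\n" counts
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String unixDoc = "<a>\n<b/>\n</a>\n";
        String windowsDoc = "<a>\r\n<b/>\r\n</a>\r\n";

        // The raw lengths differ by one character per line.
        System.out.println(windowsDoc.length() - unixDoc.length());

        // After option 1 the two documents are identical.
        System.out.println(normalizeLineEndings(windowsDoc).equals(unixDoc));

        // With option 2 the counts agree without touching the data.
        System.out.println(countWithSingleLineEndings(windowsDoc)
                == countWithSingleLineEndings(unixDoc));
    }
}
```

The difference between the two is where the platform dependence ends up:
option 1 would make the stored data the same on every platform, while
option 2 only hides the difference from the test.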
