Re: Writing platform-specific line-endings to disk...

Army Tue, 21 Nov 2006 20:54:08 -0800

Daniel John Debrunner wrote:

I was thinking more generally in that an XML value may be generated and thus never have been stored to disk. How it is stored on disk and how the XML value is serialized using XMLSERIALIZE() are different operations, it's just an implementation detail of derby that they are the same in some instances.


Okay, that makes sense.  Sorry for not grasping this earlier.

Would all these operations return the same exact characters to an application if they represent the same logical value?
XMLSERIALIZE(colvalue originally on linux)
XMLSERIALIZE(colvalue originally on windows)
XMLSERIALIZE(generated XML value from other XML operators)


I'm assuming the following definitions for this question:

  - let "colvalue" represent the logical value
  - let "colvalue originally on linux" be the result of inserting
    <colvalue> on a Linux machine
  - let "colvalue originally on windows" be the result of inserting
    <colvalue> on a Windows machine
  - let "n" be the number of characters (including line breaks) in
    <colvalue>.
  - let <nl> be the number of line breaks in <colvalue>

If this is correct, then the answer to the question is No, the above three operations would not return the same exact characters. The result of the first operation will have (n) characters in it. The result of the second operation will have one more character ("\r") in it for every line break in "colvalue"; i.e. it will have (n + nl) characters in it. And the result of the third operation will have (n + nl) characters if executed on Windows, but only (n) characters if executed on Linux.

Note that once inserted, serialization of a specific row will return the same characters regardless of whether the XMLSERIALIZE is executed on Windows or Linux. Or put another way, the first operation will always return (n) characters, regardless of platform. Similarly, the second operation will always return (n + nl) characters.
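
To make the (n) versus (n + nl) arithmetic concrete, here is a small sketch in Java. It is illustrative only: the class and method names are hypothetical and are not part of Derby, and it simply assumes that a value inserted on Windows differs from the Linux one by having "\r\n" wherever the Linux value has "\n".

```java
// Illustrative only: LineBreakCount and its methods are hypothetical
// names, not Derby code.
final class LineBreakCount {

    /** Counts the "\n" characters in s, i.e. nl for a value with Linux line endings. */
    static int countLineBreaks(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\n') {
                count++;
            }
        }
        return count;
    }

    /**
     * Rewrites every "\n" as "\r\n". Assumption: this mimics what the same
     * logical value looks like when it was inserted on Windows.
     */
    static String toWindowsEndings(String linuxValue) {
        return linuxValue.replace("\n", "\r\n");
    }
}
```

For the value "<a>\n<b/>\n</a>", n is 13 and nl is 2, so the Windows form produced by toWindowsEndings() has 13 + 2 = 15 characters.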

Would it surprise an application to receive different character values for those expressions?

Good question. I did some searching around on the Xalan/Xerces Jira issues and the general notion seems to be that XML "output" (which I presume includes the result of XML serialization) can convert the newline character to the platform-specific newline. See esp. Joe Kesselman's comments on XALANJ-1137. This leads me to believe that there is truth to what Bryan Pendleton said in his reply to the question, namely:


 - carefully written XML applications should not be affected by this

If the expectation (as apparently backed by the XML spec) is that "output" can have platform-specific newlines, then it seems like an application written to process XML data should not be surprised by this behavior. And that in a way leads to the next question:

If they are different, does it matter since they are all valid serializations under SQL/XML?

Presumably no, it does not (or at least, should not) matter. But having said that, I cannot help but nod in agreement when I read the following:

My gut feeling is that different character values would be confusing to an application, but it probably depends what the application is doing with them. Looking at them in notepad would be confusing. :-)

Given that the relevant specs seem to indicate that it is valid to return platform-specific endings and it is *also* valid to just return "\n", and given that the latter option strikes me as potentially less confusing to the app, I tend to lean toward the less confusing option. Of course, a lot of that has to do with the fact that the latter option is pretty easily implemented in the code. I made the following addition to the end of the "serializeToString()" method in SqlXmlUtil.java and was able to get consistent results (i.e. exactly the same characters) across platforms:


+        String eol = PropertyUtil.getSystemProperty("line.separator");
+        if (eol != null)
+            return sWriter.toString().replaceAll(eol, "\n");
         return sWriter.toString();

Downside is a potential performance hit for large XML docs, which may not be worth it. Note, though, that the implementation as a whole is not very ideal for large XML documents because it (already) materializes the entire document into memory. This continues to be a fish for any idle cooks to fry...
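
For anyone who wants to try the normalization outside the engine, here is a minimal standalone sketch of the same idea. It rests on a few assumptions: the class name EolNormalizer is hypothetical, System.getProperty() stands in for Derby's PropertyUtil.getSystemProperty(), and String.replace() (literal matching) is used where the patch uses replaceAll() (regex matching). It is a sketch, not the change made to SqlXmlUtil.java.

```java
// Hypothetical helper, not Derby code: converts the platform line
// separator in an already-serialized string to "\n".
final class EolNormalizer {

    /**
     * Replaces each occurrence of eol with "\n". Returns the input
     * unchanged when eol is null, empty, or already "\n".
     */
    static String normalize(String serialized, String eol) {
        if (eol == null || eol.isEmpty() || eol.equals("\n")) {
            return serialized;
        }
        // String.replace matches literally, so eol is never treated as a regex.
        return serialized.replace(eol, "\n");
    }

    /** Convenience overload that uses the running JVM's line separator. */
    static String normalize(String serialized) {
        // Assumption: System.getProperty stands in for PropertyUtil here.
        return normalize(serialized, System.getProperty("line.separator"));
    }
}
```

Calling EolNormalizer.normalize("<a>\r\n</a>", "\r\n") yields "<a>\n</a>". Like the patch, the one-argument overload only rewrites the separator of the platform it runs on, and it still builds a second copy of the whole document in memory.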

Thinking a little more, having XMLSERIALIZE() (within a given runtime) being non-deterministic seems wrong.

When you write "within a given runtime", what is the definition of "runtime"? Is that a specific JVM instance on a specific machine, or is it "Derby" on a more general level? Or something else entirely? Is the behavior that I described above (i.e. different characters depending on which platform originally inserted <colvalue>) considered non-deterministic?

I find myself agreeing with both Dan and Bryan on this, and for that reason I tend to believe the following:


(to quote Bryan):

  - it's not a bug in Derby that the serialization can differ in
    details like this
  - carefully written XML applications should not be affected by this
  - it is reasonable to adjust the test to avoid hitting this problem.

(and as an additional thought):

  - Given that there is at least one potentially simple "enhancement" to
    Derby that could resolve the issue within the engine instead of
    within the test, it is *also* reasonable--and perhaps preferable--to
    make that change in the engine so that we can (hopefully) reduce the
    likelihood of confusing applications that use XML data in Derby.
    This would also ensure deterministic (so far as I understand it)
    behavior of XMLSERIALIZE across platforms.

Any additional thoughts/suggestions/corrections?

Thanks to Dan, Jean, and Bryan for taking the time to reply thus far...

Army
