[GitHub] [daffodil] tuxji commented on a change in pull request #641: Improve validation and diagnostics when reloading saved parsers

GitBox Wed, 22 Sep 2021 12:50:29 -0700


tuxji commented on a change in pull request #641:
URL: https://github.com/apache/daffodil/pull/641#discussion_r714255142




##########
File path: 
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -357,7 +357,14 @@ class DataProcessor private (
 
   def save(output: DFDL.Output): Unit = {
 
-    val oos = new ObjectOutputStream(new GZIPOutputStream(Channels.newOutputStream(output)))
+    val os = Channels.newOutputStream(output)
+
+    // write a null-terminated ASCII string as a simple version identifier
+    val headerString = "DAFFODIL " + Misc.getDaffodilVersion + "\u0000"
+    os.write(headerString.getBytes("US-ASCII"))

Review comment:
       I reviewed how we call `getBytes` in Daffodil to check for 
inconsistencies and best practices.  I noticed two things: 1) we call 
`getBytes("ascii")` in every other place where we want bytes from ASCII 
characters; and 2) we call `getBytes` without a charset name too many times.  
Java's platform default charset is specific to the user and OS.  On many modern 
Linux systems, it's UTF-8.  On older Macs, it was MacRoman.  On Windows in the 
US and Western Europe, it's often CP1252, in Central Europe it's CP1250, and in 
China it's often a simplified Chinese encoding (GBK or another GB*).  I'm 
agnostic whether we use "ascii", "US-ASCII", or import 
java.nio.charset.StandardCharsets and use StandardCharsets.US_ASCII (I see 
Daffodil most often uses all-lowercase strings to save space and typing), but 
we should probably create a bug to replace all parameter-less `getBytes` calls 
with `getBytes("utf-8")`.
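
To illustrate the concern, here is a minimal sketch (not Daffodil code) of how 
an explicit charset gives the same bytes everywhere while a parameter-less 
`getBytes` follows the platform default.  The object name `GetBytesDemo`, the 
helper `asciiHeader`, and the version string "0.0.0" are hypothetical and only 
stand in for the real header logic in the diff above.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object GetBytesDemo {
  // Hypothetical helper: builds a null-terminated ASCII header with an
  // explicit charset, so the bytes do not depend on the platform default.
  def asciiHeader(version: String): Array[Byte] =
    ("DAFFODIL " + version + "\u0000").getBytes(StandardCharsets.US_ASCII)

  def main(args: Array[String]): Unit = {
    val text = "caf\u00e9" // one non-ASCII character (e with acute accent)

    // Explicit charsets: the byte counts are the same on every platform.
    val utf8 = text.getBytes(StandardCharsets.UTF_8)        // 5 bytes
    val latin1 = text.getBytes(StandardCharsets.ISO_8859_1) // 4 bytes

    // No charset: the result depends on Charset.defaultCharset().
    val platform = text.getBytes()

    println(s"default charset: ${Charset.defaultCharset()}")
    println(s"utf-8: ${utf8.length}, iso-8859-1: ${latin1.length}, default: ${platform.length}")

    // "ascii" is an alias that resolves to the same charset as US-ASCII.
    println(Charset.forName("ascii") == StandardCharsets.US_ASCII)

    val header = asciiHeader("0.0.0") // placeholder version string
    println(s"header length: ${header.length}, last byte: ${header.last}")
  }
}
```

Using the `StandardCharsets` constants also avoids the checked 
`UnsupportedEncodingException` path that the string-name overload of `getBytes` 
goes through, though either spelling selects the same encoding.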




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
