tuxji commented on a change in pull request #641:
URL: https://github.com/apache/daffodil/pull/641#discussion_r714255142
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -357,7 +357,14 @@ class DataProcessor private (
def save(output: DFDL.Output): Unit = {
- val oos = new ObjectOutputStream(new GZIPOutputStream(Channels.newOutputStream(output)))
+ val os = Channels.newOutputStream(output)
+
+ // write a null-terminated ASCII string as a simple version identifier
+ val headerString = "DAFFODIL " + Misc.getDaffodilVersion + "\u0000"
+ os.write(headerString.getBytes("US-ASCII"))
Review comment:
I reviewed how we call `getBytes` across Daffodil to check for
inconsistencies and best practices. I noticed two things: 1) everywhere else
that we want bytes from ASCII characters, we call `getBytes("ascii")` rather
than `getBytes("US-ASCII")`; and 2) we call `getBytes` without a charset name
in too many places.
Java's platform default charset depends on the user's locale and OS. On many
modern Linux systems, it's UTF-8. On Macs, it has historically been MacRoman.
On Windows, it's often CP1252 in the US and Western Europe and CP1250 in
Central Europe, while in China it's often a Simplified Chinese encoding (GBK
or another GB*). I'm agnostic about whether we use "ascii", "US-ASCII", or
import `java.nio.charset.StandardCharsets` and use
`StandardCharsets.US_ASCII` (I see Daffodil most often uses all-lowercase
strings, to save space and typing), but we should probably create a bug to
replace all parameter-less `getBytes` calls with `getBytes("utf-8")`.
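
To make the difference concrete, here is a minimal sketch, not Daffodil code:
`CharsetDemo` and `headerBytes` are hypothetical names, and "3.2.0" is a
placeholder for whatever `Misc.getDaffodilVersion` returns. It assumes we go
with the `StandardCharsets` constants, which also avoid the checked
`UnsupportedEncodingException` that `getBytes(String)` declares.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object CharsetDemo {
  // Hypothetical helper mirroring the header written in save();
  // the version is passed in so the example stays self-contained.
  def headerBytes(version: String): Array[Byte] =
    ("DAFFODIL " + version + "\u0000").getBytes(StandardCharsets.US_ASCII)

  def main(args: Array[String]): Unit = {
    // Platform-dependent: the value a parameter-less getBytes would use.
    println(Charset.defaultCharset())

    // "3.2.0" is a placeholder version string (assumption).
    val header = headerBytes("3.2.0")
    println(header.length) // 15: 9 + 5 characters plus the NUL terminator

    // The same string encodes differently depending on the charset.
    val text = "caf\u00e9"
    println(text.getBytes(StandardCharsets.UTF_8).length)      // 5
    println(text.getBytes(StandardCharsets.ISO_8859_1).length) // 4

    // Characters that US-ASCII cannot represent are replaced with '?' (63).
    println(text.getBytes(StandardCharsets.US_ASCII).last)     // 63
  }
}
```

With an explicit charset, the bytes written are the same on every machine.
A parameter-less `getBytes` on "café" could yield 4 or 5 bytes depending on
where it runs.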