stevedlawrence commented on code in PR #1674:
URL: https://github.com/apache/daffodil/pull/1674#discussion_r3320223951


##########
daffodil-core/src/main/java/org/apache/daffodil/layers/runtime1/ConfigurableGZIPOutputStream.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.layers.runtime1;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.zip.GZIPOutputStream;
+
+/**
+ * A {@link GZIPOutputStream} subclass that allows the caller to specify the
+ * compression level explicitly. The standard {@code GZIPOutputStream} uses
+ * the JDK's default level, which can produce byte-different output across
+ * JDKs linked against different zlib implementations (e.g., stock zlib vs.
+ * zlib-ng). Specifying a higher compression level (such as
+ * {@link java.util.zip.Deflater#BEST_COMPRESSION}) yields consistent output
+ * across implementations.
+ *
+ * <p>According to the zlib documentation and the logic in {@code deflate.c},
+ * compression levels range from 0 to 9:
+ * <ul>
+ *   <li>Level 0: no compression (fastest)</li>
+ *   <li>Level 1: minimal compression, fastest of the compressing levels</li>
+ *   <li>Level 9: best compression, slowest</li>
+ * </ul>
+ *
+ * <p>As the compression level increases, speed decreases. According to one
+ * benchmark from
+ * <a 
href="https://devtoys.pro/sk/blog/gzip-compression-levels";>devtoys.pro</a>,

Review Comment:
   I generally hesitate to link to websites we don't control, especially when 
they appear in Javadoc/Scaladoc that could up on our website. I would suggest 
we say something like "some reports indicate that level 9 ..." and people can 
google sources or run their own tests if they are interested in actual details.



##########
daffodil-core/src/main/java/org/apache/daffodil/layers/runtime1/GZipLayer.java:
##########
@@ -55,6 +56,33 @@ public GZipLayer() {
     setProcessingErrorException(IOException.class);
   }
 
+  /**
+   * Provides the layer's parameter variables to the layer.
+   *
+   * <p>The compression level affects byte-level consistency of gzip output
+   * across JVMs linked against different zlib implementations. At
+   * {@link Deflater#DEFAULT_COMPRESSION} (which zlib resolves to level 6),
+   * stock zlib and zlib-ng may produce byte-different output for the same
+   * input due to differences in their match-finding heuristics. At higher
+   * levels (especially level 9), the two implementations tend to converge
+   * on identical output, at the cost of encoding speed.
+   *
+   * @param compressionLevel an integer specifying the gzip compression level
+   *                         to use when unparsing. Valid values are 0 through
+   *                         9, where 0 means no compression and 9 means 
maximum
+   *                         compression, or {@link 
Deflater#DEFAULT_COMPRESSION}
+   *                         ({@code -1}) to use the underlying zlib library's
+   *                         default.
+   */
+  public void setLayerVariableParameters(Integer compressionLevel) {

Review Comment:
   I think we usually use non-boxed types where possible. I think `int` should 
work here.



##########
daffodil-test/src/test/resources/org/apache/daffodil/layers/exampleGzipLayer.dfdl.xsd:
##########
@@ -59,6 +59,11 @@
               <element name="contents">
                 <complexType>
                   <sequence dfdlx:layer="gz:gzip">
+                    <annotation>
+                      <appinfo source="http://www.ogf.org/dfdl/";>
+                        <dfdl:newVariableInstance ref="gz:compressionLevel" 
defaultValue="9"/>

Review Comment:
   Can you add a comment to these examples to make it clear that they use the 
much slower level 9 compression to improve the likelihood of deterministic 
compression results when testing with different JVMs and systems. Users of the 
gzip layer should either exclude the newVariableInstance to use the default 
level or choose a level appropriate for their needs. 
   
   That way if anyone ever uses this an an example reference they hopefully 
won't stick with level 9, which almost no one using Daffodil probably really 
wants.



##########
daffodil-core/src/main/resources/org/apache/daffodil/layers/xsd/gzipLayer.dfdl.xsd:
##########
@@ -26,8 +26,8 @@
   <annotation>
     <appinfo source="http://www.ogf.org/dfdl/";>
 
-      <!-- the gzip layer has no parameter variables
-      nor return result variables. -->
+      <!-- gzip layer variable-->
+      <dfdl:defineVariable name="compressionLevel" type="xs:int" 
defaultValue="-1"/>

Review Comment:
   Can you also update documentation on the website to describe this variable? 
Here's the file and location:
   
   https://github.com/apache/daffodil-site/blob/main/site/layers.md#gzip-layer



##########
daffodil-core/src/main/resources/org/apache/daffodil/layers/xsd/gzipLayer.dfdl.xsd:
##########
@@ -26,8 +26,8 @@
   <annotation>
     <appinfo source="http://www.ogf.org/dfdl/";>
 
-      <!-- the gzip layer has no parameter variables
-      nor return result variables. -->
+      <!-- gzip layer variable-->
+      <dfdl:defineVariable name="compressionLevel" type="xs:int" 
defaultValue="-1"/>

Review Comment:
   Should we add `external="true"` to this? That way users can change the 
compression level without having to change the schema?



##########
daffodil-core/src/main/java/org/apache/daffodil/layers/runtime1/GZipLayer.java:
##########
@@ -55,6 +56,33 @@ public GZipLayer() {
     setProcessingErrorException(IOException.class);
   }
 
+  /**
+   * Provides the layer's parameter variables to the layer.
+   *
+   * <p>The compression level affects byte-level consistency of gzip output
+   * across JVMs linked against different zlib implementations. At
+   * {@link Deflater#DEFAULT_COMPRESSION} (which zlib resolves to level 6),
+   * stock zlib and zlib-ng may produce byte-different output for the same
+   * input due to differences in their match-finding heuristics. At higher
+   * levels (especially level 9), the two implementations tend to converge
+   * on identical output, at the cost of encoding speed.
+   *
+   * @param compressionLevel an integer specifying the gzip compression level
+   *                         to use when unparsing. Valid values are 0 through
+   *                         9, where 0 means no compression and 9 means 
maximum
+   *                         compression, or {@link 
Deflater#DEFAULT_COMPRESSION}
+   *                         ({@code -1}) to use the underlying zlib library's
+   *                         default.
+   */
+  public void setLayerVariableParameters(Integer compressionLevel) {
+    boolean isValidRange = (compressionLevel >= 0 && compressionLevel <= 9)
+            || compressionLevel == Deflater.DEFAULT_COMPRESSION;
+    if (!isValidRange) 
+      processingError("Invalid compression level: " + compressionLevel);

Review Comment:
   I think this wants to be a runtimeSchemaDefinitionError. The documentation 
example seems to have almost been written with this case in mind:
   
   
https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/java/org/apache/daffodil/api/layers/Layer.java#L357-L373



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to