Re: [PR] Add support for dfdl:checkRangeInclusive/checkRangeExclusive functions [daffodil]

via GitHub Sat, 30 Mar 2024 06:29:05 -0700


stevedlawrence commented on code in PR #1199:
URL: https://github.com/apache/daffodil/pull/1199#discussion_r1545271846



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/dpath/ComparisonOps.scala:
##########
@@ -17,10 +17,42 @@
 
 package org.apache.daffodil.runtime1.dpath
 
+import org.apache.daffodil.lib.exceptions.Assert
 import org.apache.daffodil.lib.util.Numbers._
+import org.apache.daffodil.runtime1.dpath.NodeInfo.PrimType
 import org.apache.daffodil.runtime1.infoset.DataValue.DataValueBool
 import org.apache.daffodil.runtime1.infoset.DataValue.DataValuePrimitive
 
+object ComparisonOps {
+  lazy val forType: Map[

Review Comment:
   Can you add scaladoc for this, especially making sure to mention the order 
of the returned 6-tuple so we don't have to read the code to figure out which 
operation is which? It's not too hard to figure it out, but it would make 
things just a little easier. Maybe also include an example of its use with the 
`val (...) = ...` to easily extract the comparison ops?



##########
daffodil-test/src/test/resources/org/apache/daffodil/section23/dfdl_expressions/expressions.tdml:
##########
@@ -7684,4 +7684,230 @@ blastoff
     
<tdml:infoset><tdml:dfdlInfoset><add01>4</add01></tdml:dfdlInfoset></tdml:infoset>
   </tdml:parserTestCase>
 
+  <tdml:defineSchema name="DFDLCheckRange">
+    <xs:include 
schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+    <dfdl:format ref="ex:GeneralFormat" />
+    <xs:element name="literalMinMax_int">
+      <xs:complexType>
+        <xs:sequence>
+          <xs:element name="data" dfdl:lengthKind="explicit" dfdl:length="1" 
type="xs:integer"/>
+          <xs:element name="isInRangeInc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeInclusive(../ex:data, 0, 8) }"/>
+          <xs:element name="isInRangeExc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeExclusive(../ex:data, 0, 8) }"/>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+    <xs:element name="expMinMax_double">
+      <xs:complexType>
+        <xs:sequence>
+          <xs:sequence dfdl:separatorPosition="infix" dfdl:separator="|">
+            <xs:element name="data" dfdl:lengthKind="delimited" 
type="xs:double"/>
+            <xs:element name="min" dfdl:lengthKind="delimited" 
type="xs:double"/>
+            <xs:element name="max" dfdl:lengthKind="delimited" 
type="xs:double"/>
+          </xs:sequence>
+          <xs:element name="isInRangeInc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeInclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+          <xs:element name="isInRangeExc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeExclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+    <xs:element name="expMinMax_float">
+      <xs:complexType>
+        <xs:sequence>
+          <xs:sequence dfdl:separatorPosition="infix" dfdl:separator="|">
+            <xs:element name="data" dfdl:lengthKind="delimited" 
type="xs:float"/>
+            <xs:element name="min" dfdl:lengthKind="delimited" 
type="xs:float"/>
+            <xs:element name="max" dfdl:lengthKind="delimited" 
type="xs:float"/>
+          </xs:sequence>
+          <xs:element name="isInRangeInc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeInclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+          <xs:element name="isInRangeExc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeExclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+    <xs:element name="expMinMax_decimal">
+      <xs:complexType>
+        <xs:sequence>
+          <xs:sequence dfdl:separatorPosition="infix" dfdl:separator="|">
+            <xs:element name="data" dfdl:lengthKind="delimited" 
type="xs:decimal"/>
+            <xs:element name="min" dfdl:lengthKind="delimited" 
type="xs:decimal"/>
+            <xs:element name="max" dfdl:lengthKind="delimited" 
type="xs:decimal"/>
+          </xs:sequence>
+          <xs:element name="isInRangeInc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeInclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+          <xs:element name="isInRangeExc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeExclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+    <xs:element name="expMinMax_mixedIntAndFloat">
+      <xs:complexType>
+        <xs:sequence>
+          <xs:sequence dfdl:separatorPosition="infix" dfdl:separator="|">
+            <xs:element name="data" dfdl:lengthKind="delimited" 
type="xs:integer"/>
+            <xs:element name="min" dfdl:lengthKind="delimited" 
type="xs:float"/>
+            <xs:element name="max" dfdl:lengthKind="delimited" 
type="xs:float"/>
+          </xs:sequence>
+          <xs:element name="isInRangeInc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeInclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+          <xs:element name="isInRangeExc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeExclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+    
+    <xs:element name="expMinMax_string_inc">
+      <xs:complexType>
+        <xs:sequence>
+          <xs:sequence dfdl:separatorPosition="infix" dfdl:separator="|">
+            <xs:element name="data" dfdl:lengthKind="delimited" 
type="xs:string"/>
+            <xs:element name="min" dfdl:lengthKind="delimited" 
type="xs:string"/>
+            <xs:element name="max" dfdl:lengthKind="delimited" 
type="xs:string"/>
+          </xs:sequence>
+          <xs:element name="isInRangeInc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeInclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+    <xs:element name="expMinMax_string_exc">
+      <xs:complexType>
+        <xs:sequence>
+          <xs:sequence dfdl:separatorPosition="infix" dfdl:separator="|">
+            <xs:element name="data" dfdl:lengthKind="delimited" 
type="xs:string"/>
+            <xs:element name="min" dfdl:lengthKind="delimited" 
type="xs:string"/>
+            <xs:element name="max" dfdl:lengthKind="delimited" 
type="xs:string"/>
+          </xs:sequence>
+          <xs:element name="isInRangeExc" type="xs:boolean"
+                      dfdl:inputValueCalc="{ 
dfdl:checkRangeExclusive(../ex:data, ../ex:min, ../ex:max) }"/>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+
+  </tdml:defineSchema>
+
+  <tdml:parserTestCase name="DFDLCheckRange_01" validation="on"
+                       root="literalMinMax_int" model="DFDLCheckRange">
+    <tdml:document>8</tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <literalMinMax_int>
+          <data>8</data>
+          <isInRangeInc>true</isInRangeInc>
+          <isInRangeExc>false</isInRangeExc>
+        </literalMinMax_int>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+  </tdml:parserTestCase>
+
+  <tdml:parserTestCase name="DFDLCheckRange_02" validation="on"
+                       root="expMinMax_double" model="DFDLCheckRange">
+    <tdml:document>2.3|2.3|2.5</tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <expMinMax_double>
+          <data>2.3</data>
+          <min>2.3</min>
+          <max>2.5</max>
+          <isInRangeInc>true</isInRangeInc>
+          <isInRangeExc>false</isInRangeExc>
+        </expMinMax_double>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+  </tdml:parserTestCase>
+
+  <tdml:parserTestCase name="DFDLCheckRange_03" validation="on"
+                       root="expMinMax_float" model="DFDLCheckRange">
+    <tdml:document>2.5|2.3|2.5</tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <expMinMax_float>
+          <data>2.5</data>
+          <min>2.3</min>
+          <max>2.5</max>
+          <isInRangeInc>true</isInRangeInc>
+          <isInRangeExc>false</isInRangeExc>
+        </expMinMax_float>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+  </tdml:parserTestCase>
+
+  <tdml:parserTestCase name="DFDLCheckRange_04" validation="on"
+                       root="expMinMax_decimal" model="DFDLCheckRange">
+    <tdml:document>2.5|2.3|2.5</tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <expMinMax_decimal>
+          <data>2.5</data>
+          <min>2.3</min>
+          <max>2.5</max>
+          <isInRangeInc>true</isInRangeInc>
+          <isInRangeExc>false</isInRangeExc>
+        </expMinMax_decimal>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+  </tdml:parserTestCase>
+
+  <tdml:parserTestCase name="DFDLCheckRange_05" validation="on"
+                       root="expMinMax_mixedIntAndFloat" 
model="DFDLCheckRange">
+    <tdml:document>1|1.9|2.5</tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <expMinMax_mixedIntAndFloat>
+          <data>1</data>
+          <min>1.9</min>
+          <max>2.5</max>
+          <isInRangeInc>false</isInRangeInc>

Review Comment:
   Nice test :+1:, if we converged the types wrong there's a good chance `1.9` 
would be cast to int, become `1`, and `isInRangeInc` would be true and the test 
would fail.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dpath/Expression.scala:
##########
@@ -3041,6 +2952,56 @@ case class DFDLCheckConstraintsExpr(
 
 }
 
+case class DFDLCheckRangeExpr(
+  nameAsParsed: String,
+  fnQName: RefQName,
+  args: List[Expression],
+  isExclusive: Boolean,
+) extends FunctionCallBase(nameAsParsed, fnQName, args) {
+
+  lazy val List(arg1, arg2, arg3) = { checkArgCount(3); args }
+
+  override lazy val children = args
+
+  override lazy val compiledDPath = {
+    val argDPath = arg1.compiledDPath
+    val rangeFrom = arg2.compiledDPath
+    val rangeTo = arg3.compiledDPath
+    val c = conversions
+    val (_, _, lt, le, _, _) = ComparisonOps.forType(targetTypeForSubExpr)
+    val compare = if (isExclusive) lt else le
+    val res = new CompiledDPath(DFDLCheckRange(argDPath, rangeFrom, rangeTo, 
compare) +: c)
+    res
+  }
+
+  // we set the calculation to a lazy val so it's evaluated only once for each
+  // call to targetTypeForSubexpression since we don't care about the specific
+  // value of subexpr in targetTypeForSubexpression, and we need all 3 sub
+  // expressions to come to the right target target for the args
+  private lazy val targetTypeForSubExpr: NodeInfo.Kind = {

Review Comment:
   I think in places where we do something like this (i.e. figure out a common 
type for all arguments to convert to) we call this `convergedArgType`. We 
should do the same here for consistentcy.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dpath/Expression.scala:
##########
@@ -258,116 +258,21 @@ case class ComparisonExpression(op: String, adds: 
List[Expression])
   with BooleanExpression {
 
   lazy val compareOp: CompareOpBase = {
-    import NodeInfo.PrimType._
-    import NodeInfo.ArrayIndex
-    (op, convergedArgType) match {
-      case ("<", _) => subsetError("Unsupported operation '%s'. Use 'lt' 
instead.", op)
-      case (">", _) => subsetError("Unsupported operation '%s'. Use 'gt' 
instead.", op)
-      case ("<=", _) => subsetError("Unsupported operation '%s'. Use 'le' 
instead.", op)
-      case (">=", _) => subsetError("Unsupported operation '%s'. Use 'ge' 
instead.", op)
-      case ("=", _) => subsetError("Unsupported operation '%s'. Use 'eq' 
instead.", op)
-      case ("!=", _) => subsetError("Unsupported operation '%s'. Use 'ne' 
instead.", op)
-
-      case ("eq", HexBinary) => EQ_CompareByteArray
-      case ("ne", HexBinary) => NE_CompareByteArray
-      case ("eq", _) => EQ_Compare
-      case ("ne", _) => NE_Compare
-
-      case ("lt", Boolean) => LT_Boolean
-      case ("gt", Boolean) => GT_Boolean
-      case ("le", Boolean) => LE_Boolean
-      case ("ge", Boolean) => GE_Boolean
-
-      case ("lt", Date) => LT_Date
-      case ("gt", Date) => GT_Date
-      case ("le", Date) => LE_Date
-      case ("ge", Date) => GE_Date
-
-      case ("lt", Time) => LT_Time
-      case ("gt", Time) => GT_Time
-      case ("le", Time) => LE_Time
-      case ("ge", Time) => GE_Time
-
-      case ("lt", DateTime) => LT_DateTime
-      case ("gt", DateTime) => GT_DateTime
-      case ("le", DateTime) => LE_DateTime
-      case ("ge", DateTime) => GE_DateTime
-
-      case ("lt", String) => LT_String
-      case ("gt", String) => GT_String
-      case ("le", String) => LE_String
-      case ("ge", String) => GE_String
-
-      case ("lt", Decimal) => LT_Decimal
-      case ("gt", Decimal) => GT_Decimal
-      case ("le", Decimal) => LE_Decimal
-      case ("ge", Decimal) => GE_Decimal
-
-      case ("lt", Integer) => LT_Integer
-      case ("gt", Integer) => GT_Integer
-      case ("le", Integer) => LE_Integer
-      case ("ge", Integer) => GE_Integer
-
-      case ("lt", NonNegativeInteger) => LT_NonNegativeInteger
-      case ("gt", NonNegativeInteger) => GT_NonNegativeInteger
-      case ("le", NonNegativeInteger) => LE_NonNegativeInteger
-      case ("ge", NonNegativeInteger) => GE_NonNegativeInteger
-
-      case ("lt", UnsignedLong) => LT_UnsignedLong
-      case ("gt", UnsignedLong) => GT_UnsignedLong
-      case ("le", UnsignedLong) => LE_UnsignedLong
-      case ("ge", UnsignedLong) => GE_UnsignedLong
-
-      case ("lt", Long) => LT_Long
-      case ("gt", Long) => GT_Long
-      case ("le", Long) => LE_Long
-      case ("ge", Long) => GE_Long
-
-      case ("lt", UnsignedInt) => LT_UnsignedInt
-      case ("gt", UnsignedInt) => GT_UnsignedInt
-      case ("le", UnsignedInt) => LE_UnsignedInt
-      case ("ge", UnsignedInt) => GE_UnsignedInt
-
-      case ("lt", ArrayIndex) => LT_UnsignedInt
-      case ("gt", ArrayIndex) => GT_UnsignedInt
-      case ("le", ArrayIndex) => LE_UnsignedInt
-      case ("ge", ArrayIndex) => GE_UnsignedInt
-
-      case ("lt", Int) => LT_Int
-      case ("gt", Int) => GT_Int
-      case ("le", Int) => LE_Int
-      case ("ge", Int) => GE_Int
-
-      case ("lt", UnsignedShort) => LT_UnsignedShort
-      case ("gt", UnsignedShort) => GT_UnsignedShort
-      case ("le", UnsignedShort) => LE_UnsignedShort
-      case ("ge", UnsignedShort) => GE_UnsignedShort
-
-      case ("lt", Short) => LT_Short
-      case ("gt", Short) => GT_Short
-      case ("le", Short) => LE_Short
-      case ("ge", Short) => GE_Short
-
-      case ("lt", UnsignedByte) => LT_UnsignedByte
-      case ("gt", UnsignedByte) => GT_UnsignedByte
-      case ("le", UnsignedByte) => LE_UnsignedByte
-      case ("ge", UnsignedByte) => GE_UnsignedByte
-
-      case ("lt", Byte) => LT_Byte
-      case ("gt", Byte) => GT_Byte
-      case ("le", Byte) => LE_Byte
-      case ("ge", Byte) => GE_Byte
-
-      case ("lt", Float) => LT_Float
-      case ("gt", Float) => GT_Float
-      case ("le", Float) => LE_Float
-      case ("ge", Float) => GE_Float
-
-      case ("lt", Double) => LT_Double
-      case ("gt", Double) => GT_Double
-      case ("le", Double) => LE_Double
-      case ("ge", Double) => GE_Double
-
+    val (eq, ne, lt, le, gt, ge) = ComparisonOps.forType(convergedArgType)
+
+    op match {
+      case "<" => subsetError("Unsupported operation '%s'. Use 'lt' instead.", 
op)
+      case ">" => subsetError("Unsupported operation '%s'. Use 'gt' instead.", 
op)
+      case "<=" => subsetError("Unsupported operation '%s'. Use 'le' 
instead.", op)
+      case ">=" => subsetError("Unsupported operation '%s'. Use 'ge' 
instead.", op)
+      case "=" => subsetError("Unsupported operation '%s'. Use 'eq' instead.", 
op)
+      case "!=" => subsetError("Unsupported operation '%s'. Use 'ne' 
instead.", op)
+      case "eq" => eq
+      case "ne" => ne
+      case "lt" => lt
+      case "gt" => gt
+      case "le" => le
+      case "ge" => ge
       case _ => subsetError("Unsupported operation '%s' on type %s.", op, 
convergedArgType)

Review Comment:
   I think we currently hit this subsetError at compile time for lt, le, gt, ge 
operations with xs:hexBinary, but I don't think we will anymore? Instead we 
will create the new LT_HexBinary, etc. classes at compile time and fail at 
runtime with an assertion. We probably want to keep the same subset behavior so 
it SDEs at compile time?
   
   So, can you add a check somewhere in this function that creates this subset 
error for hex binary and those operations so we get the compile time subset 
error instead of a runtime assertion?
   
   Also, I think this case probably wants to be an `Assert.invariantFailed` 
now, since this ComparisonExpression should only be created with one of these 
operators (see `def Comp` in DFDLExpressionParser.scala). If we ever hit this 
case, it's probably a bug.
   
   Also, should we create a ticket to implement the rest of the operators for 
hexBinary and can remove this subset error and support those operators for 
hexBinary? Maybe we wait until we drop support for Java 8 and can use 
[Arrays.compare()](https://docs.oracle.com/javase%2F9%2Fdocs%2Fapi%2F%2F/java/util/Arrays.html#compare-byte:A-byte:A-)
 so it should be a trivial change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add support for dfdl:checkRangeInclusive/checkRangeExclusive functions [daffodil]

Reply via email to