stevedlawrence commented on code in PR #1199:
URL: https://github.com/apache/daffodil/pull/1199#discussion_r1544395957
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dpath/Expression.scala:
##########
@@ -3041,6 +3047,101 @@ case class DFDLCheckConstraintsExpr(
}
+case class DFDLCheckRangeInclusiveExpr(
+ nameAsParsed: String,
+ fnQName: RefQName,
+ args: List[Expression],
+) extends FunctionCallBase(nameAsParsed, fnQName, args) {
+
+ lazy val List(arg1, arg2, arg3) = { checkArgCount(3); args }
+
+ override lazy val children = args
+
+ override lazy val compiledDPath = {
+ checkArgCount(3)
+ val argDPath = args(0).compiledDPath
+ val rangeFrom = args(1).compiledDPath
+ val rangeTo = args(2).compiledDPath
+ val c = conversions
+ val res = new CompiledDPath(DFDLCheckRangeInclusive(argDPath, rangeFrom,
rangeTo) +: c)
+ res
+ }
+ override def targetTypeForSubexpression(subexpr: Expression): NodeInfo.Kind
= {
+ val targetType = (arg1.inherentType, arg2.inherentType, arg3.inherentType)
match {
+ case (
+ testType: NodeInfo.Numeric.Kind,
+ minType: NodeInfo.Numeric.Kind,
+ maxType: NodeInfo.Numeric.Kind,
+ ) => {
+ val rangeType = NodeInfoUtils.generalizeArgTypesForComparisonOp(
+ "checkRangeInclusive",
+ minType,
+ maxType,
+ )
+ val targetType = NodeInfoUtils.generalizeArgTypesForComparisonOp(
+ "checkRangeInclusive",
+ testType,
+ rangeType,
+ )
+ targetType
+ }
+ case (test, min, max) =>
+ SDE(s"Cannot call $nameAsParsed with non-numeric types: $test, $min,
$max")
+ }
+ targetType
+ }
+
+ override lazy val inherentType: NodeInfo.Kind = NodeInfo.Boolean
+
+}
+
+case class DFDLCheckRangeExclusiveExpr(
+ nameAsParsed: String,
+ fnQName: RefQName,
+ args: List[Expression],
+) extends FunctionCallBase(nameAsParsed, fnQName, args) {
+
+ lazy val List(arg1, arg2, arg3) = { checkArgCount(3); args }
+
+ override lazy val children = args
+
+ override lazy val compiledDPath = {
+ checkArgCount(3)
+ val argDPath = args(0).compiledDPath
+ val rangeFrom = args(1).compiledDPath
+ val rangeTo = args(2).compiledDPath
+ val c = conversions
+ val res = new CompiledDPath(DFDLCheckRangeExclusive(argDPath, rangeFrom,
rangeTo) +: c)
Review Comment:
DFDLCheckRangeExclusiveExpr is exactly the same as the Inclusive version
except for this one line. What are your thoughts on only having a single
`DFDLCheckRangeExpr` that takes a boolean parameter (e.g. `isExclusive`) that
signifies inclusive vs exclusive? Then, using a suggestion below you could have
a single `DFDLCheckRange` class that accepts min/max comparison operations,
something like:
```scala
val (_, _, lt, le, _, _) = ComparisonOps.forType(targetType)
val compare = if (isExclusive) lt else le
val res = new CompiledDPath(DFDLCheckRange(argDPath, rangeFrom, rangeTo,
compare) +: c)
```
Then we can combine DFDLCheckRangeInclusive/Exclusive to a single
`DFDLCheckRange` class that does something like:
```scala
def run(...) {
...
val res = compare.operate(minValue, dataValue) &&
compare.operate(dataValue, maxValue)
res
}
```
The benefit of this approach is that we avoid lots of duplication and we also
figure out the right comparison logic/function at schema compile time instead
of run-time.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/dpath/DFDLFunctions.scala:
##########
@@ -39,6 +42,94 @@ case class DFDLCheckConstraints(recipe: CompiledDPath)
extends RecipeOpWithSubRe
}
}
+case class DFDLCheckRangeInclusive(
+ dataRecipe: CompiledDPath,
+ minRecipe: CompiledDPath,
+ maxRecipe: CompiledDPath,
+) extends RecipeOpWithSubRecipes(dataRecipe, minRecipe, maxRecipe) {
+ override def run(dstate: DState): Unit = {
+ val saved = dstate.currentNode
+ dataRecipe.run(dstate)
+ val dataVal = dstate.currentValue
+ dstate.setCurrentNode(saved)
+ minRecipe.run(dstate)
+ val minVal = dstate.currentValue
+ dstate.setCurrentNode(saved)
+ maxRecipe.run(dstate)
+ val maxVal = dstate.currentValue
+
+ val res = executeCheck(
+ dataVal: DataValue.DataValuePrimitiveNullable,
+ minVal: DataValue.DataValuePrimitiveNullable,
+ maxVal: DataValue.DataValuePrimitiveNullable,
+ )
+ dstate.setCurrentValue(res)
+ }
+
+ def executeCheck(
+ dataVal: DataValue.DataValuePrimitiveNullable,
+ minVal: DataValue.DataValuePrimitiveNullable,
+ maxVal: DataValue.DataValuePrimitiveNullable,
+ ): Boolean = {
+ (dataVal.value, minVal.value, maxVal.value) match {
+ case (data: Integer, min: Integer, max: Integer) => data >= min && data
<= max
+ case (data: java.lang.Double, min: java.lang.Double, max:
java.lang.Double) =>
+ data >= min && data <= max
+ case (data: java.lang.Float, min: java.lang.Float, max: java.lang.Float)
=>
+ data >= min && data <= max
+ case (data: java.math.BigDecimal, min: java.math.BigDecimal, max:
java.math.BigDecimal) =>
+ data.compareTo(min) >= 0 && data.compareTo(max) <= 0
+ case (data: BigInteger, min: BigInteger, max: BigInteger) =>
+ data.compareTo(min) >= 0 && data.compareTo(max) <= 0
Review Comment:
This match case isn't exhaustive. It's missing Short, Byte, and Long.
Though, instead of doing the type check at runtime and duplicating a bunch
of checks, what if we figure out the right
[ComparisonOps](https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/dpath/ComparisonOps.scala)
to use at compile time?
I suggest we create a new static function, something like
```scala
object ComparisonOps {
  def forType(kind: NodeInfo.Kind): (CompareOpBase, CompareOpBase,
CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase) = {
    kind match {
case PrimType.Boolean => (EQ_Compare, NE_Compare, LT_Boolean,
LE_Boolean, GT_Boolean, GE_Boolean)
case PrimType.Integer => (EQ_Compare, NE_Compare, LT_Integer,
LE_Integer, GT_Integer, GE_Integer)
...
}
}
...
}
```
So it returns a 6-tuple of all the comparison functions that might be needed
for a specific type. We can use this anywhere we need comparison functions,
like in the `DFDLCheckRangeExpr` suggested above. We could also use it in
`ComparisonExpression`. That probably wants to be something like this:
```
lazy val compareOp = {
val (eq, ne, lt, le, gt, ge) = ComparisonOps.forType(convertedArgType)
op match {
...
case "eq"=> eq
case "ne" => ne
case "lt" => lt
...
}
}
```
Note that all the comparison functions used in `ComparisonExpression`
should be used in the new `forType` function.
We could also could use it in `repTypeComparers`, e.g.
```scala
lazy val (
repTypeCompareLT: NumberCompareOp,
repTypeCompareLE: NumberCompareOp,
) = LV('repTypeComparers) {
val (_, _, lt, le, _, _) =
ComparisonOps.forType(repTypeElementDec.primType)
(lt, le)
}
```
There might be other places where we have a match/case to figure out
comparison ops that could be simplified in this way and remove some duplication.
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dpath/Expression.scala:
##########
@@ -3041,6 +3047,101 @@ case class DFDLCheckConstraintsExpr(
}
+case class DFDLCheckRangeInclusiveExpr(
+ nameAsParsed: String,
+ fnQName: RefQName,
+ args: List[Expression],
+) extends FunctionCallBase(nameAsParsed, fnQName, args) {
+
+ lazy val List(arg1, arg2, arg3) = { checkArgCount(3); args }
+
+ override lazy val children = args
+
+ override lazy val compiledDPath = {
+ checkArgCount(3)
+ val argDPath = args(0).compiledDPath
+ val rangeFrom = args(1).compiledDPath
+ val rangeTo = args(2).compiledDPath
Review Comment:
Also, if you reference arg1/arg2/arg3 it will force evaluation of line 3056
and call checkArgCount(3), so you can also remove the duplicate check on 3061.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/dpath/DFDLFunctions.scala:
##########
@@ -39,6 +42,94 @@ case class DFDLCheckConstraints(recipe: CompiledDPath)
extends RecipeOpWithSubRe
}
}
+case class DFDLCheckRangeInclusive(
+ dataRecipe: CompiledDPath,
+ minRecipe: CompiledDPath,
+ maxRecipe: CompiledDPath,
+) extends RecipeOpWithSubRecipes(dataRecipe, minRecipe, maxRecipe) {
+ override def run(dstate: DState): Unit = {
+ val saved = dstate.currentNode
+ dataRecipe.run(dstate)
+ val dataVal = dstate.currentValue
+ dstate.setCurrentNode(saved)
+ minRecipe.run(dstate)
+ val minVal = dstate.currentValue
+ dstate.setCurrentNode(saved)
+ maxRecipe.run(dstate)
+ val maxVal = dstate.currentValue
+
+ val res = executeCheck(
+ dataVal: DataValue.DataValuePrimitiveNullable,
+ minVal: DataValue.DataValuePrimitiveNullable,
+ maxVal: DataValue.DataValuePrimitiveNullable,
+ )
+ dstate.setCurrentValue(res)
+ }
+
+ def executeCheck(
+ dataVal: DataValue.DataValuePrimitiveNullable,
+ minVal: DataValue.DataValuePrimitiveNullable,
+ maxVal: DataValue.DataValuePrimitiveNullable,
+ ): Boolean = {
+ (dataVal.value, minVal.value, maxVal.value) match {
+ case (data: Integer, min: Integer, max: Integer) => data >= min && data
<= max
+ case (data: java.lang.Double, min: java.lang.Double, max:
java.lang.Double) =>
+ data >= min && data <= max
+ case (data: java.lang.Float, min: java.lang.Float, max: java.lang.Float)
=>
+ data >= min && data <= max
+ case (data: java.math.BigDecimal, min: java.math.BigDecimal, max:
java.math.BigDecimal) =>
+ data.compareTo(min) >= 0 && data.compareTo(max) <= 0
+ case (data: BigInteger, min: BigInteger, max: BigInteger) =>
+ data.compareTo(min) >= 0 && data.compareTo(max) <= 0
+ case (_, _, _) => false
+ }
+ }
+}
+
+case class DFDLCheckRangeExclusive(
+ dataRecipe: CompiledDPath,
+ minRecipe: CompiledDPath,
+ maxRecipe: CompiledDPath,
+) extends RecipeOpWithSubRecipes(dataRecipe, minRecipe, maxRecipe) {
+ override def run(dstate: DState): Unit = {
+ val saved = dstate.currentNode
+ dataRecipe.run(dstate)
+ val dataVal = dstate.currentValue
+ dstate.setCurrentNode(saved)
+ minRecipe.run(dstate)
+ val minVal = dstate.currentValue
+ dstate.setCurrentNode(saved)
+ maxRecipe.run(dstate)
+ val maxVal = dstate.currentValue
Review Comment:
The `currentNode` DState can be used by a `run` implementation to know where
to evaluate a relative path expression from (e.g. `../foo/bar/baz`).
But calling `run` can also potentially mutate `currentNode`, changing it to
something else, which can interfere with other calls to run, so it needs to be
reset.
For example, say we had something like this:
```
dfdl:checkRangeExclusive(../foo/bar/baz, ../uno/dos/tres, ../one/two/three)`
```
Before we call `run()`, let's say `currentNode` points to an element called
`root`.
After we call `dataRecipe.run()` which evaluates `../foo/bar/baz` relative
to `root`, the `currentNode` will point to the `baz` element, because DState was
mutated. If we do not reset `currentNode` back to `root`, then when we call
`minRecipe.run()` to evaluate `../uno/dos/tres` that relative path will be
evaluated relative to the `baz` element instead of the `root` element. So
that's why we need to save and restore currentNode after each call to run.
Note that we don't have to restore `currentNode` after calling
`maxRecipe.run()` because we aren't personally calling any more runs, so
whatever `currentNode` ended up as doesn't matter to us. If some other
expression was called that led to this run() being called, then they were
responsible for saving and restoring currentNode if they needed it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]