stevedlawrence commented on code in PR #911:
URL: https://github.com/apache/daffodil/pull/911#discussion_r1068154928
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -170,13 +279,23 @@ trait ConvertTextNumberMixin {
final protected def pattern = e.textNumberPattern
/**
- * Checks a pattern for suitability with V (virtual decimal point) in the
pattern
- * in the context of the corresponding textNumberRep. Computes number of
digits to right of the V.
+ * Analogous to the property dfdl:binaryDecimalVirtualPoint
+ *
+ * Value is 0 if there is no virtual decimal point.
+ *
+ * Value is the number of digits to the right of the 'V' for V patterns.
*
- * SDE if pattern has a syntax error.
+ * For P patterns, the value is the number of P characters when
+ * the P chars are on the right of the digits, or it is negated if the
+ * P chars are on the left of the digits.
*
- * @return the number of digits to the right of the V character. Must be 1
or greater.
+ * Examples:
*
+ * "000.00" returns 2, so text "12345" yields 123.45
Review Comment:
Should this be `000V00`?
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -80,14 +99,72 @@ object TextNumberPatternUtils {
// don't forget the ^ and $ (start of data, end of data) because we want
// the match to consume all the characters, starting at the beginning.
val vPattern=
-
s"""^${posPrefix}${posBeforeV}V${posAfterV}${posSuffix}(?:;${negPrefix}${negBeforeV}V${negAfterV}${negSuffix})?$$"""
+ s"""^${posPrefix}${posBeforeV}V${posAfterV}${posSuffix}""" ++
+ s"""(?:;${negPrefix}${negBeforeV}(?:V${negAfterV}${negSuffix})?)?$$"""
Review Comment:
I know this is just a long-line change, but I missed in the V PR. If there
is a negative pattern, them the `V`, after V, and suffix combined are optional?
So for example, this pattern is valid `+00V00;-00`. Is this intentional?
My reading of the spec, it seems the positive and negative patterns should
be the exact same pattern, just with the entire negative pattern being
optional. Almost something like
```scala
val subpattern = s"""${posPrefix}${posBeforeV}V${posAfterV}${posSuffix}"""
val vPattern = s"""^${subpattern}(?:;${subpattern})?$$"""
```
I noticed in the P pattern you don't allow that optionality.
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -275,17 +410,67 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
extends Terminal(e, true)
with ConvertTextNumberMixin {
+
final override protected lazy val textDecimalVirtualPointFromPattern: Int = {
- val r = TextNumberPatternUtils.vregexStandard
- r.findFirstMatchIn(patternStripped) match {
- case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
- case None =>
- e.SDE(
- s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal
point).
- | Other than the sign indicators, it can contain only
- | '#', then digits 0-9 then 'V' then digits 0-9.
- | The positive part of the dfdl:textNumberPattern is
mandatory.""".stripMargin('|'),
+ if (hasV) {
+ val r = TextNumberPatternUtils.vRegexStandard
+ r.findFirstMatchIn(patternNoQuoted) match {
+ case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
+ case None =>
+ e.SDE(
+ s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal
point).
+ | Other than the sign indicators, it can contain only
+ | '#', then digits 0-9 then 'V' then digits 0-9.
+ | The positive part of the dfdl:textNumberPattern is
mandatory.""".stripMargin('|'),
+ pattern)
+ }
Review Comment:
Just trying to think about edge cases, assume a pattern is invalid and it
has both `V` and `P`, e.g. `P00V00`. In this case both `hasV` and `hasP` will
be true, and we'll fall into this `hasV` block, the regex will fail and we'll
say it's not a valid V pattern? Seems like a reasonable way to error, just
double checking that I'm understanding this all right.
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -275,17 +410,67 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
extends Terminal(e, true)
with ConvertTextNumberMixin {
+
final override protected lazy val textDecimalVirtualPointFromPattern: Int = {
- val r = TextNumberPatternUtils.vregexStandard
- r.findFirstMatchIn(patternStripped) match {
- case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
- case None =>
- e.SDE(
- s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal
point).
- | Other than the sign indicators, it can contain only
- | '#', then digits 0-9 then 'V' then digits 0-9.
- | The positive part of the dfdl:textNumberPattern is
mandatory.""".stripMargin('|'),
+ if (hasV) {
+ val r = TextNumberPatternUtils.vRegexStandard
+ r.findFirstMatchIn(patternNoQuoted) match {
+ case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
+ case None =>
+ e.SDE(
+ s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal
point).
+ | Other than the sign indicators, it can contain only
+ | '#', then digits 0-9 then 'V' then digits 0-9.
+ | The positive part of the dfdl:textNumberPattern is
mandatory.""".stripMargin('|'),
+ pattern)
+ }
+ } else if (hasP) {
+ val rr = TextNumberPatternUtils.pOnRightRegexStandard
+ val rl = TextNumberPatternUtils.pOnLeftRegexStandard
+ val rightMatch = rr.findFirstMatchIn(patternNoQuoted)
+ val leftMatch = rl.findFirstMatchIn(patternNoQuoted)
+ (leftMatch, rightMatch) match {
+ case (None, None) => e.SDE(
+ """The dfdl:textNumberPattern '%s' contains 'P' (virtual decimal point
positioners).
+ |However, it did not match the allowed syntax which allows the
sign indicator
+ |plus digits on only one side of the P symbols.""".stripMargin,
pattern)
+ case (Some(rl(_, ps, digits, _, negPre, negPs, negDigits, _)), None)
=> {
+ checkPPPMatchSyntax(ps, digits, negPre, negPs, negDigits)
+ ps.length + digits.length
+ }
+ case (None, Some(rr(_, digits, ps, _, negPre, negDigits, negPs, _)))
=> {
+ checkPPPMatchSyntax(ps, digits, negPre, negPs, negDigits)
+ - ps.length // negate value.
+ }
+ case _ => Assert.invariantFailed("Should not match both left P and
right P regular expressions.")
+ }
+ } else {
+ 0 // there is no V nor P, so there is no virtual decimal point scaling
involved.
+ }
+ }
+
+ /**
+ * Check for common errors in the pattern matches for 'P' patterns.
+ *
+ * @param ps match string for positive pattern P characters
+ * @param digits match string for positive pattern digits (and sharps)
+ * @param negPre negative prefix match (null if there is no negative
pattern at all)
+ * @param negPs match string for negative pattern P character (or null)
+ * @param negDigits match string for negative pattern P digits (and sharps)
+ */
+ private def checkPPPMatchSyntax(ps: String, digits: String, negPre: String,
negPs: String, negDigits: String): Unit = {
+ Assert.invariant(ps.length >= 1)
+ digits.length >= 1
Review Comment:
Was this intended to be an `Assert(digits.length >= 1)`. I'm surprises Scala
didn't give a compiler warning about this line not doing anything. Maybe it
doesn't know if `.length` has side effects or not?
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/ConvertTextStandardNumberParser.scala:
##########
@@ -89,21 +92,52 @@ trait TextDecimalVirtualPointMixin {
}
// $COVERAGE-ON$
+
+ /**
+ * Always creates an integer from a JFloat, JDouble, or JBigDecimal
+ * by scaling the argument by the virtual decimal point scaling factor.
+ *
+ * If the argument is some other JNumber, i.e., not a JFloat, JDouble, or
JBigDecimal
+ * it can only be an integer type. This method this returns the argument
value
Review Comment:
Should be "This method returns"
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -186,41 +305,57 @@ trait ConvertTextNumberMixin {
* but cannot be used as an operational pattern given that
* potentially lots of things have been removed from it.
*/
- final protected lazy val patternStripped = {
+ final protected lazy val patternNoQuoted = {
// note: tick == apos == ' == single quote.
// First remove entirely all escaped ticks ie., ''
val noEscapedTicksRegex = """''""".r
val patternNoEscapedTicks = noEscapedTicksRegex.replaceAllIn(pattern, "")
// Next remove all tick-escaped characters entirely
val noQuotedRegex = """'[^']+'""".r
- val patternNoQuoted = noQuotedRegex.replaceAllIn(patternNoEscapedTicks, "")
+ val res = noQuotedRegex.replaceAllIn(patternNoEscapedTicks, "")
// the remaining string contains only pattern special characters
// and regular non-pattern characters
- patternNoQuoted
+ res
}
- protected final lazy val hasV = patternStripped.contains("V")
- protected final lazy val hasP = patternStripped.contains("P")
+ protected final lazy val hasV = patternNoQuoted.contains("V")
+ protected final lazy val hasP = patternNoQuoted.contains("P")
/**
- * analogous to the property dfdl:binaryDecimalVirtualPoint
+ * Analogous to the property dfdl:binaryDecimalVirtualPoint
*
* Value is 0 if there is no virtual decimal point.
- * Value is the number of digits to the right of the 'V'
+ *
+ * Value is the number of digits to the right of the 'V' for V patterns.
+ *
+ * For P patterns, the value is the number of P characters when
+ * the P chars are on the right of the digits, or it is negated if the
+ * P chars are on the left of the digits.
+ *
+ * Examples:
+ *
+ * "000.00" returns 2, so text "12345" yields 123.45
Review Comment:
Same question here, should this be `000V00`?
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -186,41 +305,57 @@ trait ConvertTextNumberMixin {
* but cannot be used as an operational pattern given that
* potentially lots of things have been removed from it.
*/
- final protected lazy val patternStripped = {
+ final protected lazy val patternNoQuoted = {
// note: tick == apos == ' == single quote.
// First remove entirely all escaped ticks ie., ''
val noEscapedTicksRegex = """''""".r
val patternNoEscapedTicks = noEscapedTicksRegex.replaceAllIn(pattern, "")
// Next remove all tick-escaped characters entirely
val noQuotedRegex = """'[^']+'""".r
- val patternNoQuoted = noQuotedRegex.replaceAllIn(patternNoEscapedTicks, "")
+ val res = noQuotedRegex.replaceAllIn(patternNoEscapedTicks, "")
// the remaining string contains only pattern special characters
// and regular non-pattern characters
- patternNoQuoted
+ res
}
- protected final lazy val hasV = patternStripped.contains("V")
- protected final lazy val hasP = patternStripped.contains("P")
+ protected final lazy val hasV = patternNoQuoted.contains("V")
+ protected final lazy val hasP = patternNoQuoted.contains("P")
/**
- * analogous to the property dfdl:binaryDecimalVirtualPoint
+ * Analogous to the property dfdl:binaryDecimalVirtualPoint
*
* Value is 0 if there is no virtual decimal point.
- * Value is the number of digits to the right of the 'V'
+ *
+ * Value is the number of digits to the right of the 'V' for V patterns.
+ *
+ * For P patterns, the value is the number of P characters when
+ * the P chars are on the right of the digits, or it is negated if the
+ * P chars are on the left of the digits.
+ *
+ * Examples:
+ *
+ * "000.00" returns 2, so text "12345" yields 123.45
+ *
+ * "PP000" returns 2, so text "123" yields 0.00123
+ *
+ * "000PP" returns -2 so text "123" yields 12300.0
*/
final lazy val textDecimalVirtualPoint: Int = {
- if (!hasV && !hasP) 0 // no virtual point
- else {
- if (hasP) {
- e.notYetImplemented("textNumberPattern with P symbol")
- }
- Assert.invariant(hasV)
+ lazy val virtualPoint = textDecimalVirtualPointFromPattern
+ if (hasV) {
//
// check for things incompatible with "V"
//
- val virtualPoint = textDecimalVirtualPointFromPattern
Assert.invariant(virtualPoint >= 1) // if this fails the regex is broken.
virtualPoint
+ } else if (hasP) {
+ //
+ // check for things incompatible with "V"
Review Comment:
Should be `incompatiable with "P"` instead of "V"?
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -114,21 +191,66 @@ object TextNumberPatternUtils {
* Checks a pattern for suitability with V (virtual decimal point) in the
pattern
* in the context of zoned textNumberRep.
*
- * This is broken out separately for unit testing purposes.
+ * This method is here in this object for unit testing purposes.
*
* @param patternStripped the dfdl:textNumberPattern pattern - with all
quoting removed.
* @return None if the pattern is illegal syntax for use with V (virtual
decimal point)
* otherwise Some(N) where N is the number of digits to the right of
the V character.
*/
- private[primitives] def textDecimalVirtualPointForZoned(patternStripped:
String): Option[Int] = {
- val r = TextNumberPatternUtils.vregexZoned
+ def textNumber_V_DecimalVirtualPointForZoned(patternStripped: String):
Option[Int] = {
Review Comment:
Since this is no longer `private[primitives]`, does it make sense to move
the regex match logic to the `textDecimalVirtualPointFromPattern` val in
`PrimivitesZoned.scala`? That would make the different
`textDecimalVirtualPointFromPattern` implementations look more similar instead
of one calling this function and the other having the logic inline.
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -275,17 +410,67 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
extends Terminal(e, true)
with ConvertTextNumberMixin {
+
final override protected lazy val textDecimalVirtualPointFromPattern: Int = {
- val r = TextNumberPatternUtils.vregexStandard
- r.findFirstMatchIn(patternStripped) match {
- case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
- case None =>
- e.SDE(
- s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal
point).
- | Other than the sign indicators, it can contain only
- | '#', then digits 0-9 then 'V' then digits 0-9.
- | The positive part of the dfdl:textNumberPattern is
mandatory.""".stripMargin('|'),
+ if (hasV) {
+ val r = TextNumberPatternUtils.vRegexStandard
+ r.findFirstMatchIn(patternNoQuoted) match {
+ case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
+ case None =>
+ e.SDE(
+ s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal
point).
+ | Other than the sign indicators, it can contain only
+ | '#', then digits 0-9 then 'V' then digits 0-9.
+ | The positive part of the dfdl:textNumberPattern is
mandatory.""".stripMargin('|'),
+ pattern)
+ }
+ } else if (hasP) {
+ val rr = TextNumberPatternUtils.pOnRightRegexStandard
+ val rl = TextNumberPatternUtils.pOnLeftRegexStandard
+ val rightMatch = rr.findFirstMatchIn(patternNoQuoted)
+ val leftMatch = rl.findFirstMatchIn(patternNoQuoted)
+ (leftMatch, rightMatch) match {
+ case (None, None) => e.SDE(
+ """The dfdl:textNumberPattern '%s' contains 'P' (virtual decimal point
positioners).
+ |However, it did not match the allowed syntax which allows the
sign indicator
+ |plus digits on only one side of the P symbols.""".stripMargin,
pattern)
+ case (Some(rl(_, ps, digits, _, negPre, negPs, negDigits, _)), None)
=> {
+ checkPPPMatchSyntax(ps, digits, negPre, negPs, negDigits)
+ ps.length + digits.length
+ }
+ case (None, Some(rr(_, digits, ps, _, negPre, negDigits, negPs, _)))
=> {
+ checkPPPMatchSyntax(ps, digits, negPre, negPs, negDigits)
+ - ps.length // negate value.
+ }
+ case _ => Assert.invariantFailed("Should not match both left P and
right P regular expressions.")
+ }
+ } else {
+ 0 // there is no V nor P, so there is no virtual decimal point scaling
involved.
+ }
+ }
+
+ /**
+ * Check for common errors in the pattern matches for 'P' patterns.
+ *
+ * @param ps match string for positive pattern P characters
+ * @param digits match string for positive pattern digits (and sharps)
+ * @param negPre negative prefix match (null if there is no negative
pattern at all)
+ * @param negPs match string for negative pattern P character (or null)
+ * @param negDigits match string for negative pattern P digits (and sharps)
+ */
+ private def checkPPPMatchSyntax(ps: String, digits: String, negPre: String,
negPs: String, negDigits: String): Unit = {
+ Assert.invariant(ps.length >= 1)
+ digits.length >= 1
+ if (negPre ne null) {
+ e.schemaDefinitionUnless(ps == negPs,
+ """In the dfdl:textNumberPattern '%s' the P Symbols do not match
+ | between positive and negative patterns""".stripMargin,
+ pattern)
+ e.schemaDefinitionUnless(digits == negDigits,
+ """In the dfdl:textNumberPattern '%s' the '#' and digit symbols do not
match
+ | between positive and negative patterns""".stripMargin,
+ pattern)
}
Review Comment:
Looking back at the spec I saw this line:
> If there is an explicit negative subpattern, it serves only to specify the
negative prefix and suffix; the number of digits, minimal digits, and other
characteristics are ignored in the negative subpattern
Which gives me two questions:
1. Should we allow the negative and positive P's to be different? Seems like
it's probably a user doing something they didn't intend if that's the case so
maybe it's is worth an SDE as you have. Or at the very least this should be a
warning.
2. Similarly, should the V and P regex patterns be much more lax about what
they allow in the negative part since it's ignored? Again, if they don't match
it probably an accident and an SDE seems very reasonable, but maybe there are
schemas that will just want to keep the textNumberPattern as simple as possible
and do something like `+000V0000;-0` so that they can specific the virtual
decimal point in the postive pattern, and then just the prefix/suffix in the
negative pattern and not worry about them having to match.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]