Re: DPath arithmetic conversions and overflow/overflow

Steve Lawrence Mon, 06 Oct 2025 06:21:31 -0700

DPath arithmetic, where this logic takes place, is always local. In fact,
pretty much all of Daffodil's logic happens locally in a JVM, which hides
endianness issues. The only thing that could really cross a platform boundary
is the infoset, but that's only after we hand it off to the user, so we kind
of leave it up to users to deal with those kinds of issues.


On 2025-10-06 09:04 AM, Sood, Harinder wrote:

If everything is int32, then going across platform boundaries is less
prone to endian issues.


Sincerely,
  Harinder Sood


  Senior Program Manager
   hs...@owlcyberdefense.com
   240 805 4219
   owlcyberdefense.com


-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, October 6, 2025 9:01 AM
To: dev@daffodil.apache.org
Subject: Re: DPath arithmetic conversions and overflow/overflow

Makes sense. I took a look at what Saxon's XPath implementation does and it
looks like they promote things to longs for arithmetic, even int arithmetic. So
the likelihood of arithmetic overflow is pretty low.

This feels like the right approach and is similar to what you suggest, just
promoting to long instead of int. And I imagine any performance difference
between long and int is probably minimal on modern systems.
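
A minimal Java sketch of the promote-to-long idea, not Saxon's or Daffodil's
actual code. The class and method names are made up for illustration. It shows
an int multiplication wrapping while the same operation on longs does not, and
Math.multiplyExact as one way to still catch the rare long overflow.

```java
public class LongPromotionSketch {
    // Plain int arithmetic: wraps silently on overflow.
    static int mulAsInt(int a, int b) {
        return a * b;
    }

    // Promote both operands to long first; int * int always fits in a long.
    static long mulAsLong(int a, int b) {
        return (long) a * (long) b;
    }

    // For long * long, Math.multiplyExact throws ArithmeticException on overflow.
    static long mulLongChecked(long a, long b) {
        return Math.multiplyExact(a, b);
    }

    public static void main(String[] args) {
        System.out.println(mulAsInt(100_000, 100_000));  // 1410065408 (wrapped)
        System.out.println(mulAsLong(100_000, 100_000)); // 10000000000
    }
}
```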


On 2025-10-03 01:23 PM, Mike Beckerle wrote:

I also would be hesitant to cast everything to xs:integer, since our
implementation backs that with java.math.BigInteger. I would guess
there's a performance hit from switching from primitive types to
BigInteger. Not sure if that would be enough to notice though,
especially since DPath expressions aren't usually that common.

I agree promoting everything to BigInteger has performance
implications I don't like.
These are all boxed numbers inside Daffodil, but still a BigInteger is
more expensive than a boxed primitive number.


There's also the consideration that if we cast everything to
xs:integer then we still need to downcast to the expected resulting
type, e.g.:

<element name="foo" type="xs:short"
     dfdl:inputValueCalc="{ ../short * ../short }" />

We could add an implicit downcast to the result of the expression,
and maybe overflow is just considered an error in that case?

Whether we convert to xs:integer or xs:int (Java style) or do
promotion to next bigger size (int * int => long) (byte * byte =>
short) you'd still need to insert a downcast in this situation of
short * short result type going into an element of type short.

If we insert that automatically, then that would be compatible with
behavior today. If that downcast causes a runtime error, that's expected.

The change in behavior would be intermediate results inside an expression.
Ex: in an expression like a * b + c where all three are shorts, having
a * b overflow a short is incorrect behavior. We really do want a * b
to create an int, which is an incompatible change in behavior for the
case where a * b causes overflow, or the "+ c" causes overflow.

Though as incompatibilities go, I expect this one is very rarely hit.
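
To make the intermediate-result point concrete, here is a small Java sketch
(hypothetical names, not Daffodil code) comparing two strategies for
a * b + c on shorts: narrowing to short after every step versus computing in
int and applying a single checked downcast at the end. It assumes the
downcast reports overflow as an error.

```java
public class IntermediateWidening {
    // Checked downcast: fails instead of wrapping when the value does not fit.
    static short toShortExact(int v) {
        if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
            throw new ArithmeticException("value does not fit in xs:short: " + v);
        }
        return (short) v;
    }

    // Every intermediate result is forced back into a short.
    static short perStep(short a, short b, short c) {
        short ab = toShortExact(a * b);
        return toShortExact(ab + c);
    }

    // Intermediates stay int (short * short + short cannot overflow an int);
    // only the final result is narrowed.
    static short widened(short a, short b, short c) {
        return toShortExact(a * b + c);
    }

    public static void main(String[] args) {
        short a = 200, b = 200, c = -30000;
        System.out.println(widened(a, b, c)); // 10000
        try {
            perStep(a, b, c);
        } catch (ArithmeticException e) {
            System.out.println("per-step failed: " + e.getMessage());
        }
    }
}
```

With these inputs a * b is 40000, which does not fit in a short even though
the final result does, so only the per-step strategy reports an error.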







On Thu, Oct 2, 2025 at 8:54 AM Steve Lawrence <slawre...@apache.org> wrote:

I couldn't find that phrasing about casting to an xs:integer in the spec.
Maybe AI hallucinated?

I did find this in the spec in Section B.1:

Note that type promotion is different from subtype substitution. For
example:

      A function that expects a parameter $p of type xs:float can be
      invoked with a value of type xs:decimal. This is an example of type
      promotion. The value is actually converted to the expected type.
      Within the body of the function, $p instance of xs:decimal returns
      false.

      A function that expects a parameter $p of type xs:decimal can be
      invoked with a value of type xs:integer. This is an example of
      subtype substitution. The value retains its original type. Within the
      body of the function, $p instance of xs:integer returns true.

And here's the definition of subtype substitution:

[Definition: The use of a value whose dynamic type is derived from an
expected type is known as subtype substitution.] Subtype substitution
does not change the actual type of a value. For example, if an
xs:integer value is used where an xs:decimal value is expected, the
value retains its type as xs:integer.

In the case of an xs:short being passed into a function that expects
an xs:integer, that sounds like it would just be subtype
substitution, so we would not cast the xs:short to an xs:integer, and
inside the function the type is treated as an xs:short. But the spec
isn't clear to me if that implies the result is also an xs:short or
if that is cast to something. It feels like keeping it a short is
very likely to run into overflow/underflow.

I also would be hesitant to cast everything to xs:integer, since our
implementation backs that with java.math.BigInteger. I would guess
there's a performance hit from switching from primitive types to
BigInteger. Not sure if that would be enough to notice though,
especially since DPath expressions aren't usually that common.

There's also the consideration that if we cast everything to
xs:integer then we still need to downcast to the expected resulting
type, e.g.:

<element name="foo" type="xs:short"
     dfdl:inputValueCalc="{ ../short * ../short }" />

We could add an implicit downcast to the result of the expression,
and maybe overflow is just considered an error in that case?



On 2025-10-01 05:17 PM, Mike Beckerle wrote:

Ok, I looked at this and got some AI coaching....

The phrase in the XPath spec says:

     "If both operands are of type xs:integer or are derived from
     xs:integer, then the operands are cast to xs:integer and the result
     is an xs:integer."


This is explicit about operands being derived from xs:integer in that
part, but when it says they are cast, it doesn't qualify that in any way,
so I think the right interpretation of this is that they are cast to
exactly the xs:integer type.

ChatGPT agrees: "XPath and XQuery .. deliberately avoid proliferating
narrow integer subtypes in arithmetic results. Instead, the specification
says:

      - For + - * div idiv mod, if both operands are subtypes of xs:integer,
        they are *promoted to xs:integer*, not kept at the narrower type.
      - That way, all arithmetic on integer subtypes collapses to
        xs:integer."


- Now, admittedly, I wrote a bunch of that code, and my thought would not
  have been to do that lazy thing of just casting everything to xs:integer.
- Rather I would have wanted promotion to have been to the least common
  supertype for addition and multiplication, and promotion to just the
  larger of the two arg types for division and subtraction of unsigned
  types. (Subtraction of signed types has to be treated like addition).
- So probably if we just did this promotion right the problem wouldn't
  occur.
- Certainly having short + short create short is a bug.
- I am wondering if I made the mistake of taking *least upper bound* of
  the arg types, not least common supertype. The least upper bound of X
  and X is, well, X.
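
The difference between the two rules can be sketched in Java. This is only an
illustration under assumptions: the enum, the method names, and the "one step
wider" rule are hypothetical and cover signed types only, not DPath's actual
type lattice. It shows why a least-upper-bound rule leaves short + short as
short, while a widening rule gives int.

```java
public class PromotionSketch {
    // Signed integer types ordered from narrowest to widest (hypothetical model).
    enum IntType { BYTE, SHORT, INT, LONG, INTEGER }

    // Least upper bound: the wider of the two argument types, so lub(X, X) == X.
    static IntType leastUpperBound(IntType a, IntType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    // One possible rule for + and *: one step wider than the least upper
    // bound, capped at the unbounded INTEGER type.
    static IntType widenedResult(IntType a, IntType b) {
        IntType[] all = IntType.values();
        int next = leastUpperBound(a, b).ordinal() + 1;
        return all[Math.min(next, all.length - 1)];
    }

    public static void main(String[] args) {
        System.out.println(leastUpperBound(IntType.SHORT, IntType.SHORT)); // SHORT
        System.out.println(widenedResult(IntType.SHORT, IntType.SHORT));   // INT
        System.out.println(widenedResult(IntType.BYTE, IntType.INT));      // LONG
    }
}
```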

On Wed, Oct 1, 2025 at 2:18 PM Steve Lawrence <slawre...@apache.org> wrote:

I'm trying to fix https://issues.apache.org/jira/browse/DAFFODIL-2574.
The core issue is that Java arithmetic operations return Int, even if,
for example, you are adding two Shorts. Our DPath implementation doesn't
expect that, and assumes xs:short + xs:short always results in an
xs:short; that way it knows all the types at compile time and can
put in appropriate conversions.
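
For reference, a small Java illustration of the promotion being described
(the class and method names are just for this example). Adding two shorts
yields an int, and narrowing the result back to short requires an explicit
cast that wraps silently on overflow.

```java
public class ShortAddDemo {
    // Java promotes both short operands to int, so a + b has type int.
    static int add(short a, short b) {
        return a + b;
    }

    // An explicit narrowing cast compiles, but wraps silently on overflow.
    static short addWrapped(short a, short b) {
        return (short) (a + b);
    }

    public static void main(String[] args) {
        short a = 30000, b = 30000;
        System.out.println(add(a, b));        // 60000
        System.out.println(addWrapped(a, b)); // -5536
    }
}
```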

I was looking through the XPath/XQuery spec to figure out what the
correct behavior is, and it feels kind of ambiguous. It defines
arithmetic functions like:

op:numeric-add($arg1 as numeric, $arg2 as numeric) as numeric

But it doesn't really say what the resulting numeric should be. It
really just says

op:operation(xs:integer, xs:integer)

should return "xs:integer", but it's not completely clear if that's
saying the result should be promoted to an xs:integer, or the result
just should derive from xs:integer. The latter is my interpretation,
suggesting we should not promote, and I think is what DPath
intends.

But that then has issues with underflow/overflow: what happens when
a short + short doesn't fit into a short? Do we promote to an int?
Do we error? The spec does say this regarding overflow/underflow:

For xs:integer operations, implementations that support limited-precision
integer operations ·must· select from the following options:

      They ·may· choose to always raise an error [err:FOAR0002].

      They ·may· provide an ·implementation-defined· mechanism that allows
      users to choose between raising an error and returning a result that
      is modulo the largest representable integer value. See [ISO 10967].

So we could just detect overflow and error, but that feels like
short/byte operations are likely to overflow. Which might break
usability, but it might detect cases people weren't expecting?
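
A rough Java sketch of the detect-and-error option for shorts, under the
assumption that the result type stays xs:short. The class and method names
are hypothetical, and ArithmeticException merely stands in for whatever
error DPath would raise (the spec's analogue is err:FOAR0002).

```java
public class CheckedShortAdd {
    // The sum of two shorts always fits in an int, so compute it there and
    // range-check before narrowing back to short.
    static short addExact(short a, short b) {
        int sum = a + b;
        if (sum < Short.MIN_VALUE || sum > Short.MAX_VALUE) {
            throw new ArithmeticException("xs:short overflow: " + sum);
        }
        return (short) sum;
    }

    public static void main(String[] args) {
        System.out.println(addExact((short) 100, (short) 200)); // 300
        try {
            addExact((short) 30000, (short) 30000);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // xs:short overflow: 60000
        }
    }
}
```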

Or we could do what Java does and just promote arithmetic
operations to Int, which is likely to just do the right thing and
not overflow. But that does mean you would likely need to add downcasts
that might not be expected, e.g.

      <element name="foo" type="xs:short"
        dfdl:inputValueCalc="{ xs:short(../short1 + ../short2) }" />

In order for DPath to work the way it does, I think we do need to
make a compile-time decision. I don't think DPath really wants to
promote things at runtime to whatever type fits the arithmetic
result and just assume everything is a Numeric. But I guess that
could be an option too, and there just might be a little bit of
runtime overhead to check types and arithmetic results.

Thoughts?
