Dylan He created SPARK-48979:
--------------------------------
Summary: CONV function behaves inconsistently
Key: SPARK-48979
URL: https://issues.apache.org/jira/browse/SPARK-48979
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.1
Reporter: Dylan He
I'm currently working on the CONV function, and I found a few things confusing
about its implementation in Spark.
All code below is from NumberConverter.scala.
h3. negative and signed situation
{code:sql}
spark-sql (default)> select conv('FFFFFFFFFFFFFFFE', 16, -16);
-2
spark-sql (default)> select conv('-FFFFFFFFFFFFFFFE', 16, -16);
-2
{code}
Ideally, these two queries should yield different results, but they both return
-2.
{code:java}
if (toBase < 0 && v < 0) {
  v = -v
  negative = true
}
{code}
According to the code above, when toBase < 0 and v < 0, negative is set to
true regardless of its original value. This leads to the incorrect result in
the example above, because the leading minus sign is ignored in the second
query. A potential adjustment is negative = !negative, which would correctly
interpret the double negation and yield 2.
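To make the difference concrete, here is a minimal standalone sketch in plain Java (not Spark's Scala code). It models only the sign handling for a negative toBase; the class ConvSignSketch, the method convSigned, and the flipSign switch are hypothetical names introduced for illustration, and digit parsing is delegated to Long.parseUnsignedLong rather than Spark's own encode.

```java
class ConvSignSketch {
    // Simplified model of the toBase < 0 branch quoted above; not Spark's actual code.
    // flipSign = false mimics "negative = true"; flipSign = true mimics "negative = !negative".
    static String convSigned(String input, int fromBase, int toBase, boolean flipSign) {
        boolean negative = input.startsWith("-");
        String digits = negative ? input.substring(1) : input;
        // Read the digits as an unsigned 64-bit value stored in a signed long.
        long v = Long.parseUnsignedLong(digits, fromBase);
        if (toBase < 0 && v < 0) {
            v = -v;
            negative = flipSign ? !negative : true;
        }
        // Format the magnitude as unsigned so the sign is controlled only by the flag.
        String magnitude = Long.toUnsignedString(v, Math.abs(toBase)).toUpperCase();
        return (negative && toBase < 0) ? "-" + magnitude : magnitude;
    }

    public static void main(String[] args) {
        // Behaviour described in this report: both inputs give -2.
        System.out.println(convSigned("FFFFFFFFFFFFFFFE", 16, -16, false));  // -2
        System.out.println(convSigned("-FFFFFFFFFFFFFFFE", 16, -16, false)); // -2
        // With the proposed adjustment the second input gives 2.
        System.out.println(convSigned("FFFFFFFFFFFFFFFE", 16, -16, true));   // -2
        System.out.println(convSigned("-FFFFFFFFFFFFFFFE", 16, -16, true));  // 2
    }
}
```

Under these assumptions, only the input with a leading minus sign changes its result, which is the behaviour the proposed adjustment is meant to produce.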
h3. ansi mode
{code:java}
if (negative && toBase > 0) {
  if (v < 0) {
    v = -1
  } else {
    v = -v
  }
}
{code}
Here, v is set to -1 to indicate an overflow condition, but no exception is
thrown when ANSI mode is enabled, unlike the overflow handling in the encode
method.
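One way this branch could respect ANSI mode is sketched below in plain Java. This is only an illustration under stated assumptions: the class ConvAnsiSketch, the method applyNegativeSign, the ansiEnabled parameter, and the exception type and message are all hypothetical and are not taken from Spark's code.

```java
class ConvAnsiSketch {
    // Hypothetical helper modelling the "negative && toBase > 0" branch quoted above.
    static long applyNegativeSign(long v, boolean ansiEnabled) {
        if (v < 0) {
            // The quoted code treats this case as overflow and stores -1.
            if (ansiEnabled) {
                // Assumption: ANSI mode should fail loudly instead of returning a sentinel.
                throw new ArithmeticException("Overflow in function conv()");
            }
            return -1L;
        }
        return -v;
    }

    public static void main(String[] args) {
        System.out.println(applyNegativeSign(5L, false));  // -5
        System.out.println(applyNegativeSign(-2L, false)); // -1
        try {
            applyNegativeSign(-2L, true);
        } catch (ArithmeticException e) {
            System.out.println("overflow reported: " + e.getMessage());
        }
    }
}
```

With ansiEnabled set to false the sketch keeps the -1 result shown in the snippet above, so only ANSI mode would see a behaviour change.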
h3. overflow check
{code:java}
val bound = java.lang.Long.divideUnsigned(-1 - radix, radix)
if (v >= bound) {...}
{code}
The inclusion of the equality in the overflow check seems unnecessary: when
v == bound, v * radix + (radix - 1) still fits in an unsigned 64-bit value.
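The claim about the equality can be checked with a small standalone Java sketch. It uses BigInteger to compute v * radix + (radix - 1) exactly for v == bound and compares the result against the largest unsigned 64-bit value. The class ConvBoundSketch and the method fitsAtBound are hypothetical names, and the radix range 2 to 36 is an assumption about which bases are relevant.

```java
import java.math.BigInteger;

class ConvBoundSketch {
    // Largest unsigned 64-bit value, 2^64 - 1.
    static final BigInteger MAX_UNSIGNED =
        BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    // True if appending the largest digit (radix - 1) to v == bound stays within 64 bits.
    static boolean fitsAtBound(int radix) {
        // Same expression as in the snippet above; -1L is 2^64 - 1 when read as unsigned.
        long bound = Long.divideUnsigned(-1L - radix, radix);
        BigInteger next = BigInteger.valueOf(bound) // bound is non-negative for radix >= 2
            .multiply(BigInteger.valueOf(radix))
            .add(BigInteger.valueOf(radix - 1));
        return next.compareTo(MAX_UNSIGNED) <= 0;
    }

    public static void main(String[] args) {
        for (int radix = 2; radix <= 36; radix++) {
            System.out.println("radix " + radix + ": fits at bound = " + fitsAtBound(radix));
        }
    }
}
```

If the check returns true for every radix, then v == bound can never overflow on the next digit, which suggests that a strict comparison (v > bound) would be sufficient to decide when a finer check is needed.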
----
I am still learning how functions work in Spark, so please feel free to point
out any mistakes I may have made. Some of these questions are also raised in
[SPARK-44943|https://issues.apache.org/jira/browse/SPARK-44943].