Jay Pranavamurthi created SPARK-26308:
-----------------------------------------
Summary: Large BigDecimal value is converted to null when passed
into a UDF
Key: SPARK-26308
URL: https://issues.apache.org/jira/browse/SPARK-26308
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Reporter: Jay Pranavamurthi
We are loading a Hive table into a Spark DataFrame. The Hive table has a
decimal(30, 0) column with values greater than Long.MAX_VALUE. The DataFrame
loads correctly.
We then use a UDF to convert the decimal column to a String value. For decimal
values < Long.MAX_VALUE this works fine, but when the decimal value is >
Long.MAX_VALUE, the input to the UDF is *null*.
Hive table schema and data:
{code:java}
create table decimal_test (col1 decimal(30, 0), col2 decimal(10, 0), col3 int,
col4 string);
insert into decimal_test values(2011000000000002456556, 123456789, 10, 'test1');
{code}
Execution in spark-shell:
_(Note that the first column in the final output is null; it should have been
"2011000000000002456556".)_
{code:java}
scala> val df1 = spark.sqlContext.sql("select * from decimal_test")
df1: org.apache.spark.sql.DataFrame = [col1: decimal(30,0), col2: decimal(10,0)
... 2 more fields]
scala> df1.show
+--------------------+---------+----+-----+
| col1| col2|col3| col4|
+--------------------+---------+----+-----+
|20110000000000024...|123456789| 10|test1|
+--------------------+---------+----+-----+
scala> val decimalToString = (value: java.math.BigDecimal) => if (value ==
null) null else { value.toBigInteger().toString }
decimalToString: java.math.BigDecimal => String = <function1>
scala> val udf1 = org.apache.spark.sql.functions.udf(decimalToString)
udf1: org.apache.spark.sql.expressions.UserDefinedFunction =
UserDefinedFunction(<function1>,StringType,Some(List(DecimalType(38,18))))
scala> val df2 = df1.withColumn("col1", udf1(df1.col("col1")))
df2: org.apache.spark.sql.DataFrame = [col1: string, col2: decimal(10,0) ... 2
more fields]
scala> df2.show
+----+---------+----+-----+
|col1| col2|col3| col4|
+----+---------+----+-----+
|null|123456789| 10|test1|
+----+---------+----+-----+
{code}
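The inferred input type visible in the transcript above, {{DecimalType(38,18)}}, suggests a likely cause: Spark maps a {{java.math.BigDecimal}} UDF argument to its default decimal(38,18), and rescaling a 22-digit integer to 18 fractional digits needs 40 digits of precision, which overflows the 38-digit maximum, so the coercion appears to surface as null. A minimal JVM-only sketch of that arithmetic (no Spark required; this is our diagnosis, not confirmed against Spark internals):
{code:java}
import java.math.BigDecimal

// The value from the repro: 22 integer digits, scale 0.
val v = new BigDecimal("2011000000000002456556")

// Rescaling to Spark's default UDF input type decimal(38,18) would need
// 22 + 18 = 40 digits of precision, exceeding the 38-digit maximum of
// decimal(38,18), so the value cannot be represented after the coercion.
val rescaled = v.setScale(18)
println(rescaled.precision)  // 40
{code}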
Oddly, this works if we change the "decimalToString" UDF to take an "Any"
instead of a "java.math.BigDecimal":
{code:java}
scala> val decimalToString = (value: Any) => if (value == null) null else { if
(value.isInstanceOf[java.math.BigDecimal])
value.asInstanceOf[java.math.BigDecimal].toBigInteger().toString else null }
decimalToString: Any => String = <function1>
scala> val udf1 = org.apache.spark.sql.functions.udf(decimalToString)
udf1: org.apache.spark.sql.expressions.UserDefinedFunction =
UserDefinedFunction(<function1>,StringType,None)
scala> val df2 = df1.withColumn("col1", udf1(df1.col("col1")))
df2: org.apache.spark.sql.DataFrame = [col1: string, col2: decimal(10,0) ... 2
more fields]
scala> df2.show
+--------------------+---------+----+-----+
| col1| col2|col3| col4|
+--------------------+---------+----+-----+
|20110000000000024...|123456789| 10|test1|
+--------------------+---------+----+-----+
{code}
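Another workaround that appears to sidestep the decimal(38,18) coercion is to cast the column to string on the Spark side (e.g. {{udf1(df1.col("col1").cast("string"))}}) and convert in plain Scala. A sketch of the string-based converter, runnable without Spark (hypothetical workaround, not part of the original report):
{code:java}
// Converts the string form of a decimal to its integer-part string,
// preserving all digits since no decimal(38,18) rescaling is involved.
val decimalStringToIntegerString: String => String =
  s => if (s == null) null else new java.math.BigDecimal(s).toBigInteger.toString

println(decimalStringToIntegerString("2011000000000002456556"))
// 2011000000000002456556
{code}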
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)