Jason Ferrell created ZEPPELIN-4009:

             Summary: Large Numbers Truncated
                 Key: ZEPPELIN-4009
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4009
             Project: Zeppelin
          Issue Type: Bug
          Components: build
    Affects Versions: 0.8.0
         Environment: %pyspark

from pyspark.sql.functions import *
from pyspark.sql.types import *

sfTestValue = StructField("testValue",StringType(), True)
schemaTest = StructType([sfTestValue])

listTestValues = []

dfTest = spark.createDataFrame(listTestValues, schemaTest)

dfTestExpanded = dfTest.selectExpr("testValue as idAsString",
    "cast(testValue as bigint) as idAsBigint",
    "cast(testValue as long) as idAsLong")

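dfTestExpanded.show() ##This will show three columns of data correctly.

## Assumed step, missing from the report as sent: the query below reads
## global_temp.testTable, so the DataFrame must be registered under that name.
dfTestExpanded.createOrReplaceGlobalTempView("testTable")
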
sqlContext.sql('select * from global_temp.testTable').show(3) ##shows truncated 
            Reporter: Jason Ferrell

(Copied from Apache Spark issue 26693, as it appears to be a Zeppelin issue 
rather than a Spark issue.)

We have a process that takes a file dumped from an external API and formats it 
for use in other processes.  These API dumps are brought into Spark with all 
fields read in as strings.  One of the fields is a 19-digit visitor ID.  Since 
implementing Spark 2.4 a few weeks ago, we have noticed that DataFrames read 
all 19 digits correctly, but any SQL function appears to truncate the last 
two digits and replace them with "00".

Our process is set up to convert these numbers to bigint, which worked before 
Spark 2.4.  We looked into the data types and tried changing to a "long" type, 
with no luck.  At that point we tried bringing in the string value as is, with 
the same result.  I've added code that should replicate the issue with a few 
19-digit test cases and that demonstrates the type conversions I tried.
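
One possible explanation, which nothing in this report confirms, is that the 
value passes through a 64-bit floating-point number somewhere on the display 
path (for example, a JavaScript Number in a browser front end).  A double 
carries only about 17 significant decimal digits, so a 19-digit ID printed from 
one ends in "00".  The sketch below reproduces that symptom in plain Python, 
without Spark or Zeppelin.  The helper name through_double and the sample ID 
are hypothetical and are not taken from the original test data.

```python
from decimal import Decimal


def through_double(value: str) -> str:
    """Round-trip a decimal integer string through a 64-bit float.

    Hypothetical helper: it imitates what a display layer would print if it
    stored the ID as a double and rendered the shortest digits that identify
    that double, padding the rest with zeros.
    """
    as_double = float(int(value))      # nearest representable double
    shortest = repr(as_double)         # shortest digits that round-trip
    return str(int(Decimal(shortest)))  # expand the exponent into plain digits


sample_id = "1234567890123456789"      # made-up 19-digit ID
print(sample_id)                       # the original string
print(through_double(sample_id))       # the same ID after a trip through a double
```

If the displayed values match what through_double returns for the real IDs, 
that would point at a floating-point conversion in the display layer rather 
than at the bigint or long casts themselves.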
