[
https://issues.apache.org/jira/browse/SPARK-49616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Pazeto Jr updated SPARK-49616:
-------------------------------------
Description:
When I read some JSON payloads, PySpark changes the data even though I read the
field as StringType. I want the payload as a String because I don't want each
field as a column at this step; I just want the payload as a String exactly as
it appears in the source file.
Locally I'm using Spark 3.3 in a Jupyter Notebook with the Glue 4 image. PySpark
version: 3.3.0+amzn.1.dev0
Here is my payload/source (test.txt):
{code:java}
{"payload":{"points":1220000000}}
{"payload":{"count":1550554545.0}}
{"payload":{"points":125888002540.0, "count":1550554545.0}}
{"payload":{"name": "Roger", "count":55154111.0}}{code}
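Here is my code:
{code:python}
from pyspark.sql.types import StructType, StructField, StringType

# `spark` is the SparkSession already provided by the notebook.
path = "/home/glue_user/workspace/jupyter_workspace/test/test.txt"
# Declare `payload` as a plain string so its fields are not split into columns.
schema = StructType([StructField('payload', StringType(), True)])
my_df = spark.read.schema(schema).option("inferSchema", "false").json(path)
my_df.show(truncate=False){code}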
Here is the result, where PySpark writes the float numbers in scientific
notation even though I read the field as a String:
{code:java}
+------------------------------------------------+
|payload |
+------------------------------------------------+
|{"points":1220000000} |
|{"count":1.550554545E9} |
|{"points":1.2588800254E11,"count":1.550554545E9}|
|{"name":"Roger","count":5.5154111E7} |
+------------------------------------------------+ {code}
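To make the symptom concrete, here is a minimal check in plain Python (no Spark needed) that compares the literals from test.txt with the ones in the output above. It only shows that the numeric values are equal while the text differs; the helper name `compare` is made up for this illustration.

```python
# Pairs of (literal in test.txt, literal shown by my_df.show()).
pairs = [
    ("1550554545.0", "1.550554545E9"),
    ("125888002540.0", "1.2588800254E11"),
    ("55154111.0", "5.5154111E7"),
]


def compare(source_literal, shown_literal):
    """Return (same_value, same_text) for two numeric literals."""
    same_value = float(source_literal) == float(shown_literal)
    same_text = source_literal == shown_literal
    return same_value, same_text


results = [compare(src, shown) for src, shown in pairs]
for (src, shown), (same_value, same_text) in zip(pairs, results):
    print(f"{src} vs {shown}: same value={same_value}, same text={same_text}")
```

So the numbers themselves are preserved, but the string is no longer the one from the source file, which is the problem for a field declared as StringType.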
Why can't I simply have my data as it is? Why are the values in my string field
changed to scientific notation? For example:
{quote}"count":1550554545.0
"count":1.550554545E9
{quote}
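A possible workaround, sketched here rather than confirmed as the recommended approach, is to avoid JSON parsing for this field and cut the raw payload text out of each line instead. The sketch below is plain Python and assumes that every line is a JSON object whose only top-level key is "payload"; the names `PAYLOAD_PATTERN` and `extract_payload` are hypothetical.

```python
import re

# Assumption: each line is a JSON object whose only top-level key is "payload".
# The capture group takes everything between `"payload":` and the final `}`.
PAYLOAD_PATTERN = r'^\s*\{\s*"payload"\s*:\s*(.*)\}\s*$'


def extract_payload(line):
    """Return the raw text of the "payload" value, or None if the line does not match."""
    match = re.match(PAYLOAD_PATTERN, line)
    return match.group(1).strip() if match else None


# The four lines from test.txt.
lines = [
    '{"payload":{"points":1220000000}}',
    '{"payload":{"count":1550554545.0}}',
    '{"payload":{"points":125888002540.0, "count":1550554545.0}}',
    '{"payload":{"name": "Roger", "count":55154111.0}}',
]

raw_payloads = [extract_payload(line) for line in lines]
for payload in raw_payloads:
    print(payload)
```

In Spark, the same pattern could presumably be applied by reading the file with spark.read.text(path) and calling pyspark.sql.functions.regexp_extract on the resulting "value" column with group index 1. That variant has not been tested here, and it would break if a line contained other top-level keys.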
> Spark reading data in scientific notation in String field
> ---------------------------------------------------------
>
> Key: SPARK-49616
> URL: https://issues.apache.org/jira/browse/SPARK-49616
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 3.1.0, 3.3.0
> Reporter: Daniel Pazeto Jr
> Priority: Major