[ https://issues.apache.org/jira/browse/SPARK-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-18772:
---------------------------------
Description:
It looks like we can avoid some unnecessary conversion attempts for special
floats in JSON.
Also, we could support some other spellings of them, such as {{+INF}}, {{INF}} and
{{-INF}}.
For a conversion attempt that could be avoided, please refer to the code below:
{code}
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
scala> spark.read.schema(StructType(Seq(StructField("a", DoubleType)))).option("mode", "FAILFAST").json(Seq("""{"a": "nan"}""").toDS).show()
17/05/12 11:30:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NumberFormatException: For input string: "nan"
...
{code}
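One way to avoid the failed conversion attempt is to match the special spellings exactly before anything is handed to the number parser. The following is only a minimal sketch of that idea: {{SpecialFloats}} and its {{parse}} method are hypothetical names, not Spark's JacksonParser code, and the accepted token list is an assumption that combines the JDK spellings with the {{+INF}}, {{INF}} and {{-INF}} forms proposed above.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object SpecialFloats {
  // Exact, case-sensitive spellings that map to special double values.
  // The three INF forms are the additional cases proposed in this issue.
  private val tokens: Map[String, Double] = Map(
    "NaN"       -> Double.NaN,
    "Infinity"  -> Double.PositiveInfinity,
    "+Infinity" -> Double.PositiveInfinity,
    "-Infinity" -> Double.NegativeInfinity,
    "INF"       -> Double.PositiveInfinity,
    "+INF"      -> Double.PositiveInfinity,
    "-INF"      -> Double.NegativeInfinity
  )

  // Returns None for any other spelling (for example "nan"), so the caller
  // can report a malformed record without ever calling toDouble.
  def parse(s: String): Option[Double] = tokens.get(s)
}
```

With a lookup like this, {{"nan"}} is rejected up front instead of reaching {{toDouble}} and raising a {{NumberFormatException}}.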
was:
JacksonParser tests for infinite and NaN values in a way that is not supported
by the underlying float/double parser. For example, the input string is always
lowercased to check for {{-Infinity}} but the parser only supports titlecased
values. So a {{-infinitY}} will pass the test but fail with a
{{NumberFormatException}} when parsing. This exception is not caught anywhere
and the task ends up failing.
A related issue is that the code checks for {{Inf}} but the parser only
supports the long form of {{Infinity}}.
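The mismatch described above can be reproduced without Spark. This is a small sketch under the assumption that the check amounts to a comparison on the lowercased input; the {{CaseMismatch}} object and its field names are made up for illustration, and only {{toDouble}} (which delegates to the JDK's double parser) is real API.

```scala
import scala.util.Try

// Illustration only: a lowercased check versus the case-sensitive JDK parser.
object CaseMismatch {
  val input = "-infinitY"

  // A check on the lowercased string accepts the value...
  val passesCheck: Boolean = input.toLowerCase == "-infinity"

  // ...but parsing the original string throws NumberFormatException.
  val parsed: Try[Double] = Try(input.toDouble)

  // The short form is not understood by the parser either.
  val shortForm: Try[Double] = Try("Inf".toDouble)

  // Only the exact titlecased long form parses.
  val exact: Try[Double] = Try("-Infinity".toDouble)
}
```

Wrapping the conversion in {{Try}} is just a way to observe the failure here; in the scenario described above the exception is not caught and the task fails.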
> Unnecessary conversion try and some missing cases for special floats in JSON
> ----------------------------------------------------------------------------
>
> Key: SPARK-18772
> URL: https://issues.apache.org/jira/browse/SPARK-18772
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Nathan Howell
> Priority: Minor