[https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107365#comment-17107365]
Rafael commented on SPARK-30100:
--------------------------------
Hey guys,
I encountered an issue related to decimal precision.
Currently the code expects that, for the Decimal type, the JDBC metadata
provides both precision and scale:
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]
I found out that in Oracle it is valid to have a Decimal column without this
information. When I query the metadata for such a column, I get
DATA_PRECISION = Null and DATA_SCALE = Null.
Then when I run `spark-sql` I get this error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Is there a workaround so that spark-sql can handle such cases?
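One workaround I am aware of is the JDBC reader's `customSchema` option, which overrides the type Spark infers for selected columns. The sketch below is only illustrative: the URL, table, and column names (MY_TABLE, AMOUNT) are hypothetical, and DECIMAL(38, 10) is just an example target type that fits within Spark's 38-digit limit.
{code:python}
# Hypothetical sketch: force an explicit decimal type for an Oracle NUMBER
# column whose JDBC metadata reports no precision/scale.
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:@//host:1521/service") \
    .option("dbtable", "MY_TABLE") \
    .option("user", "USERNAME") \
    .option("password", "PASSWORD") \
    .option("customSchema", "AMOUNT DECIMAL(38, 10)") \
    .load()
{code}
Columns not listed in `customSchema` keep the types Spark infers from the JDBC metadata, so only the problematic NUMBER columns need to be overridden.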
> Decimal Precision Inferred from JDBC via Spark
> ----------------------------------------------
>
> Key: SPARK-30100
> URL: https://issues.apache.org/jira/browse/SPARK-30100
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.4.0
> Reporter: Joby Joje
> Priority: Major
>
> When trying to load data from JDBC (Oracle) into Spark, there seems to be
> precision loss in the decimal field; as per my understanding, Spark supports
> *DECIMAL(38,18)*. The field in Oracle is DECIMAL(38,14), whereas Spark
> rounds off the last four digits, making the precision DECIMAL(38,10). This
> happens to a few fields in the dataframe where the column is fetched using
> a CASE statement, whereas in the same query another field populates the
> right schema.
> Tried to pass the
> {code:java}
> spark.sql.decimalOperations.allowPrecisionLoss=false{code}
> conf in spark-submit, though it didn't give the desired results.
> {code:java}
> jdbcDF = spark.read \
>     .format("jdbc") \
>     .option("url", "ORACLE") \
>     .option("dbtable", "QUERY") \
>     .option("user", "USERNAME") \
>     .option("password", "PASSWORD") \
>     .load(){code}
> So considering that Spark infers the schema from a sample of records, how
> does this work here? Does it use the results of the query, i.e. (SELECT * FROM
> TABLE_NAME JOIN ...), or does it take a different route to guess the schema
> for itself? Can someone shed some light on this and advise how to achieve the
> right decimal precision in this regard without manipulating the query? Doing a
> CAST in the query does solve the issue (a sketch follows below), but I would
> prefer some alternatives.
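> A minimal sketch of that CAST approach, assuming a hypothetical AMOUNT column
> read through a dbtable subquery (the query and column names are illustrative only):
> {code:python}
> # Hypothetical sketch: cast the problematic column to an explicit precision
> # and scale inside the subquery that Spark pushes down to Oracle.
> query = "(SELECT CAST(AMOUNT AS DECIMAL(38, 14)) AS AMOUNT, OTHER_COL FROM TABLE_NAME) t"
> jdbcDF = spark.read \
>     .format("jdbc") \
>     .option("url", "ORACLE") \
>     .option("dbtable", query) \
>     .option("user", "USERNAME") \
>     .option("password", "PASSWORD") \
>     .load()
> {code}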