[ https://issues.apache.org/jira/browse/SPARK-24496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506001#comment-16506001 ]
SHAILENDRA SHAHANE commented on SPARK-24496:
--------------------------------------------

This issue is still present. I tried to fetch data from MongoDB and got the following exception while converting the RDD to a DataFrame.

-----------------Code--------------
{code}
SQLContext sparkSQLContext = spark.sqlContext();
DataFrameReader dfr = spark.read()
    .format("com.mongodb.spark.sql")
    .option("floatAsBigDecimal", "true");
Dataset<Row> rbkp = dfr.load();
{code}
------------------ OR ------------------------
{code}
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaMongoRDD<Document> rdd = MongoSpark.load(jsc);
Dataset<Row> rbkp = rdd.toDF();
{code}

Spark version: 2.3
MongoDB version: 3.4 and 3.6

----------------Data Sample-------------
{code}
{"_id":"5b0d31f892549e10b61d962a","RSEG_MANDT":"800","RSEG_EBELN":"4500017749","RSEG_EBELP":"00020","RSEG_BELNR":"1000000001","RSEG_BUZEI":"000002","RSEG_GJAHR":"2013","RBKP_BUDAT":"2013-10-04","RSEG_MENGE":{"$numberDecimal":"30.000"},"RSEG_LFBNR":"5000000472","RSEG_LFGJA":"2013","RSEG_LFPOS":"0002","NOT_ACCOUNT_MAINTENANCE":{"$numberDecimal":"1.0000000000"},"RBKP_CPUTIMESTAMP":"2013-10-04T10:32:02.000Z","RBKP_WAERS":"USD","RSEG_BNKAN":{"$numberDecimal":"0.00"},"RSEG_WRBTR":{"$numberDecimal":"2340.00"},"RSEG_SHKZG":"S"}
{code}

> CLONE - JSON data source fails to infer floats as decimal when precision is
> bigger than 38 or scale is bigger than precision.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24496
>                 URL: https://issues.apache.org/jira/browse/SPARK-24496
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: SHAILENDRA SHAHANE
>            Assignee: Hyukjin Kwon
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: SparkJiraIssue08062018.txt
>
>
> Currently, the JSON data source supports the {{floatAsBigDecimal}} option, which reads floats as {{DecimalType}}.
> I noticed there are several restrictions on Spark's {{DecimalType}}:
> 1. The precision cannot be bigger than 38.
> 2. The scale cannot be bigger than the precision.
> However, with the option above, the source reads a {{BigDecimal}} that does not have to satisfy these conditions.
> This can be observed as below:
> {code}
> def simpleFloats: RDD[String] =
>   sqlContext.sparkContext.parallelize(
>     """{"a": 0.01}""" ::
>     """{"a": 0.02}""" :: Nil)
> val jsonDF = sqlContext.read
>   .option("floatAsBigDecimal", "true")
>   .json(simpleFloats)
> jsonDF.printSchema()
> {code}
> which throws the exception below:
> {code}
> org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater than precision (1).;
> 	at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
> 	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
> 	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
> 	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
> 	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
> 	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
> 	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
> 	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> 	...
> {code}
> Since the JSON data source infers the {{DataType}} as {{StringType}} when it fails to infer, this case might also have to be inferred as {{StringType}}, or maybe simply as {{DoubleType}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
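To see why {{floatAsBigDecimal}} can produce a value that {{DecimalType}} rejects, note that {{java.math.BigDecimal}} allows the scale to exceed the precision: "0.01" parses with unscaled value 1, so its precision is 1 while its scale is 2. A minimal standalone sketch of the constraint check is below; {{DecimalInferenceDemo}}, {{fitsDecimalType}}, and {{MAX_PRECISION}} are illustrative names for this demo, not Spark's actual inference code, and the widening at the end is only one possible fallback, not the fix Spark shipped.

```java
import java.math.BigDecimal;

public class DecimalInferenceDemo {
    // Mirrors Spark's DecimalType limits: precision <= 38 and scale <= precision.
    static final int MAX_PRECISION = 38;

    // True when a parsed BigDecimal satisfies the DecimalType restrictions.
    static boolean fitsDecimalType(BigDecimal d) {
        return d.precision() <= MAX_PRECISION && d.scale() <= d.precision();
    }

    public static void main(String[] args) {
        // "0.01" has unscaled value 1: precision 1, scale 2 -> invalid DecimalType.
        BigDecimal small = new BigDecimal("0.01");
        System.out.println(small.precision() + " / " + small.scale()); // 1 / 2
        System.out.println(fitsDecimalType(small));                    // false

        // "2340.00" (RSEG_WRBTR in the sample document) is fine: precision 6, scale 2.
        BigDecimal ok = new BigDecimal("2340.00");
        System.out.println(fitsDecimalType(ok));                       // true

        // One possible fallback: widen the precision so that scale <= precision.
        int widened = Math.max(small.precision(), small.scale());
        System.out.println(widened + " / " + small.scale());           // 2 / 2
    }
}
```

This is exactly the shape of the failure in the stack trace above: the inferred scale (2) is greater than the inferred precision (1), so the {{DecimalType}} constructor throws.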