[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729945#comment-16729945 ]
Dongjoon Hyun commented on SPARK-26437:
---------------------------------------

Hi, [~zengxl]. Thank you for reporting. This is a very old issue, present since Apache Spark 1.x, which occurs when you use `decimal`. Please note the `CAST` to `decimal` in the following example. Since Spark 2.0, a `0.0` literal is interpreted as `Decimal`, so you hit this issue without casting, too. This is fixed on the `master` branch and will be released in Apache Spark 3.0.0.

{code}
scala> sc.version
res0: String = 1.6.3

scala> sql("drop table spark_orc")

scala> sql("create table spark_orc stored as orc as select cast(0.00 as decimal(2,2)) as a")

scala> sql("select * from spark_orc").show
...
Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
{code}

If you are interested, here are the details.

First, the underlying ORC issue (HIVE-13083) was fixed in Hive 1.3.0, but Spark still uses the embedded Hive 1.2.1. To avoid the underlying ORC issue, you can use the new ORC data source (`set spark.sql.orc.impl=native`). So, in Spark 2.4.0, you can use the `USING` syntax to avoid this.

{code}
scala> sql("create table spark_orc using orc as select 0.00 as a")

scala> sql("select * from spark_orc").show
+----+
|   a|
+----+
|0.00|
+----+

scala> spark.version
res2: String = 2.4.0
{code}

Second, SPARK-22977 introduced a regression on CTAS in Spark 2.3.0, which was recently fixed by SPARK-25271 (Hive CTAS commands should use a data source if it is convertible) for Apache Spark 3.0.0. In Spark 3.0.0, you can use the `STORED AS ORC` syntax without this problem.

{code}
scala> sql("create table spark_orc stored as orc as select 0.00 as a")

scala> sql("select * from spark_orc").show
+----+
|   a|
+----+
|0.00|
+----+

scala> spark.version
res3: String = 3.0.0-SNAPSHOT
{code}

So, I'll close this issue since this is fixed in 3.0.0.
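As a sketch of the `spark.sql.orc.impl=native` workaround mentioned above (assuming a Spark 2.4 spark-shell; the table name is the one from the examples, and whether the native reader is used for a Hive `STORED AS ORC` table also depends on `spark.sql.hive.convertMetastoreOrc`):

{code}
scala> // Switch the ORC implementation to the new native data source
scala> sql("set spark.sql.orc.impl=native")

scala> // Assumption: also allow Hive ORC tables to be converted to the
scala> // native data source path, so the new reader is actually used
scala> sql("set spark.sql.hive.convertMetastoreOrc=true")

scala> sql("select * from spark_orc").show
{code}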
cc [~cloud_fan], [~viirya], [~smilegator], [~hyukjin.kwon]

> Decimal data becomes bigint to query, unable to query
> -----------------------------------------------------
>
>                 Key: SPARK-26437
>                 URL: https://issues.apache.org/jira/browse/SPARK-26437
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.2, 2.3.1
>            Reporter: zengxl
>            Priority: Major
>
> This is my SQL:
>
> create table tmp.tmp_test_6387_1224_spark stored as ORCFile as select 0.00 as a
> select a from tmp.tmp_test_6387_1224_spark
>
> CREATE TABLE `tmp.tmp_test_6387_1224_spark`(
>   `a` decimal(2,2))
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
>
> When I query this table (with Hive or Spark SQL, the exception is the same), it throws the following exception:
>
> Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
> at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readBigInteger(SerializationUtils.java:176)
> at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$DecimalTreeReader.next(TreeReaderFactory.java:1264)
> at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)