[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729945#comment-16729945 ]

Dongjoon Hyun commented on SPARK-26437:
---------------------------------------

Hi, [~zengxl].

Thank you for reporting. This is a very old issue, present since Apache Spark 1.x, 
that occurs when you use `decimal`. Note the `CAST` to `decimal` in the following 
example. Since Spark 2.0, a `0.0` literal is interpreted as `Decimal`, so you hit 
this issue even without the cast. This is fixed on the `master` branch and will be 
released as Apache Spark 3.0.0.

{code}
scala> sc.version
res0: String = 1.6.3

scala> sql("drop table spark_orc")
scala> sql("create table spark_orc stored as orc as select cast(0.00 as 
decimal(2,2)) as a")
scala> sql("select * from spark_orc").show
...
Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
{code}
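
As a side note, you can see the literal's type directly. The following is a sketch of a Spark 2.x shell session (the schema shown is what I expect from the literal typing rules; the reporter's DDL below confirms the `decimal(2,2)` type):

{code}
scala> sql("select 0.00 as a").printSchema
root
 |-- a: decimal(2,2) (nullable = false)
{code}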

If you are interested, here are the details.

First, the underlying ORC issue (HIVE-13083) was fixed in Hive 1.3.0, but Spark 
still embeds Hive 1.2.1. To avoid the underlying ORC issue, you can use the new 
ORC data source (`set spark.sql.orc.impl=native`). So, in Spark 2.4.0, you can 
use the `USING` syntax to avoid this.

{code}
scala> sql("create table spark_orc using orc as select 0.00 as a")
scala> sql("select * from spark_orc").show
+----+
|   a|
+----+
|0.00|
+----+

scala> spark.version
res2: String = 2.4.0
{code}
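
The configuration route mentioned above looks like this (a sketch; `spark_orc_native` is an illustrative table name, and since `spark.sql.orc.impl` already defaults to `native` in 2.4.0, the explicit `set` just makes the choice visible):

{code}
scala> sql("set spark.sql.orc.impl=native")
scala> sql("create table spark_orc_native using orc as select 0.00 as a")
scala> sql("select * from spark_orc_native").show
{code}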

Second, SPARK-22977 introduced a regression in CTAS in Spark 2.3.0, which was 
recently fixed by SPARK-25271 (Hive CTAS commands should use data source if it is 
convertible) for Apache Spark 3.0.0. In Spark 3.0.0, you can use the `STORED AS ORC` 
syntax without hitting this problem.
{code}
scala> sql("create table spark_orc stored as orc as select 0.00 as a")
scala> sql("select * from spark_orc").show
+----+
|   a|
+----+
|0.00|
+----+

scala> spark.version
res3: String = 3.0.0-SNAPSHOT
{code}
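
If I read SPARK-25271 correctly, this conversion is controlled by `spark.sql.hive.convertMetastoreCtas` (default `true` in 3.0.0), so you can compare both writers in a 3.0.0-SNAPSHOT shell. A sketch, with illustrative table names:

{code}
// Old behavior: CTAS goes through the Hive ORC writer.
scala> sql("set spark.sql.hive.convertMetastoreCtas=false")
scala> sql("create table spark_orc_hive stored as orc as select 0.00 as a")

// New default in 3.0.0: CTAS is converted to the data source writer.
scala> sql("set spark.sql.hive.convertMetastoreCtas=true")
scala> sql("create table spark_orc_ds stored as orc as select 0.00 as a")
{code}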

So, I'll close this issue since this is fixed in 3.0.0.

cc [~cloud_fan], [~viirya], [~smilegator], [~hyukjin.kwon]

> Decimal data becomes bigint to query, unable to query
> -----------------------------------------------------
>
>                 Key: SPARK-26437
>                 URL: https://issues.apache.org/jira/browse/SPARK-26437
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.2, 2.3.1
>            Reporter: zengxl
>            Priority: Major
>
> This is my SQL:
> create table tmp.tmp_test_6387_1224_spark stored as ORCFile as select 0.00 as a
> select a from tmp.tmp_test_6387_1224_spark
> CREATE TABLE `tmp.tmp_test_6387_1224_spark`(
>   `a` decimal(2,2))
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> When I query this table (with Hive or Spark SQL, the exception is the same), it 
> throws the following exception:
> Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>         at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readBigInteger(SerializationUtils.java:176)
>         at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$DecimalTreeReader.next(TreeReaderFactory.java:1264)
>         at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)
>  


