[ 
https://issues.apache.org/jira/browse/ORC-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed ORC-502.
-----------------------------
    Resolution: Duplicate

> Hive ORC read INT, BIGINT as NULL for Data created by Spark
> -----------------------------------------------------------
>
>                 Key: ORC-502
>                 URL: https://issues.apache.org/jira/browse/ORC-502
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Oleksiy Sayankin
>            Priority: Major
>         Attachments: data.orc
>
>
> *Preconditions*
> Create the file {{ratings.csv}} and upload it to HDFS at 
> {{/user/test/rating/ratings.csv}}.
> {code}
> userId,movieId,rating,timestamp
> 1,2,4.5,1784325658
> {code}
> See the corresponding {{data.orc}} file in the attachments.
> *STR:*
> 1. Using Spark (tested on versions 2.2.1 and 2.3.1), create a DataFrame 
> ({{df}}) from the CSV file with {{inferSchema}} enabled.
> {code}
> val df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/user/test/rating/ratings.csv")
> {code}
> 2. Save {{df}} as ORC files.
> {code}
> df.write.format("orc").save("/user/test/spark_rating_orc_typesafe")
> {code}
> 3. Using Hive 2.3, create the corresponding external table.
> {code}
> create external table rating_orc_hive_type_1(userId int, movieId int, rating double, `timestamp` int)
> stored as ORC location "/user/test/spark_rating_orc_typesafe/";
> {code}
> 4. Run the query:
> {code}
> select * from rating_orc_hive_type_1;
> {code}
> Only the double value is printed as expected. The {{userId}} and {{movieId}} 
> columns are returned as NULL, and the same happens with BIGINT.
> {code}
> OK
> NULL    NULL    4.5     1784325658
> {code}
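>
> A minimal sketch, not part of the original steps, for checking what Spark 
> actually wrote in step 2. It reads the ORC output back and prints the column 
> names and types so they can be compared with the Hive DDL in step 3. The 
> object name {{InspectOrcSchema}} is hypothetical, and the path is assumed to 
> be the one used in step 2.
> {code}
> import org.apache.spark.sql.SparkSession
>
> // Hypothetical helper: prints the schema and rows of the ORC data from step 2.
> object InspectOrcSchema {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().appName("InspectOrcSchema").getOrCreate()
>     // Assumed path: the directory written by df.write.format("orc").save(...)
>     val orc = spark.read.format("orc").load("/user/test/spark_rating_orc_typesafe")
>     orc.printSchema() // column names and types as stored in the ORC files
>     orc.show()        // the rows as Spark itself reads them back
>     spark.stop()
>   }
> }
> {code}
> If Spark reads every column back correctly, the difference is more likely in 
> how Hive maps the table definition onto the file schema than in the data.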



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
