[
https://issues.apache.org/jira/browse/HIVE-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HIVE-19580:
----------------------------------
Environment:
EMR s3:// connector
Spark 2.3 but also true for lower versions
Hive 2.3.2
was:
AWS S3 to store files
Spark 2.3 but also true for lower versions
Hive 2.3.2
> Hive 2.3.2 with ORC files stored on S3 is case sensitive
> --------------------------------------------------------
>
> Key: HIVE-19580
> URL: https://issues.apache.org/jira/browse/HIVE-19580
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.3.2
> Environment: EMR s3:// connector
> Spark 2.3 but also true for lower versions
> Hive 2.3.2
> Reporter: Arthur Baudry
> Priority: Major
> Fix For: 2.3.2
>
>
> The original file is CSV:
> COL1,COL2
> 1,2
> ORC files are created with Spark 2.3:
> scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")
> scala> df.printSchema
> root
> |-- COL1: string (nullable = true)
> |-- COL2: string (nullable = true)
> scala> df.write.orc("s3://bucket/prefix")
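> As a quick check (a sketch using the same placeholder location), reading the
> files back in the same shell should print the same upper-case schema, since
> Spark preserves column-name case when writing ORC:
> scala> spark.read.orc("s3://bucket/prefix").printSchema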
> In Hive:
> hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC
> LOCATION 's3://bucket/prefix';
> hive> SELECT * FROM test_orc;
> OK
> NULL NULL
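> As a hypothetical diagnostic, describing the table shows how Hive stored the
> column names (Hive lower-cases identifiers in the metastore, while the ORC
> files carry COL1/COL2):
> hive> DESCRIBE test_orc;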
> *Every field is NULL. However, if the fields are generated in lower case in
> the Spark schema, everything works.*
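> A minimal sketch of that workaround, assuming the same df as above (dfLower
> and prefix_lower are illustrative names): rename every column to lower case
> before the ORC write, so the names match Hive's lower-cased metastore columns:
> scala> val dfLower = df.toDF(df.columns.map(_.toLowerCase): _*)
> scala> dfLower.write.orc("s3://bucket/prefix_lower")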
> I'm raising this bug because we have customers who use Hive 2.3.2 to read
> files we generate with Spark, and our entire code base addresses fields in
> upper case, which is incompatible with their Hive instances.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)