[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table

WangSheng (Jira) Wed, 09 Sep 2020 05:33:12 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192834#comment-17192834 ]


WangSheng commented on IMPALA-10159:
------------------------------------

Hi [~boroknagyz], I used spark-shell to generate the test files. My Spark client 
version is 2.4.5 and the ORC jars bundled with it are 1.5.5; even after replacing 
these ORC jars with 1.6.3, it doesn't work. Here is the code used to generate the 
test files:

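{code:scala}
// Imports needed in spark-shell (spark, sc and spark.implicits._ are predefined there)
import java.sql.Timestamp
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.SparkSchemaUtil
import org.apache.iceberg.types.Types
import org.apache.spark.sql.functions.to_timestamp
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val conf = new Configuration()
val tblLoc = "/test-warehouse/iceberg_test/iceberg_partitioned_orc"
val catalog = new HadoopTables(conf)

// event_time is meant to be an Iceberg timestamp WITHOUT time zone.
// Spark 2.4 has no timestamp-without-zone type, so this conversion fails.
val sparkSchema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("user", StringType, false),
  StructField("action", StringType, false),
  StructField("event_time", SparkSchemaUtil.convert(Types.TimestampType.withoutZone()), false)))
val icebergSchema = SparkSchemaUtil.convert(sparkSchema)

// Partition by hour(event_time), then by identity(action)
val spec = PartitionSpec.builderFor(icebergSchema).hour("event_time").identity("action").build
val table = catalog.create(icebergSchema, spec, tblLoc)

// Build a one-row DataFrame whose rows match sparkSchema
val data_df = spark.createDataFrame(Seq((1, "Alex", "view", Timestamp.valueOf("2020-01-01 08:00:00"))))
  .toDF("id", "user", "action", "ts")
val array = data_df.select(data_df("id"), data_df("user"), data_df("action"), to_timestamp(data_df("ts"))).collect()
val df = spark.createDataFrame(sc.makeRDD(array), sparkSchema)

// Append the row to the Iceberg table as ORC, then read it back
df.write.format("iceberg").option("write-format", "orc").mode("append").save(tblLoc)
spark.read.format("iceberg").load(tblLoc).show
{code}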
This code throws the exception "java.lang.UnsupportedOperationException: Spark 
does not support timestamp without time zone fields".
If we replace "SparkSchemaUtil.convert(Types.TimestampType.withoutZone())" with 
Spark's "TimestampType", the test files are generated normally, but querying 
them in Impala runs into the problem described in IMPALA-9967.
And here is the CREATE TABLE statement used in Impala:
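To make the timestamp difference concrete, here is a minimal plain-JVM Scala sketch; it uses no Spark or Iceberg classes, and `TimestampSemantics` and its methods are hypothetical helpers. It assumes that an Iceberg `timestamp` (withoutZone) stores wall-clock microseconds since 1970-01-01 00:00:00, that Spark 2.4's `TimestampType` (Iceberg `timestamptz`) stores an instant obtained by interpreting the wall-clock value in the JVM/session time zone, and that the Iceberg `hour` partition transform is whole hours since the epoch. Asia/Shanghai is just an arbitrary example zone.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.temporal.ChronoUnit

// Hypothetical helpers illustrating the two timestamp semantics (assumptions, see above)
object TimestampSemantics {
  // Iceberg `timestamp` without zone: wall-clock micros since 1970-01-01T00:00, no zone involved
  def microsWithoutZone(ldt: LocalDateTime): Long =
    ChronoUnit.MICROS.between(LocalDateTime.of(1970, 1, 1, 0, 0), ldt)

  // Spark TimestampType / Iceberg `timestamptz`: the wall-clock value is first placed in a
  // zone, then stored as micros since the UTC epoch
  def microsWithZone(ldt: LocalDateTime, zone: ZoneId): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, ldt.atZone(zone).toInstant)

  // Assumed hour() transform: whole hours since 1970-01-01 00:00:00 (floor for pre-epoch values)
  def hourTransform(micros: Long): Long = Math.floorDiv(micros, 3600L * 1000 * 1000)

  def main(args: Array[String]): Unit = {
    val eventTime = LocalDateTime.of(2020, 1, 1, 8, 0) // the test row's event_time
    println(hourTransform(microsWithoutZone(eventTime)))                          // 438296
    println(hourTransform(microsWithZone(eventTime, ZoneOffset.UTC)))             // 438296
    println(hourTransform(microsWithZone(eventTime, ZoneId.of("Asia/Shanghai")))) // 438288
  }
}
```

Under these assumptions, the without-zone value (and its hour partition) is fixed, while the `TimestampType` value depends on the time zone of the JVM that wrote it.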

{code:sql}
CREATE EXTERNAL TABLE default.iceberg_partitioned_orc
STORED AS ICEBERG
LOCATION 
'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned_orc'
TBLPROPERTIES('iceberg_file_format'='orc');
{code}



> Support ORC file format for Iceberg table
> -----------------------------------------
>
>                 Key: IMPALA-10159
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10159
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: WangSheng
>            Assignee: WangSheng
>            Priority: Minor
>              Labels: impala-iceberg
>
> Impala can now query Iceberg tables stored in the PARQUET file format. Since 
> some work has already been done in IMPALA-9741, we can continue the ORC file 
> format support work in this JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
