[
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192834#comment-17192834
]
WangSheng commented on IMPALA-10159:
------------------------------------
Hi [~boroknagyz], I used spark-shell to generate the test files. My Spark client
version is 2.4.5 and the ORC jars in this client are 1.5.5. Even after I replaced
these ORC jars with 1.6.3, it still doesn't work. Here is the code I used to
generate the test files:
{code:java}
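import java.sql.Timestamp
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.SparkSchemaUtil
import org.apache.iceberg.types.Types
import org.apache.spark.sql.functions.to_timestamp
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val conf = new Configuration()
val tblLoc = "/test-warehouse/iceberg_test/iceberg_partitioned_orc"
val catalog = new HadoopTables(conf)

// Spark schema; event_time is converted from Iceberg's timestamp-without-zone type.
val sparkSchema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("user", StringType, false),
  StructField("action", StringType, false),
  StructField("event_time", SparkSchemaUtil.convert(Types.TimestampType.withoutZone()), false)))
val icebergSchema = SparkSchemaUtil.convert(sparkSchema)

// Partition by hour(event_time) and by action.
val spec = PartitionSpec.builderFor(icebergSchema).hour("event_time").identity("action").build
val table = catalog.create(icebergSchema, spec, tblLoc)

// Build a single test row and re-apply sparkSchema to it.
val data_df = spark.createDataFrame(
  Seq((1, "Alex", "view", Timestamp.valueOf("2020-01-01 08:00:00")))).toDF("id", "user", "action", "ts")
val array = data_df.select(data_df("id"), data_df("user"), data_df("action"),
  to_timestamp(data_df("ts"))).collect()
val df = spark.createDataFrame(sc.makeRDD(array), sparkSchema)

// Write the row as ORC into the Iceberg table, then read it back.
df.write.format("iceberg").option("write-format", "orc").mode("append").save(tblLoc)
spark.read.format("iceberg").load(tblLoc).show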
{code}
This code throws the exception "java.lang.UnsupportedOperationException: Spark
does not support timestamp without time zone fields".
If we replace "SparkSchemaUtil.convert(Types.TimestampType.withoutZone())" with
"TimestampType", the test files are generated normally, but querying them in
Impala hits the problem described in IMPALA-9967.
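For reference, a minimal sketch of that replacement, assuming the rest of the
snippet above stays unchanged and only the "event_time" field differs (here
"TimestampType" is Spark's org.apache.spark.sql.types.TimestampType, not the
Iceberg type):

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType, TimestampType}

// Same columns as the failing schema, but event_time uses Spark's built-in
// TimestampType instead of converting Iceberg's timestamp-without-zone type.
val sparkSchema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("user", StringType, false),
  StructField("action", StringType, false),
  StructField("event_time", TimestampType, false)))
```

With this schema the write succeeds on my side, but the resulting files are the
ones that run into IMPALA-9967 when queried from Impala.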
And here is the create statement:
{code:java}
CREATE EXTERNAL TABLE default.iceberg_partitioned_orc
STORED AS ICEBERG
LOCATION
'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned_orc'
TBLPROPERTIES('iceberg_file_format'='orc');
{code}
> Support ORC file format for Iceberg table
> -----------------------------------------
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: WangSheng
> Assignee: WangSheng
> Priority: Minor
> Labels: impala-iceberg
>
> Impala can now query the PARQUET file format for Iceberg tables. Since some
> work has already been done in IMPALA-9741, we can continue the ORC file format
> support work in this Jira.