lrz created HUDI-1779:
-------------------------
Summary: Fail to bootstrap/upsert a table which contains timestamp
column
Key: HUDI-1779
URL: https://issues.apache.org/jira/browse/HUDI-1779
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Fix For: 0.9.0
Currently, when Hudi bootstraps a parquet file, or upserts into a parquet file
that contains a timestamp column, the operation fails because of these issues:
1) During the bootstrap operation, if the origin parquet file was written by a
Spark application, Spark saves timestamps as INT96 by default (see
spark.sql.parquet.int96AsTimestamp), and bootstrap fails because Hudi cannot
currently read the INT96 type. (This issue can be solved by upgrading parquet
to 1.12.0 and setting parquet.avro.readInt96AsFixed=true; please check
https://github.com/apache/parquet-mr/pull/831/files)
2) After bootstrap, an upsert fails because we use the hoodie schema to read
the origin parquet file. The schemas do not match: the hoodie schema treats
timestamp as long, while in the origin file it is INT96.
3) After bootstrap, a partial update of a parquet file fails because we copy
the old record and save it with the hoodie schema (we are missing a
convertFixedToLong operation like the one Spark does).
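
To illustrate the kind of fixed-to-long conversion that issue 3 refers to, here
is a minimal sketch in Python. It assumes the commonly used INT96 timestamp
layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day
number, both little-endian) and assumes the target long is microseconds since
the Unix epoch. The function names are hypothetical and are not Hudi or Spark
APIs; the actual fix in Hudi may look different.

```python
import struct

JULIAN_DAY_OF_EPOCH = 2440588      # Julian day number of 1970-01-01
MICROS_PER_DAY = 86_400_000_000    # 24 * 60 * 60 * 1_000_000


def int96_to_epoch_micros(raw: bytes) -> int:
    """Convert a 12-byte INT96 timestamp (as a fixed) to epoch microseconds.

    Assumed layout: little-endian int64 nanos-of-day, then little-endian
    int32 Julian day. Sub-microsecond precision is truncated.
    """
    if len(raw) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1000


def epoch_micros_to_int96(micros: int) -> bytes:
    """Inverse helper (hypothetical), used here only to build sample values."""
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1000, days + JULIAN_DAY_OF_EPOCH)
```

A reader that receives INT96 as a 12-byte fixed (for example with
parquet.avro.readInt96AsFixed=true) would need a step of this kind before the
value can be stored in a field that the hoodie schema declares as long.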
--
This message was sent by Atlassian Jira
(v8.3.4#803005)