Hi All,
        I am creating an Iceberg writer over a Temporal service that converts
CDC Parquet files to Iceberg format. That means each file will have records
with corresponding timestamp flags like `inserted_at`, `deleted_at`
and `updated_at`, each of which carries a value defining the action.

Initially, when there is no table in the Iceberg catalog, the plan is to
use the Parquet footer schema and map it directly to the Iceberg schema
using *org.apache.iceberg.parquet.ParquetSchemaUtil.convert(MessageType
parquetSchema)*. However, the issue I am facing is that I also have to
convert Parquet data types to Iceberg data types, specifically the
timestamp types, when inserting into the table.

When using the Parquet reader with SimpleGroup, I see the timestamp as a
long, but when inserting into Iceberg it expects a
*java.time.OffsetDateTime*. The specific error I get is `Long cannot be
cast to OffsetDateTime`.
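
For now I am converting the long by hand before setting it on the Iceberg
record. This assumes the column is microsecond precision and UTC-adjusted;
the actual unit depends on the file's logical type annotation:

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Parquet gives epoch micros as a long; Iceberg's generic writer wants
// OffsetDateTime for timestamptz columns. Assumes TIMESTAMP(MICROS, isAdjustedToUTC=true).
static OffsetDateTime toOffsetDateTime(long epochMicros) {
  return Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS).atOffset(ZoneOffset.UTC);
}

// e.g. when copying from the SimpleGroup into an Iceberg GenericRecord:
// record.setField("updated_at", toOffsetDateTime(group.getLong("updated_at", 0)));
```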

I have 2 questions on this use case:
1. Is there an easy way to insert Parquet records into Iceberg directly,
without me having to do a type conversion, since the goal is to make it all
happen within Temporal?
2. I need suggestions on handling updates. For updates I am currently
having to commit the inserts, then commit the deletes, and then create a
new writer again before proceeding; a rough sketch of that flow is below.
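
For reference, the per-batch flow right now looks roughly like this
(simplified; the data and delete files stand in for whatever the writers
produce, and the exact delete mechanism is what I would like advice on):

```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.Table;

static void commitBatch(Table table, DataFile insertedRowsFile, DeleteFile deletesForUpdatedRows) {
  // One commit for the newly inserted rows...
  table.newAppend()
      .appendFile(insertedRowsFile)
      .commit();

  // ...then a second commit for the deletes of the old versions of updated rows...
  table.newRowDelta()
      .addDeletes(deletesForUpdatedRows)
      .commit();

  // ...and after this a brand new writer has to be created for the next batch.
}
```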

Regards,
Taher Koitawala
