Benjamin0313 opened a new issue, #16663:
URL: https://github.com/apache/iceberg/issues/16663
### Feature Request / Improvement
## Summary
Spark 4.1 introduced a native `TimeType` (SPARK-51162). Iceberg's `time`
type can now be mapped to it, but the Spark module still rejects it: reading or
writing a `time` column throws `UnsupportedOperationException: Spark does not
support time fields` from `org.apache.iceberg.spark.TypeToSparkType`.
This revisits #9006, which was closed in 2019 — at the time Spark had no
time type, so the suggested workaround was a timestamp pinned to 1970-01-01.
That constraint no longer applies in Spark 4.1.
## Motivation / use case
Time-of-day values (e.g. MySQL `TIME` columns ingested via CDC) currently
cannot land in Iceberg `time` columns when the query engine is Spark — they
have to be cast to string as a workaround. With Spark 4.1's `TimeType`, a
proper round-trip is now possible.
## Proposed scope
Add `time` support to the **Spark 4.1** module (`spark/v4.1`) only —
`TimeType` does not exist in Spark 3.5 / 4.0.
- Type mapping: Iceberg `time` ⇄ Spark `TimeType` (microsecond precision).
- Value conversion: Iceberg stores time as **microseconds** from midnight, while
  Spark 4.1 stores **nanoseconds** (SPARK-52460), so values are converted (×1000
  on read from Iceberg, ÷1000 on write to Iceberg).
- Row-based reads/writes for Parquet, ORC, and Avro.
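
To make the value conversion concrete, here is a minimal sketch of the arithmetic in plain Python. It is not the actual implementation; the function names are hypothetical, and the range checks and the truncation of sub-microsecond digits on write are assumptions about reasonable behavior rather than anything Spark or Iceberg mandates.

```python
from datetime import time

NANOS_PER_MICRO = 1_000
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000
NANOS_PER_DAY = MICROS_PER_DAY * NANOS_PER_MICRO


def iceberg_micros_to_spark_nanos(micros: int) -> int:
    """Read path: Iceberg microseconds-from-midnight -> nanoseconds."""
    if not 0 <= micros < MICROS_PER_DAY:
        raise ValueError(f"time out of range: {micros} us")
    return micros * NANOS_PER_MICRO


def spark_nanos_to_iceberg_micros(nanos: int) -> int:
    """Write path: nanoseconds -> Iceberg microseconds-from-midnight.

    Assumption: any sub-microsecond digits are truncated (floor division).
    """
    if not 0 <= nanos < NANOS_PER_DAY:
        raise ValueError(f"time out of range: {nanos} ns")
    return nanos // NANOS_PER_MICRO


def time_to_micros(t: time) -> int:
    """Helper for the example: wall-clock time -> microseconds from midnight."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


# Round trip for 13:45:30.123456
micros = time_to_micros(time(13, 45, 30, 123456))
nanos = iceberg_micros_to_spark_nanos(micros)
assert spark_nanos_to_iceberg_micros(nanos) == micros
```

A value that already has microsecond precision round-trips unchanged, since multiplying and then floor-dividing by 1000 is exact.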
## Open question — vectorized reads
I'd like feedback on deferring vectorized reads. Spark 4.1's `ColumnarBatch`
cannot expose `TimeType` values yet (`ColumnarBatchRow#get` throws `Datatype
not supported TimeType(6)`), and the Arrow-based accessor lives in the shared
`arrow` module (changing it would affect Flink and other engines). My current
approach falls back to row-based reads when a `time` column is projected,
leaving vectorized support as a follow-up. Does this seem like the right
initial scope?
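
For discussion, the fallback decision can be sketched roughly as below. This is an illustration in plain Python, not the code in my branch: the dict layout is loosely modeled on Iceberg's JSON schema form, and `projects_time` and `use_vectorized_reads` are hypothetical names.

```python
def projects_time(field_type) -> bool:
    """Return True if the (possibly nested) type contains a `time` field."""
    if isinstance(field_type, str):
        # Primitive types are plain strings, e.g. "long", "time", "string".
        return field_type == "time"
    kind = field_type["type"]
    if kind == "struct":
        return any(projects_time(f["type"]) for f in field_type["fields"])
    if kind == "list":
        return projects_time(field_type["element"])
    if kind == "map":
        return projects_time(field_type["key"]) or projects_time(field_type["value"])
    return False


def use_vectorized_reads(projected_schema, vectorization_enabled: bool = True) -> bool:
    """Fall back to row-based reads whenever a `time` column is projected."""
    return vectorization_enabled and not projects_time(projected_schema)


schema = {
    "type": "struct",
    "fields": [
        {"id": 1, "name": "id", "required": True, "type": "long"},
        {"id": 2, "name": "opened_at", "required": False, "type": "time"},
    ],
}
without_time = {"type": "struct", "fields": schema["fields"][:1]}

assert use_vectorized_reads(schema) is False        # `time` projected: row-based
assert use_vectorized_reads(without_time) is True   # no `time`: vectorized
```

The point of checking the projected schema rather than the table schema is that queries which do not select the `time` column keep the vectorized path.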
I have a working implementation (type mapping + value conversion + tests
across Parquet/ORC/Avro) and will open a PR once there's agreement on the
approach.
### Query engine
Spark
### Willingness to contribute
- [x] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time