Benjamin0313 opened a new issue, #16663:
URL: https://github.com/apache/iceberg/issues/16663

   ### Feature Request / Improvement
   
   ## Summary
   
   Spark 4.1 introduced a native `TimeType` (SPARK-51162). Iceberg's `time` 
type can now be mapped to it, but the Spark module still rejects it: reading or 
writing a `time` column throws `UnsupportedOperationException: Spark does not 
support time fields` from `org.apache.iceberg.spark.TypeToSparkType`.
   
   This revisits #9006, which was closed in 2019 — at the time Spark had no 
time type, so the suggested workaround was a timestamp pinned to 1970-01-01. 
That constraint no longer applies in Spark 4.1.
   
   ## Motivation / use case
   
   Time-of-day values (e.g. MySQL `TIME` columns ingested via CDC) currently 
cannot land in Iceberg `time` columns when the query engine is Spark — they 
have to be cast to string as a workaround. With Spark 4.1's `TimeType`, a 
proper round-trip is now possible.
   
   ## Proposed scope
   
   Add `time` support to the **Spark 4.1** module (`spark/v4.1`) only — 
`TimeType` does not exist in Spark 3.5 / 4.0.
   
   - Type mapping: Iceberg `time` ⇄ Spark `TimeType` (microsecond precision).
   - Value conversion: Iceberg stores time as **microseconds** from midnight, 
whereas Spark 4.1 stores **nanoseconds** (SPARK-52460), so values are converted 
(×1000 on read, ÷1000 on write).
   - Row-based reads/writes for Parquet, ORC, and Avro.
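
   The value conversion above can be sketched roughly as follows. This is a 
minimal illustration, not the code from the implementation: the class and 
method names (`TimeMicrosNanos`, `microsToNanos`, `nanosToMicros`) are 
hypothetical, and it assumes both sides hold the value as a plain `long` 
counted from midnight.

```java
// Hypothetical helper, for illustration only; not part of Iceberg or Spark.
final class TimeMicrosNanos {
  private static final long NANOS_PER_MICRO = 1_000L;

  private TimeMicrosNanos() {}

  // Read path: Iceberg microseconds-from-midnight -> nanoseconds-from-midnight.
  // multiplyExact throws ArithmeticException on overflow instead of wrapping.
  static long microsToNanos(long micros) {
    return Math.multiplyExact(micros, NANOS_PER_MICRO);
  }

  // Write path: nanoseconds-from-midnight -> microseconds-from-midnight.
  // floorDiv drops any sub-microsecond remainder (assumed to be zero for
  // values that already have microsecond precision).
  static long nanosToMicros(long nanos) {
    return Math.floorDiv(nanos, NANOS_PER_MICRO);
  }
}
```

   For example, 12:34:56.789 is `45_296_789_000L` microseconds from midnight, 
so `microsToNanos` returns `45_296_789_000_000L`, and `nanosToMicros` maps that 
value back to the original.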
   
   ## Open question — vectorized reads
     
   I'd like feedback on deferring vectorized reads. Spark 4.1's `ColumnarBatch` 
cannot expose `TimeType` values yet (`ColumnarBatchRow#get` throws `Datatype 
not supported TimeType(6)`), and the Arrow-based accessor lives in the shared 
`arrow` module (changing it would affect Flink and other engines). My current 
approach falls back to row-based reads when a `time` column is projected, 
leaving vectorized support as a follow-up. Does this seem like the right 
initial scope?
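
   To make the fallback concrete, here is a rough sketch of the decision it 
implies. Everything in it is a hypothetical stand-in: `Kind` and `Field` model 
a projected schema in a simplified way and are not Iceberg's type API, and 
`useVectorizedRead` is not an existing method in the Spark module.

```java
import java.util.List;

// Hypothetical model of the fallback decision; not Iceberg or Spark code.
final class VectorizationCheck {
  // Simplified stand-in for a column type.
  enum Kind { LONG, STRING, TIME, STRUCT }

  static final class Field {
    final String name;
    final Kind kind;
    final List<Field> children; // non-empty only for STRUCT

    Field(String name, Kind kind, List<Field> children) {
      this.name = name;
      this.kind = kind;
      this.children = children;
    }

    static Field of(String name, Kind kind) {
      return new Field(name, kind, List.of());
    }
  }

  private VectorizationCheck() {}

  // True if any projected field, at any nesting depth, is a time column.
  static boolean containsTime(List<Field> projected) {
    for (Field field : projected) {
      if (field.kind == Kind.TIME || containsTime(field.children)) {
        return true;
      }
    }
    return false;
  }

  // Vectorized reads are used only when enabled and no time column is
  // projected; otherwise the reader falls back to the row-based path.
  static boolean useVectorizedRead(boolean vectorizationEnabled, List<Field> projected) {
    return vectorizationEnabled && !containsTime(projected);
  }
}
```

   Under this sketch, queries that do not project a `time` column keep the 
vectorized path, so the fallback only costs performance where `time` is 
actually read.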
   
   I have a working implementation (type mapping + value conversion + tests 
across Parquet/ORC/Avro) and will open a PR once there's agreement on the 
approach.
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
