luoyuxia opened a new issue, #8134:
URL: https://github.com/apache/paimon/issues/8134

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   ### Paimon version
   
   master branch (also affects released versions)
   
   ### Compute Engine
   
   JavaAPI (Arrow bundle writing via `ArrowBundleRecords`)
   
   ### Minimal reproduce step
   
   Write Arrow record batches containing `LocalZonedTimestampType`, `TimeType` 
(non-milli precision), or `BinaryType` columns to Paimon via 
`ArrowBundleRecords`. The `Arrow2PaimonVectorConverter` is used to convert 
Arrow vectors to Paimon column vectors during this process.
   
   **Issue 1: `visit(LocalZonedTimestampType)`**
   
   ```java
   long value = (long) vector.getObject(i);
   ```
   
   For a timezone-unaware `TimeStampVector` (e.g., `TimeStampMilliVector`), `getObject(i)` returns a `LocalDateTime`, not a `Long`, so the cast to `long` fails.
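
The failure mode can be reproduced without Arrow. The following is a minimal sketch, not Paimon code: the class and method names are hypothetical, and a plain `Object` stands in for the return value of `getObject(i)`. It only shows that casting an `Object` holding a `LocalDateTime` to `long` throws, while an `Object` holding a `Long` does not.

```java
import java.time.LocalDateTime;

class LocalDateTimeCastSketch {
    // Mirrors the failing pattern `(long) vector.getObject(i)`: the Object is
    // first cast to Long and then unboxed, so any non-Long value throws.
    static boolean castToLongFails(Object value) {
        try {
            long ignored = (long) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for what a timezone-unaware timestamp vector returns.
        System.out.println(castToLongFails(LocalDateTime.of(2024, 1, 1, 0, 0))); // true
        // Stand-in for a vector whose getObject returns a boxed Long.
        System.out.println(castToLongFails(1_704_067_200_000L)); // false
    }
}
```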
   
   **Issue 2: `visit(TimeType)`**
   
   ```java
   return ((TimeMilliVector) vector).get(index);
   ```
   
   The cast hardcodes `TimeMilliVector`, but the Arrow vector may instead be a `TimeMicroVector`, `TimeNanoVector`, or `TimeSecVector`.
   
   **Issue 3: `visit(BinaryType)`**
   
   ```java
   byte[] bytes = ((VarBinaryVector) vector).getObject(index);
   ```
   
   `BinaryType` (fixed-length) corresponds to Arrow's `FixedSizeBinaryVector`, not `VarBinaryVector`, so the cast fails for such columns.
   
   ### What doesn't meet your expectations?
   
   All three issues cause runtime exceptions:
   
   1. `ClassCastException: class java.time.LocalDateTime cannot be cast to 
class java.lang.Long` for `LocalZonedTimestampType`
   2. `ClassCastException` when the Arrow time vector is not `TimeMilliVector`
   3. `ClassCastException` when reading a `BinaryType` column backed by 
`FixedSizeBinaryVector`
   
   **Expected:** `Arrow2PaimonVectorConverter` should correctly handle all 
Arrow vector subtypes for these Paimon data types.
   
   ### Anything else?
   
   Suggested fixes:
   
   1. For `LocalZonedTimestampType`: read the raw long from the data buffer 
directly:
      ```java
      long value = vector.getDataBuffer().getLong((long) i * TimeStampVector.TYPE_WIDTH);
      ```
   
   2. For `TimeType`: use `instanceof` to handle `TimeMilliVector`, 
`TimeMicroVector`, `TimeNanoVector`, and `TimeSecVector`, converting to 
milliseconds accordingly.
   
   3. For `BinaryType`: use `FixedSizeBinaryVector` instead of 
`VarBinaryVector`.
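
The unit conversion in fix 2 could look roughly like the sketch below. It is not the actual patch: `toMillisOfDay` is a hypothetical helper, and `java.util.concurrent.TimeUnit` stands in for the unit that the real converter would derive from the vector class via `instanceof`. It also assumes that truncating sub-millisecond precision is acceptable.

```java
import java.util.concurrent.TimeUnit;

class TimeToMillisSketch {
    // Hypothetical helper: converts a raw time-of-day value in the given unit
    // to milliseconds of day. Assumed mapping in the real converter:
    //   TimeSecVector   -> SECONDS
    //   TimeMilliVector -> MILLISECONDS
    //   TimeMicroVector -> MICROSECONDS
    //   TimeNanoVector  -> NANOSECONDS
    static int toMillisOfDay(long rawValue, TimeUnit unit) {
        // toMillis truncates toward zero for finer units; toIntExact throws
        // ArithmeticException if the result does not fit in an int.
        return Math.toIntExact(unit.toMillis(rawValue));
    }

    public static void main(String[] args) {
        System.out.println(toMillisOfDay(3_600L, TimeUnit.SECONDS));          // 3600000
        System.out.println(toMillisOfDay(1_500L, TimeUnit.MILLISECONDS));     // 1500
        System.out.println(toMillisOfDay(1_500_000L, TimeUnit.MICROSECONDS)); // 1500
        System.out.println(toMillisOfDay(2_999_999L, TimeUnit.NANOSECONDS));  // 2
    }
}
```

Dispatching on the unit once per vector, rather than once per row, would keep the per-row cost to a single multiplication or division.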
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!

