callmepandey opened a new issue, #502: URL: https://github.com/apache/iceberg-cpp/issues/502
## Summary

The `ProjectRecordBatch` function in `parquet_data_util.cc` only supports `::arrow::ListArray` (32-bit offsets), not `::arrow::LargeListArray` (64-bit offsets). This limitation is marked with a FIXME comment at line 151.

## Problem

Arrow's `LargeListArray` uses 64-bit offsets instead of 32-bit ones, which lets it handle lists with more than 2^31-1 total child elements. Currently, attempting to project a `LargeListArray` would fail with an error like:

```
Expected list type, got: large_list<...>
```

## Proposed Solution

1. **Add a templated `ProjectListArrayImpl<>` function**: a generic implementation that works with both `ListArray` and `LargeListArray`.
2. **Add a `ProjectLargeListArray` wrapper**: calls the template with `LargeListArray` and `LargeListType`.
3. **Update `ProjectNestedArray`**: handle both `::arrow::Type::LIST` and `::arrow::Type::LARGE_LIST` in the `TypeId::kList` case.
4. **Add a test case**: verify that `LargeListArray` projection works correctly.

## Files to Change

- `src/iceberg/parquet/parquet_data_util.cc`
- `src/iceberg/test/parquet_data_test.cc`

## References

- FIXME comment: `src/iceberg/parquet/parquet_data_util.cc:151`
- Arrow LargeListArray docs: https://arrow.apache.org/docs/cpp/api/array.html
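To illustrate the idea behind the proposed `ProjectListArrayImpl<>` template, here is a minimal, Arrow-free sketch. It assumes that projecting a list means projecting its child elements while carrying the offsets over unchanged, so the only thing that differs between the two list layouts is the offset width. `ListColumn`, `AnyListColumn`, `ChildProjector` and `ProjectAnyList` are hypothetical stand-ins invented for this sketch (they are not iceberg-cpp or Arrow APIs); the real change would operate on `::arrow::ListArray` / `::arrow::LargeListArray` and recurse into the element projection.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for an Arrow list column. OffsetT = int32_t mirrors
// ::arrow::ListArray; OffsetT = int64_t mirrors ::arrow::LargeListArray.
template <typename OffsetT>
struct ListColumn {
  std::vector<OffsetT> offsets;  // length() + 1 non-decreasing entries
  std::vector<int64_t> values;   // flattened child elements
  int64_t length() const { return static_cast<int64_t>(offsets.size()) - 1; }
};

// Either layout, analogous to Type::LIST vs Type::LARGE_LIST under TypeId::kList.
using AnyListColumn = std::variant<ListColumn<int32_t>, ListColumn<int64_t>>;

// Hypothetical child projection; in iceberg-cpp this would be a recursive
// projection of the element array rather than an element-wise function.
using ChildProjector = std::function<int64_t(int64_t)>;

// One generic implementation for both offset widths: offsets are copied
// unchanged and only the child values are projected.
template <typename OffsetT>
ListColumn<OffsetT> ProjectListArrayImpl(const ListColumn<OffsetT>& input,
                                         const ChildProjector& project_child) {
  static_assert(std::is_same_v<OffsetT, int32_t> || std::is_same_v<OffsetT, int64_t>,
                "only 32-bit and 64-bit offsets are supported");
  assert(!input.offsets.empty() && "a list column has length + 1 offsets");
  ListColumn<OffsetT> output;
  output.offsets = input.offsets;
  output.values.reserve(input.values.size());
  for (int64_t value : input.values) {
    output.values.push_back(project_child(value));
  }
  return output;
}

// Dispatch over both layouts, analogous to handling LIST and LARGE_LIST in
// the same case of ProjectNestedArray.
AnyListColumn ProjectAnyList(const AnyListColumn& input,
                             const ChildProjector& project_child) {
  return std::visit(
      [&](const auto& column) -> AnyListColumn {
        return ProjectListArrayImpl(column, project_child);
      },
      input);
}
```

For example, projecting a 64-bit-offset column with offsets `{0, 2, 2, 5}` through a projector that multiplies each element by 10 keeps the offsets and yields values `{10, 20, 30, 40, 50}`, and the result stays in the `ListColumn<int64_t>` alternative. In the real Arrow code, the same split falls out naturally because `::arrow::ListType::offset_type` is `int32_t` and `::arrow::LargeListType::offset_type` is `int64_t`, so a single template parameterized on the array and type classes can cover both.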
