huaxingao opened a new pull request, #1920:
URL: https://github.com/apache/datafusion-comet/pull/1920
## Which issue does this PR close?
Closes #.
## Rationale for this change
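Iceberg shades Parquet, so the Parquet classes it uses live under a relocated package and are not the same types as the Parquet classes Comet is compiled against. As a result, Parquet objects can't be passed from Iceberg to Comet.
To work around this problem, this PR encapsulates the Parquet objects.
Here is a summary of the changes.
Iceberg currently calls these APIs: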
```
public static ColumnReader getColumnReader(
    DataType type,
    ColumnDescriptor descriptor,
    CometSchemaImporter importer,
    int batchSize,
    boolean useDecimal128,
    boolean useLazyMaterialization)

ColumnReader.setPageReader(PageReader pageReader)
```
To encapsulate `ColumnDescriptor` and `PageReader`, this PR adds a
`ParquetColumnSpec` and changes the two APIs above to:
```
public static ColumnReader getColumnReader(
    DataType type,
    ParquetColumnSpec columnSpec,
    CometSchemaImporter importer,
    int batchSize,
    boolean useDecimal128,
    boolean useLazyMaterialization)
// construct a ColumnDescriptor from ParquetColumnSpec
setRowGroupReader(
    org.apache.comet.parquet.RowGroupReader rowGroupReader,
    ParquetColumnSpec columnSpec)
// Will call:
//   PageReader pageReader = RowGroupReader.getPageReader(ColumnDescriptor)
//   setPageReader(pageReader);
```
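The idea of carrying column metadata across the shading boundary can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `ColumnSpecSketch`, `DescriptorStandIn`, and `SpecConverter` are hypothetical names, the field list is an assumption about what a descriptor needs, and `DescriptorStandIn` stands in for Parquet's `ColumnDescriptor` so the example compiles without Parquet on the classpath.

```java
/**
 * Hypothetical carrier type. It holds only JDK types (String, int), so it is
 * the same class on both sides no matter how Parquet is relocated.
 */
final class ColumnSpecSketch {
    final String[] path;
    final String physicalType;   // passed by name, e.g. "INT32"
    final int typeLength;
    final int maxRepetitionLevel;
    final int maxDefinitionLevel;

    ColumnSpecSketch(String[] path, String physicalType, int typeLength,
                     int maxRepetitionLevel, int maxDefinitionLevel) {
        this.path = path.clone();
        this.physicalType = physicalType;
        this.typeLength = typeLength;
        this.maxRepetitionLevel = maxRepetitionLevel;
        this.maxDefinitionLevel = maxDefinitionLevel;
    }
}

/** Stand-in for the unshaded descriptor class that only the receiver can see. */
final class DescriptorStandIn {
    enum PhysicalType {
        BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY
    }

    final String[] path;
    final PhysicalType type;
    final int typeLength;
    final int maxRepetitionLevel;
    final int maxDefinitionLevel;

    DescriptorStandIn(String[] path, PhysicalType type, int typeLength,
                      int maxRepetitionLevel, int maxDefinitionLevel) {
        this.path = path.clone();
        this.type = type;
        this.typeLength = typeLength;
        this.maxRepetitionLevel = maxRepetitionLevel;
        this.maxDefinitionLevel = maxDefinitionLevel;
    }

    String dottedPath() {
        return String.join(".", path);
    }
}

/** Receiver-side conversion: rebuilds the descriptor from the plain spec. */
final class SpecConverter {
    static DescriptorStandIn toDescriptor(ColumnSpecSketch spec) {
        // valueOf throws IllegalArgumentException for an unknown type name.
        DescriptorStandIn.PhysicalType type =
            DescriptorStandIn.PhysicalType.valueOf(spec.physicalType);
        return new DescriptorStandIn(spec.path, type, spec.typeLength,
            spec.maxRepetitionLevel, spec.maxDefinitionLevel);
    }
}
```

The caller builds the spec from its own (shaded) descriptor, and the receiver calls something like `SpecConverter.toDescriptor(spec)` to obtain an object of its own (unshaded) type. Passing the physical type by name avoids sharing even an enum class between the two sides.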
To call `setRowGroupReader(org.apache.comet.parquet.RowGroupReader
rowGroupReader, ParquetColumnSpec columnSpec)`, the Iceberg side needs to
use Comet's `FileReader` instead of `ParquetFileReader`, so it calls
`FileReader.readNextRowGroup()` to get an
`org.apache.comet.parquet.RowGroupReader` instead of Parquet's `PageReadStore`.
`ParquetReadOptions` can't be passed directly either, so the relevant settings
are passed individually and the `ParquetReadOptions` is built on the Comet side.
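The resulting read loop on the caller side might look roughly like the sketch below. It uses stand-in classes only, so it runs without Comet or Parquet. Every `*StandIn` type and `ReadLoopDemo` is hypothetical, a column is identified by a plain string rather than a `ParquetColumnSpec`, and the assumption that `readNextRowGroup()` returns `null` once the file is exhausted is mine, not something stated in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in for a page reader; it only records where it came from. */
final class PageReaderStandIn {
    final String column;
    final int rowGroup;

    PageReaderStandIn(String column, int rowGroup) {
        this.column = column;
        this.rowGroup = rowGroup;
    }
}

/** Stand-in for the receiver-side row-group reader. */
final class RowGroupStandIn {
    private final int index;
    private final List<String> columns;

    RowGroupStandIn(int index, List<String> columns) {
        this.index = index;
        this.columns = columns;
    }

    PageReaderStandIn getPageReader(String columnPath) {
        if (!columns.contains(columnPath)) {
            throw new IllegalArgumentException("no such column: " + columnPath);
        }
        return new PageReaderStandIn(columnPath, index);
    }
}

/** Stand-in file reader: hands out row groups one at a time, then null. */
final class FileReaderStandIn {
    private final int rowGroupCount;
    private final List<String> columns;
    private int next = 0;

    FileReaderStandIn(int rowGroupCount, List<String> columns) {
        this.rowGroupCount = rowGroupCount;
        this.columns = columns;
    }

    RowGroupStandIn readNextRowGroup() {
        if (next >= rowGroupCount) {
            return null; // assumed end-of-file signal
        }
        return new RowGroupStandIn(next++, columns);
    }
}

/** Stand-in column reader: the caller never touches a page reader directly. */
final class ColumnReaderStandIn {
    final List<String> seen = new ArrayList<>();

    void setRowGroupReader(RowGroupStandIn rowGroup, String columnPath) {
        // The page reader is looked up and installed on the receiver side.
        setPageReader(rowGroup.getPageReader(columnPath));
    }

    private void setPageReader(PageReaderStandIn pageReader) {
        seen.add(pageReader.column + "@" + pageReader.rowGroup);
    }
}

final class ReadLoopDemo {
    static List<String> run() {
        FileReaderStandIn file = new FileReaderStandIn(2, Arrays.asList("id", "name"));
        ColumnReaderStandIn idReader = new ColumnReaderStandIn();
        RowGroupStandIn rowGroup;
        while ((rowGroup = file.readNextRowGroup()) != null) {
            idReader.setRowGroupReader(rowGroup, "id");
        }
        return idReader.seen;
    }
}
```

The point of the shape is that the caller only handles the row-group object and a plain column identifier. The page reader is created and consumed entirely on one side of the boundary.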
## What changes are included in this PR?
## How are these changes tested?
I ran integration tests locally.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]