chenjunjiedada commented on a change in pull request #1517:
URL: https://github.com/apache/iceberg/pull/1517#discussion_r495675725
##########
File path:
flink/src/main/java/org/apache/iceberg/flink/source/RowDataIterator.java
##########
@@ -92,12 +102,12 @@
return iter;
}
-  private CloseableIterable<RowData> newAvroIterable(FileScanTask task, Map<Integer, ?> idToConstant) {
+  private CloseableIterable<RowData> newAvroIterable(FileScanTask task, Schema schema, Map<Integer, ?> idToConstant) {
     Avro.ReadBuilder builder = Avro.read(getInputFile(task))
-        .reuseContainers()
-        .project(projectedSchema)
+        .reuseContainers(false)
Review comment:
I mean the unit tests in `TestFlinkScan`. `getRows` puts each `row` from
`inputFormat.nextRecord(null)` into a List, but the `row` is reused when the
file format is Avro, so the results collected in the List are wrong.
I didn't find a simple way to copy the `GenericRowData`, so I set
`reuseContainers` to false to align with the Parquet and ORC cases. I changed
it back in 8526b6deedb747200fb21507f5716cf5e8396470 once I found that the
converter could be used to copy the `row`. But that raises a new concern about
double copies for Parquet and ORC. @openinx @JingsongLi @rdblue, should we
reuse the container for the Flink read path?
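To illustrate the pitfall (a minimal sketch, not the Iceberg/Flink API): a
reader with `reuseContainers` enabled hands back the same mutable object on
every `nextRecord` call, so a caller that stores the references in a List ends
up with N aliases of one object. The `Row` class and `copy()` method below are
hypothetical stand-ins for `GenericRowData` and the converter-based copy:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
  // Hypothetical mutable record standing in for GenericRowData.
  static final class Row {
    int value;

    Row copy() {
      Row r = new Row();
      r.value = this.value;
      return r;
    }
  }

  public static void main(String[] args) {
    Row reused = new Row();              // single container, reused per record
    List<Row> aliased = new ArrayList<>();
    List<Row> copied = new ArrayList<>();

    for (int i = 1; i <= 3; i++) {
      reused.value = i;                  // reader overwrites the container
      aliased.add(reused);               // wrong: every entry aliases one object
      copied.add(reused.copy());         // right: defensive copy per record
    }

    // Every aliased entry now reflects the last value written.
    System.out.println(aliased.get(0).value);  // prints 3
    System.out.println(copied.get(0).value);   // prints 1
  }
}
```

This is why either `reuseContainers(false)` or a per-record copy is needed on
the Avro path; doing both (the converter copy on top of a non-reusing reader,
as Parquet and ORC already copy) is the double-copy concern raised above.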
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]