chenjunjiedada commented on a change in pull request #1517:
URL: https://github.com/apache/iceberg/pull/1517#discussion_r495675725
##########
File path:
flink/src/main/java/org/apache/iceberg/flink/source/RowDataIterator.java
##########
@@ -92,12 +102,12 @@
return iter;
}
-  private CloseableIterable<RowData> newAvroIterable(FileScanTask task, Map<Integer, ?> idToConstant) {
+  private CloseableIterable<RowData> newAvroIterable(FileScanTask task, Schema schema, Map<Integer, ?> idToConstant) {
     Avro.ReadBuilder builder = Avro.read(getInputFile(task))
-        .reuseContainers()
-        .project(projectedSchema)
+        .reuseContainers(false)
Review comment:
I mean the unit tests in `TestFlinkScan`. `getRows` puts each `row` from
`inputFormat.nextRecord(null)` into a List, but the `row` is reused when the
file format is Avro, so the results collected in the List are wrong.
I didn't find a simple way to copy the `GenericRowData`, so I set
`reuseContainers` to false to align with the Parquet and ORC cases. I changed
it back in 8526b6deedb747200fb21507f5716cf5e8396470 once I found that the
converter could be used to copy the `row`. But that raises a new concern about
double copies for Parquet and ORC. @openinx @JingsongLi @rdblue, should we
reuse the container for the Flink read path?
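To illustrate the pitfall (a minimal sketch, not the Iceberg/Flink API): a
reader with `reuseContainers` enabled hands back the same mutable object on
every `nextRecord` call, so a caller that stores the references in a List ends
up with N aliases of one object. The `Row` class and `copy()` method below are
hypothetical stand-ins for `GenericRowData` and the converter-based copy:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
  // Hypothetical mutable record standing in for GenericRowData.
  static final class Row {
    int value;

    Row copy() {
      Row r = new Row();
      r.value = this.value;
      return r;
    }
  }

  public static void main(String[] args) {
    Row reused = new Row();              // single container, reused per record
    List<Row> aliased = new ArrayList<>();
    List<Row> copied = new ArrayList<>();

    for (int i = 1; i <= 3; i++) {
      reused.value = i;                  // reader overwrites the container
      aliased.add(reused);               // wrong: every entry aliases one object
      copied.add(reused.copy());         // right: defensive copy per record
    }

    // Every aliased entry now reflects the last value written.
    System.out.println(aliased.get(0).value);  // prints 3
    System.out.println(copied.get(0).value);   // prints 1
  }
}
```

This is why either `reuseContainers(false)` or a per-record copy is needed on
the Avro path; doing both (the converter copy on top of a non-reusing reader,
as Parquet and ORC already copy) is the double-copy concern raised above.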
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]