chenjunjiedada edited a comment on issue #359: Spec: Add file and position 
delete files
URL: 
https://github.com/apache/incubator-iceberg/issues/359#issuecomment-612837111
 
 
   Hi @rdblue @jerryshao, I'm working on the reader and writer for delete files. The reader's input and output are clear: the delete files are extracted from `FileScanTask`, and the output is an iterable of `InternalRow` or `GenericRecord`. For the writer, I'm not so sure, since it depends on how metadata-level delete will be implemented. For the Spark engine, I can imagine two possible ways:
   
   - Spark builds a table scan with the filter expression and a projection of the partition columns and metadata columns, then saves the data frame in append mode with an option like `data=diff`. In this case, Iceberg can generate partition-level delete files from the input partitions. That means we may not need a separate writer, just some updates to the current table writer.
   
   - Spark has a logical plan `RowDeleteScan`, similar to the table scan; each task reads its input file with the filter (residual) and then writes matching rows to a delete file. In this case, we need a separate writer that takes the input file and filter expression as input and writes the filtered records. This requires residual evaluation for every supported format, but we can use `IcebergGenerics` to handle that.
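   To make the second option concrete, here is a minimal, self-contained sketch (plain Java, no Iceberg or Spark dependency) of what a per-task delete writer could do: scan the rows of one input file, evaluate the residual filter, and record the positions of matching rows as (file path, position) pairs for a position delete file. The `Row`, `PositionDelete`, and `writeDeletes` names are illustrative assumptions, not existing Iceberg APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PositionDeleteSketch {
    // A row is just an (id, value) pair here; a real writer would read
    // InternalRow/GenericRecord from the file format's reader.
    record Row(long id, String value) {}

    // A position delete entry: the data file's path plus the row position.
    record PositionDelete(String filePath, long pos) {}

    // Scan one input file's rows, apply the residual filter, and record
    // the positions of matching rows for the delete file.
    static List<PositionDelete> writeDeletes(String filePath,
                                             List<Row> rows,
                                             Predicate<Row> residual) {
        List<PositionDelete> deletes = new ArrayList<>();
        for (int pos = 0; pos < rows.size(); pos++) {
            if (residual.test(rows.get(pos))) {
                deletes.add(new PositionDelete(filePath, pos));
            }
        }
        return deletes;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "a"));
        // Residual filter for this task: delete rows where value == "a".
        List<PositionDelete> deletes =
            writeDeletes("s3://bucket/data/file1.parquet", rows, r -> r.value().equals("a"));
        deletes.forEach(d -> System.out.println(d.filePath() + "," + d.pos()));
    }
}
```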
   
   Both approaches depend on the Spark-side implementation. Any thoughts?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]
