chenjunjiedada edited a comment on issue #359: Spec: Add file and position delete files
URL: https://github.com/apache/incubator-iceberg/issues/359#issuecomment-612837111

Hi @rdblue @jerryshao, I'm working on the reader and writer for delete files. For the reader, the input and output are clear: the delete files would be extracted from `FileScanTask`, and the output would be an iterable of `InternalRow` or `GenericRecord`. For the writer, I'm not so sure, since it depends on how metadata-level deletes will be implemented. For the Spark engine, I can imagine two possible approaches:

- Spark builds the table scan with the filter expression and a projection of the partition columns and metadata columns, then saves the data frame in append mode with an option like `data=diff`. In this case, Iceberg can generate partition-level delete files from the input partitions. That means we may not need a separate writer, just some updates to the current table writer.
- Spark has a logical plan `RowDeleteScan`, similar to the table scan; each task reads the input file with the filter (residual) and then writes to a delete file. In this case, we need a separate writer that takes the input file and filter expression as input and writes the filtered records. This would need residual evaluation for every supported format, but we can use `IcebergGenerics` to handle that.

Both approaches depend on the Spark-side implementation. Any thoughts?
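To make the second approach concrete, here is a minimal sketch of the per-task flow it describes: read the rows of one input file, evaluate the residual filter, and collect the matching rows as the content of the delete file. Everything here is a hypothetical stand-in (rows as `Map`s, the residual as a `Predicate`, the class and method names invented for illustration) rather than Iceberg's real `GenericRecord`/`Expression` types or a committed API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the RowDeleteScan task body: rows are plain
// column-name -> value maps, and the residual is a plain Predicate,
// standing in for Iceberg's GenericRecord and Expression types.
public class DeleteFileWriterSketch {

    // Returns the rows that belong in the delete file: exactly those
    // rows of the input file that match the residual filter.
    public static List<Map<String, Object>> filterForDelete(
            List<Map<String, Object>> inputFileRows,
            Predicate<Map<String, Object>> residual) {
        List<Map<String, Object>> deleteRows = new ArrayList<>();
        for (Map<String, Object> row : inputFileRows) {
            if (residual.test(row)) {
                deleteRows.add(row);
            }
        }
        return deleteRows;
    }
}
```

A real implementation would obtain `inputFileRows` from a format-specific reader (or `IcebergGenerics`) and append `deleteRows` to a delete file writer for the matching format, but the filtering step would look essentially like this.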
