[ 
https://issues.apache.org/jira/browse/HIVE-26734?focusedWorklogId=826017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-826017
 ]

ASF GitHub Bot logged work on HIVE-26734:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Nov/22 07:28
            Start Date: 15/Nov/22 07:28
    Worklog Time Spent: 10m 
      Work Description: szlta commented on code in PR #3758:
URL: https://github.com/apache/hive/pull/3758#discussion_r1022426527


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergDeleteWriter.java:
##########
@@ -41,21 +41,28 @@
 class HiveIcebergDeleteWriter extends HiveIcebergWriterBase {
 
   private final GenericRecord rowDataTemplate;
+  private final boolean skipRowData;
 
   HiveIcebergDeleteWriter(Schema schema, Map<Integer, PartitionSpec> specs,
       FileWriterFactory<Record> writerFactory, OutputFileFactory fileFactory, FileIO io,
-      long targetFileSize) {
+      long targetFileSize, boolean skipRowData) {
     super(schema, specs, io,
         new ClusteredPositionDeleteWriter<>(writerFactory, fileFactory, io, targetFileSize));
     rowDataTemplate = GenericRecord.create(schema);
+    this.skipRowData = skipRowData;
   }
 
   @Override
   public void write(Writable row) throws IOException {
     Record rec = ((Container<Record>) row).get();
     PositionDelete<Record> positionDelete = IcebergAcidUtil.getPositionDelete(rec, rowDataTemplate);
     int specId = IcebergAcidUtil.parseSpecId(rec);
-    writer.write(positionDelete, specs.get(specId), partition(positionDelete.row(), specId));
+    Record rowData = positionDelete.row();
+    if (skipRowData) {
+      // Set null as the row data as we intend to avoid writing the actual row data in the delete file.
+      positionDelete.set(positionDelete.path(), positionDelete.pos(), null);

Review Comment:
   This first creates a position delete record that includes the row content 
via IcebergAcidUtil.getPositionDelete, and then unsets the row here.
   Why not pass skipRowData as an argument to that method in the first place? 
It already has the responsibility of putting the record together.
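
   A minimal sketch of what I mean, using simplified stand-in types rather 
than the real Iceberg classes. The PositionDelete class below, the 
[file path, row position, data columns...] record layout, and the three-argument 
getPositionDelete signature are all assumptions made for illustration, not the 
existing IcebergAcidUtil API. The template is still filled in, on the assumption 
that the caller needs the row for the partition key, as the removed line 
partition(positionDelete.row(), specId) did:

```java
import java.util.Arrays;
import java.util.List;

final class PositionDeleteSketch {

  // Hypothetical stand-in for Iceberg's PositionDelete<Record>; not the real class.
  static final class PositionDelete {
    String path;
    long pos;
    Object[] row;

    PositionDelete set(String newPath, long newPos, Object[] newRow) {
      this.path = newPath;
      this.pos = newPos;
      this.row = newRow;
      return this;
    }
  }

  // Assumed record layout for this sketch: [file path, row position, data columns...].
  private static final int META_COLS = 2;

  // The method that assembles the record also decides whether the row is attached.
  static PositionDelete getPositionDelete(List<Object> rec, Object[] rowDataTemplate,
      boolean skipRowData) {
    String path = (String) rec.get(0);
    long pos = (Long) rec.get(1);
    // Fill the template either way so the caller can still derive the partition key from it.
    for (int i = META_COLS; i < rec.size(); i++) {
      rowDataTemplate[i - META_COLS] = rec.get(i);
    }
    // Attach the row only when it should be written to the delete file.
    return new PositionDelete().set(path, pos, skipRowData ? null : rowDataTemplate);
  }

  public static void main(String[] args) {
    Object[] template = new Object[2];
    List<Object> rec = Arrays.<Object>asList("data/file-a.parquet", 7L, "alice", 42);
    PositionDelete kept = getPositionDelete(rec, template, false);
    PositionDelete skipped = getPositionDelete(rec, template, true);
    System.out.println("row kept: " + (kept.row != null));
    System.out.println("row skipped: " + (skipped.row == null));
  }
}
```

   With this shape the writer would not need to build the record and then 
reset it with a null row.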





Issue Time Tracking
-------------------

    Worklog Id:     (was: 826017)
    Time Spent: 2h  (was: 1h 50m)

> Iceberg: Add an option to allow positional delete files without actual row 
> data
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-26734
>                 URL: https://issues.apache.org/jira/browse/HIVE-26734
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add an option that makes the actual row data in Iceberg positional delete 
> files optional, to avoid reading and writing large amounts of row data 
> during query execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
