Re: [PR] HIVE-28258: Use Iceberg semantics for Merge task [hive]

via GitHub Tue, 28 May 2024 04:07:58 -0700


SourabhBadhya commented on code in PR #5251:
URL: https://github.com/apache/hive/pull/5251#discussion_r1617033067



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -312,33 +314,43 @@ private static final class IcebergRecordReader<T> extends RecordReader<Void, T>
     private CloseableIterator<T> currentIterator;
     private Table table;
     private boolean fetchVirtualColumns;
+    private boolean isMerge = false;
+    private IcebergMergeSplit mergeSplit;

Review Comment:
   Created a new record reader implementation.
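
For illustration only, here is a minimal sketch of how the merge path could live in its own reader instead of behind an `isMerge` flag. It is not the code from this PR. Every type below (`Split`, `ScanSplit`, `MergeSplit`, `Reader`, `ScanReader`, `MergeReader`, `createReader`) is a hypothetical stand-in for the Hadoop and Iceberg classes in the diff, and the real split and reader contracts are assumed rather than reproduced.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SplitDispatchSketch {

  // Hypothetical stand-in for Hadoop's InputSplit.
  public interface Split { }

  // Hypothetical stand-in for IcebergSplit: carries a list of file names to scan.
  public static final class ScanSplit implements Split {
    final List<String> files;
    public ScanSplit(List<String> files) { this.files = files; }
  }

  // Hypothetical stand-in for IcebergMergeSplit: carries a single file path.
  public static final class MergeSplit implements Split {
    final String path;
    public MergeSplit(String path) { this.path = path; }
  }

  // Hypothetical, heavily reduced reader contract.
  public interface Reader {
    String describe();
  }

  // Handles only the regular scan path, so it needs no merge-related state.
  public static final class ScanReader implements Reader {
    private final Iterator<String> tasks;
    public ScanReader(ScanSplit split) { this.tasks = split.files.iterator(); }
    @Override
    public String describe() {
      return "scan:" + (tasks.hasNext() ? tasks.next() : "<empty>");
    }
  }

  // Handles only the merge path.
  public static final class MergeReader implements Reader {
    private final MergeSplit split;
    public MergeReader(MergeSplit split) { this.split = split; }
    @Override
    public String describe() { return "merge:" + split.path; }
  }

  // The split type is inspected once, at creation time, to pick the reader.
  public static Reader createReader(Split split) {
    if (split instanceof MergeSplit) {
      return new MergeReader((MergeSplit) split);
    }
    return new ScanReader((ScanSplit) split);
  }

  public static void main(String[] args) {
    System.out.println(createReader(new ScanSplit(Arrays.asList("a.parquet"))).describe());
    System.out.println(createReader(new MergeSplit("b.orc")).describe());
  }
}
```

With this shape the type check happens in one place, and neither reader carries fields that only the other path uses.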



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -312,33 +314,43 @@ private static final class IcebergRecordReader<T> extends RecordReader<Void, T>
     private CloseableIterator<T> currentIterator;
     private Table table;
     private boolean fetchVirtualColumns;
+    private boolean isMerge = false;
+    private IcebergMergeSplit mergeSplit;
 
     @Override
     public void initialize(InputSplit split, TaskAttemptContext newContext) {
       // For now IcebergInputFormat does its own split planning and does not accept FileSplit instances
-      CombinedScanTask task = ((IcebergSplit) split).task();
       this.context = newContext;
       this.conf = newContext.getConfiguration();
-      this.table = SerializationUtil.deserializeFromBase64(
-                conf.get(InputFormatConfig.SERIALIZED_TABLE_PREFIX + conf.get(InputFormatConfig.TABLE_IDENTIFIER)));
+      this.table = HiveIcebergStorageHandler.table(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER));
       HiveIcebergStorageHandler.checkAndSetIoConfig(conf, table);
-      this.tasks = task.files().iterator();
       this.nameMapping = table.properties().get(TableProperties.DEFAULT_NAME_MAPPING);
       this.caseSensitive = conf.getBoolean(InputFormatConfig.CASE_SENSITIVE, InputFormatConfig.CASE_SENSITIVE_DEFAULT);
       this.expectedSchema = readSchema(conf, table, caseSensitive);
       this.reuseContainers = conf.getBoolean(InputFormatConfig.REUSE_CONTAINERS, false);
       this.inMemoryDataModel = conf.getEnum(InputFormatConfig.IN_MEMORY_DATA_MODEL,
               InputFormatConfig.InMemoryDataModel.GENERIC);
       this.fetchVirtualColumns = InputFormatConfig.fetchVirtualColumns(conf);
+      if (split instanceof IcebergMergeSplit) {
+        this.isMerge = true;
+        this.mergeSplit = (IcebergMergeSplit) split;
+      } else {
+        CombinedScanTask task = ((IcebergSplit) split).task();
+        this.tasks = task.files().iterator();
+      }

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
