[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

ASF GitHub Bot (Jira) Thu, 03 Jun 2021 08:53:12 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606019
 ]


ASF GitHub Bot logged work on HIVE-25195:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jun/21 15:52
            Start Date: 03/Jun/21 15:52
    Worklog Time Spent: 10m 
      Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644915386



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##########
@@ -384,24 +382,28 @@ private void collectCommitInformation(TezWork work) 
throws IOException, TezExcep
               // get all target tables this vertex wrote to
               List<String> tables = new ArrayList<>();
               for (Map.Entry<String, String> entry : jobConf) {
-                if (entry.getKey().startsWith("iceberg.mr.serialized.table.")) 
{
-                  
tables.add(entry.getKey().substring("iceberg.mr.serialized.table.".length()));
+                if 
(entry.getKey().startsWith(ICEBERG_SERIALIZED_TABLE_PREFIX)) {
+                  
tables.add(entry.getKey().substring(ICEBERG_SERIALIZED_TABLE_PREFIX.length()));
                 }
               }
-              // save information for each target table (jobID, task num, 
query state)
+              // find iceberg props in jobConf as they can be needed, but not 
available, during job commit
+              Map<String, String> icebergProperties = new HashMap<>();
+              jobConf.forEach(e -> {
+                // don't copy the serialized tables, they're not needed 
anymore and take up lots of space
+                if (e.getKey().startsWith("iceberg.mr.") && 
!e.getKey().startsWith(ICEBERG_SERIALIZED_TABLE_PREFIX)) {
+                  icebergProperties.put(e.getKey(), e.getValue());
+                }
+              });
+              // save information for each target table (jobID, task num)
               for (String table : tables) {
-                sessionConf.set(HIVE_TEZ_COMMIT_JOB_ID_PREFIX + table, 
jobIdStr);
-                sessionConf.setInt(HIVE_TEZ_COMMIT_TASK_COUNT_PREFIX + table,
-                    status.getProgress().getSucceededTaskCount());
+                SessionStateUtil.newCommitInfo(jobConf, table)

Review comment:
       This is a little bit odd to me.
   I mean I understand what did you do, but it still feels strange. Convince me 
😄 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 606019)
    Time Spent: 1.5h  (was: 1h 20m)

> Store Iceberg write commit and ctas information in QueryState 
> --------------------------------------------------------------
>
>                 Key: HIVE-25195
>                 URL: https://issues.apache.org/jira/browse/HIVE-25195
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

Reply via email to