[ https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=478631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478631 ]

ASF GitHub Bot logged work on HIVE-23408:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Sep/20 15:00
            Start Date: 03/Sep/20 15:00
    Worklog Time Spent: 10m 
      Work Description: ashutoshc commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483044949



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
##########
@@ -265,11 +279,70 @@ public URI apply(Path path) {
         }
         dag.addURIsForCredentials(uris);
       }
+      getKafkaCredentials((MapWork)work, dag, conf);
     }
-
     getCredentialsForFileSinks(work, dag);
   }
 
+  private void getKafkaCredentials(MapWork work, DAG dag, JobConf conf) {
+    Token<?> tokenCheck = dag.getCredentials().getToken(KAFKA_DELEGATION_TOKEN_KEY);
+    if (tokenCheck != null) {
+      LOG.debug("Kafka credentials already added, skipping...");
+      return;
+    }
+    LOG.info("Getting kafka credentials for mapwork: " + work.getName());
+
+    String kafkaBrokers = null;
+    Map<String, PartitionDesc> partitions = work.getAliasToPartnInfo();

Review comment:
       This iterates over every partition object in the plan even when Kafka is 
not used, which gets expensive when there is a large number of partition objects. 
Is it possible to do a quick check to see whether Kafka is used before iterating 
over the full list of partitions?

##########
File path: pom.xml
##########
@@ -169,6 +169,7 @@
     <junit.version>4.13</junit.version>
     <junit.jupiter.version>5.6.2</junit.jupiter.version>
     <junit.vintage.version>5.6.2</junit.vintage.version>
+    <kafka.version>2.5.0</kafka.version>

Review comment:
       The Kafka version is also declared explicitly in kafka-handler/pom.xml. Can 
we parameterize it there, so that all of Hive references just one Kafka version?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 478631)
    Time Spent: 0.5h  (was: 20m)

> Hive on Tez :  Kafka storage handler broken in secure environment
> -----------------------------------------------------------------
>
>                 Key: HIVE-23408
>                 URL: https://issues.apache.org/jira/browse/HIVE-23408
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0
>            Reporter: Rajkumar Singh
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form 
> hive/_HOST@REALM. A Tez task can start on a random NodeManager (NM) host and 
> expands _HOST to the FQDN of the host where it is running, which leads to an 
> authentication failure. LLAP has a fallback to the LLAP daemon 
> keytab/principal; for Hive on Tez we should instead take advantage of the 
> delegation tokens that Kafka supports from version 1.1 onwards.
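The failure mode in the description comes from `_HOST` being resolved locally on whichever machine evaluates the principal. The sketch below roughly mimics what Hadoop's `SecurityUtil.getServerPrincipal` does with the placeholder (substitute the lower-cased FQDN); `expandPrincipal` is a hypothetical helper, not Hadoop's API. It shows why a task on an arbitrary NodeManager derives a principal for which that host holds no keytab, whereas a delegation token fetched once by HiveServer2 and shipped in the DAG credentials sidesteps the expansion entirely.

```java
import java.util.Locale;

/**
 * Illustrates why a "_HOST" principal breaks for Tez tasks. Hypothetical
 * helper that approximates, but is not, Hadoop's SecurityUtil logic.
 */
public class HostPrincipalSketch {

  static final String HOST_PLACEHOLDER = "_HOST";

  /** Replaces the "_HOST" instance of primary/instance@REALM with the given FQDN. */
  static String expandPrincipal(String principalPattern, String fqdn) {
    String[] parts = principalPattern.split("[/@]");
    if (parts.length != 3 || !HOST_PLACEHOLDER.equals(parts[1])) {
      return principalPattern; // no placeholder: the principal is used verbatim
    }
    return parts[0] + "/" + fqdn.toLowerCase(Locale.ROOT) + "@" + parts[2];
  }

  public static void main(String[] args) {
    String pattern = "hive/_HOST@EXAMPLE.COM";
    // On the HiveServer2 host, this matches the keytab it was configured with...
    System.out.println(expandPrincipal(pattern, "hs2.example.com"));
    // ...but a Tez task on some NodeManager derives a different principal.
    System.out.println(expandPrincipal(pattern, "NM-17.example.com"));
  }
}
```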



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
