[ 
https://issues.apache.org/jira/browse/HIVE-29477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-29477:
--------------------------------
    Description: 
Given the exception described in TEZ-4686:
[^hs2_stacktrace.txt]
{code:java}
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.tez.client.registry.AMRecord.getApplicationId()" because "this.amRecord" is null
        at org.apache.tez.client.registry.zookeeper.ZkFrameworkClient.createApplication(ZkFrameworkClient.java:114)
        at org.apache.tez.client.TezClient.createApplication(TezClient.java:1103)
        at org.apache.tez.client.TezClient.start(TezClient.java:399)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:488)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternalUnsafe(TezSessionState.java:406)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:297)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(TezSessionPoolSession.java:122)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:250)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:481)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:232)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
{code}
This is related to the assumption made in TEZ-4007: 
[https://github.com/apache/tez/blob/17546aa680e6f9a52411fe6a66c7a26de76e53a6/tez-api/src/main/java/org/apache/tez/client/registry/zookeeper/ZkFrameworkClient.java#L91]

So the point of this issue is: {*}how to acquire an application id{*}, and this 
is closely related to the standalone zookeeper mode in Tez.

What actually happens in the Tez-on-YARN world is clearly shown in the above 
exception (just replace ZkFrameworkClient with TezYarnClient):
{code}
TezSessionState.openInternal -> TezClient.start -> FrameworkClient.createApplication
{code}

This holds for YARN, where createApplication actually goes to the 
ResourceManager, which then starts an application and returns an application id.
In the case of the ZooKeeper-based Tez AM registry, an application id (an 
artificial one that looks like a YARN application id for backward 
compatibility) should instead be discovered from a registry client and then 
passed to the TezClient, so that the actual framework client (a 
ZkFrameworkClient) is aware of it and able to get its status from ZooKeeper.
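
The difference between the two ways of acquiring an application id can be 
sketched roughly as follows. This is only an illustration under stated 
assumptions: AppIdSource, CreatingSource and DiscoveringSource are hypothetical 
names, the registry is modelled as a plain in-memory map, and none of this is 
the actual Tez or Hive API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: every type and method name here is hypothetical
// and does not correspond to the real Tez or Hive APIs.
final class AppIdAcquisitionSketch {

    /** Stand-in for a framework client that can hand out an application id. */
    interface AppIdSource {
        String acquireApplicationId();
    }

    /** YARN-like path: asking for an application produces a fresh id. */
    static final class CreatingSource implements AppIdSource {
        private final long clusterTimestamp;
        private int counter;

        CreatingSource(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        @Override
        public String acquireApplicationId() {
            counter++;
            return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, counter);
        }
    }

    /** Registry-like path: the id must already be registered by a running AM. */
    static final class DiscoveringSource implements AppIdSource {
        private final Map<String, String> registry; // hypothetical: AM name -> application id
        private final String amName;

        DiscoveringSource(Map<String, String> registry, String amName) {
            this.registry = registry;
            this.amName = amName;
        }

        @Override
        public String acquireApplicationId() {
            String id = registry.get(amName);
            if (id == null) {
                // Fail with a clear message instead of dereferencing a missing record.
                throw new IllegalStateException("No AM record discovered for " + amName);
            }
            return id;
        }
    }

    public static void main(String[] args) {
        AppIdSource yarnLike = new CreatingSource(1700000000000L);
        System.out.println(yarnLike.acquireApplicationId());

        Map<String, String> registry = new ConcurrentHashMap<>();
        registry.put("am-0", "application_1700000000000_0042");
        AppIdSource zkLike = new DiscoveringSource(registry, "am-0");
        System.out.println(zkLike.acquireApplicationId());
    }
}
```

In this sketch the discovering variant can only succeed after something has 
populated the registry, which mirrors why calling createApplication before any 
AM record has been discovered ends in the NullPointerException above.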

  was:
Given the exception described in TEZ-4686:
[^hs2_stacktrace.txt]
{code:java}
Caused by: java.lang.NullPointerException: Cannot invoke 
"org.apache.tez.client.registry.AMRecord.getApplicationId()" because 
"this.amRecord" is null
        at 
org.apache.tez.client.registry.zookeeper.ZkFrameworkClient.createApplication(ZkFrameworkClient.java:114)
        at 
org.apache.tez.client.TezClient.createApplication(TezClient.java:1103)
        at org.apache.tez.client.TezClient.start(TezClient.java:399)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:488)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternalUnsafe(TezSessionState.java:406)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:297)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(TezSessionPoolSession.java:122)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:250)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:481)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:232)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
        at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
{code}
this is related to the assumption made in TEZ-4007: 
[https://github.com/apache/tez/blob/17546aa680e6f9a52411fe6a66c7a26de76e53a6/tez-api/src/main/java/org/apache/tez/client/registry/zookeeper/ZkFrameworkClient.java#L91]

So the point of this issue is: {*}how to acquire an application id{*}, and this 
is closely related to the standalone zookeeper mode in Tez.

What actually happens in Tez Yarn world is clearly show in the above exception 
(just replace ZkFrameworkClient with TezYarnClient):\{code}
TezSessionState.openInternal -> TezCient.start -> 
FrameworkClient.createApplication
{code}

This is true in case of Yarn, where createApplication actually goes to the 
ResourceManager, which then starts an application and returns an application id.
In case of Zookeeper-based Tez AM registry, an Application id (which is an 
artificial one, looks like an yarn application id for backward compatibility) 
should rather be discovered from a registry client, and than it's passed to the 
TezClient to make the actual Framework client aware (which is a 
ZkFrameworkClient) and able get it's status from zookeeper.


> Introduce codepath for Tez external sessions discovered by Zookeeper
> --------------------------------------------------------------------
>
>                 Key: HIVE-29477
>                 URL: https://issues.apache.org/jira/browse/HIVE-29477
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> Given the exception described in TEZ-4686:
> [^hs2_stacktrace.txt]
> {code:java}
> Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.tez.client.registry.AMRecord.getApplicationId()" because "this.amRecord" is null
>       at org.apache.tez.client.registry.zookeeper.ZkFrameworkClient.createApplication(ZkFrameworkClient.java:114)
>       at org.apache.tez.client.TezClient.createApplication(TezClient.java:1103)
>       at org.apache.tez.client.TezClient.start(TezClient.java:399)
>       at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:488)
>       at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternalUnsafe(TezSessionState.java:406)
>       at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:297)
>       at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(TezSessionPoolSession.java:122)
>       at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:250)
>       at org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:481)
>       at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:232)
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
>       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> {code}
> This is related to the assumption made in TEZ-4007: 
> [https://github.com/apache/tez/blob/17546aa680e6f9a52411fe6a66c7a26de76e53a6/tez-api/src/main/java/org/apache/tez/client/registry/zookeeper/ZkFrameworkClient.java#L91]
> So the point of this issue is: {*}how to acquire an application id{*}, and 
> this is closely related to the standalone zookeeper mode in Tez.
> What actually happens in the Tez-on-YARN world is clearly shown in the above 
> exception (just replace ZkFrameworkClient with TezYarnClient):
> {code}
> TezSessionState.openInternal -> TezClient.start -> FrameworkClient.createApplication
> {code}
> This holds for YARN, where createApplication actually goes to the 
> ResourceManager, which then starts an application and returns an application 
> id.
> In the case of the ZooKeeper-based Tez AM registry, an application id (an 
> artificial one that looks like a YARN application id for backward 
> compatibility) should instead be discovered from a registry client and then 
> passed to the TezClient, so that the actual framework client (a 
> ZkFrameworkClient) is aware of it and able to get its status from ZooKeeper.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
