-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51312/#review147362
-----------------------------------------------------------



Possible to add a test (or a few)? (I don' see one in the patch attached to the 
jira).

Question: How long does it take for a persistent node to go away - in case of a 
JVM crash / node crash. Is it possible that a new process starts up within the 
duration it takes for the old node-slot entry to be removed? (It gets added to 
the end as a result)

In terms of a permanent cluster size reduction - I think we should at least 
take some steps to handle this scenario, or an equivalent scenario where it 
takes a long time (minutes) for a node to come back. We have a force locality 
scheduling mode, this will effectively cause new queries to start in a state 
where they cannot complete. Check for this and fail the query?, or fallback to 
a next node for the split. The fallback to the next node is something that is 
going to come in soon anyway, to control locality if nodes are not available 
(current is always random as against 'random' being a configurable option :) )


llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 (line 104)
<https://reviews.apache.org/r/51312/#comment214497>

    Move this into a separate path altogether, instead of listing and filtering 
at the same path each time?



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 (line 110)
<https://reviews.apache.org/r/51312/#comment214498>

    woekersPath is no long defined, so the comment can be deleted.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 (line 533)
<https://reviews.apache.org/r/51312/#comment214487>

    There's a lot of similar code to read records within a loop, and a method 
to read records. Move into a single method? Can be done in a separate jiras as 
well.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 (line 547)
<https://reviews.apache.org/r/51312/#comment214501>

    The size of this collection is used to determine the #knownWorkers in 
HostAffinitySplitLocationProvider. The size at the moment represents active 
nodes.
    
    I don't think the intent is to return a fewer number of nodes than what 
existed if a node goes down? 
    1) Return n entries, and one entry would indicate it's not active (and this 
will need to be acted upon by all clients
    2) Add a getNumInstances itnerface, which can differ from actual nodes 
available, along with a getNodeAtLocation interface?



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 (line 554)
<https://reviews.apache.org/r/51312/#comment214496>

    Here, as well as other places, a node is only available when it's slot has 
been registered. Otherwise it should not be visible to clients.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 (line 577)
<https://reviews.apache.org/r/51312/#comment214495>

    Don't send back a node until it's slot registration has completed. We don't 
know what position it will take. This logic puts such nodes at the end.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java 
(line 51)
<https://reviews.apache.org/r/51312/#comment214489>

    Are there tests in curator for the said node, which we could borrow parts 
from?



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java 
(line 128)
<https://reviews.apache.org/r/51312/#comment214472>

    Nit: else {
      slots.add( ..)
    }
    
    Makes it a little more readable, and maintanable for future changes.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java 
(line 138)
<https://reviews.apache.org/r/51312/#comment214477>

    Potential divide by 0? (new cluster)
    
    What's the purpose of the 2.0f/approxWorkerCount ?



ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java
 (line 94)
<https://reviews.apache.org/r/51312/#comment214490>

    This isn't part of the patch uploaded to jira?


- Siddharth Seth


On Aug. 23, 2016, 1:41 a.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51312/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2016, 1:41 a.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstanceSet.java
>  99ead9b 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java
>  e9456f2 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
>  64d2617 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java 
> PRE-CREATION 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenClient.java
>  921e050 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java
>  17ce69b 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  efd774d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HostAffinitySplitLocationProvider.java
>  c06499e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/Utils.java 8a4fc08 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java
>  d98a5ff 
> 
> Diff: https://reviews.apache.org/r/51312/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>

Reply via email to