[ https://issues.apache.org/jira/browse/IMPALA-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-8687:
----------------------------------
    Description: 
Following on from IMPALA-8659, we may have some cases where impalads do 
self-RPCs via the Thrift internal service (IMPALA-7984). This JIRA is to 
investigate whether this is a problem and, if so, to fix it (either by 
intercepting self-RPCs in Thrift or by making code changes to avoid them).

Basic join where global runtime filters should apply:
{code}
select straight_join count(*)
from alltypes t1 join /*+ shuffle */ alltypes t2 on t1.id = t2.id
where t2.string_col = '1';
{code}

Interesting cases:
* Dedicated coordinator with distributed plan ==> expect that all joins and 
scans run on executors and all filter aggregation happens on the coordinator.
* Single-node plan (num_nodes=1) ==> expect that all filters are local, so no 
RPCs are required.
* Combined coordinator/executor with distributed plan ==> may do self-RPCs.

So I think we're OK in the dedicated coordinator case. Note, though, that 
IMPALA-3825 may violate the above assumptions.

I can reproduce the issue fairly easily on a combined coordinator/executor 
with log verbosity level 2. This is a log excerpt from the impalad at 
tarmstrong-box:22000:
{noformat}
I0619 17:28:00.913919 25525 client-cache.cc:47] GetClient(tarmstrong-box:22000)
I0619 17:28:00.913924 25525 client-cache.cc:57] GetClient(): returning cached 
client for tarmstrong-box:22000
I0619 17:28:00.914047 25425 rpc-trace.cc:202] RPC call: 
ImpalaInternalService.PublishFilter(from ::ffff:127.0.0.1:41902)
I0619 17:28:00.914587 25425 query-exec-mgr.cc:98] QueryState: 
query_id=624be7fc0bc0e122:0fbdc17200000000 refcnt=6
I0619 17:28:00.914597 25425 fragment-instance-state.cc:511] PublishFilter(): 
instance_id=624be7fc0bc0e122:0fbdc17200000002 filter_id=0
I0619 17:28:00.915010 25425 query-exec-mgr.cc:162] ReleaseQueryState(): 
query_id=624be7fc0bc0e122:0fbdc17200000000 refcnt=6
I0619 17:28:00.915038 25425 rpc-trace.cc:212] RPC call: 
backend:ImpalaInternalService.PublishFilter from ::ffff:127.0.0.1:41902 took 
1.000ms
I0619 17:28:00.915043 25525 client-cache.cc:152] Releasing client for 
tarmstrong-box:22000 back to cache
I0619 17:28:00.915175 25525 rpc-trace.cc:212] RPC call: 
backend:ImpalaInternalService.UpdateFilter from ::ffff:127.0.0.1:41930 took 
5.000ms
I0619 17:28:00.922312 25437 scan-node.cc:192] 
624be7fc0bc0e122:0fbdc17200000002] Filters arrived. Waited 351ms
{noformat}


  was:
Following on from IMPALA-8659, we may have some cases where impalads do 
self-RPCs via the thrift internal service IMPALA-7984. This JIRA is to 
investigate if this is a problem, and to fix it (either by intercepting 
self-RPCs in Thrift or by making code changes to avoid it).

Basic join where global runtime filters should apply:
{code}
select straight_join count(*)
from alltypes t1 join /*+ shuffle */ alltypes t2 on t1.id = t2.id
where t2.string_col = '1';
{code}

Interesting cases
* Dedicated coordinator with distributed plan ==> expect that all joins run on 
executors and all filter aggregation happens on coordinator
* Single node plan (num_nodes=1) ==> expect that all filters are local ==> no 
RPCs required
* Combined coordinator/executor with distributed plan ==> may do self-RPC


> --rpc_use_loopback may not work for runtime filter RPCs
> -------------------------------------------------------
>
>                 Key: IMPALA-8687
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8687
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
