[
https://issues.apache.org/jira/browse/IMPALA-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080076#comment-17080076
]
Sahil Takiar commented on IMPALA-5746:
--------------------------------------
[~twmarshall] and I discussed this a bit on the review for the test-case
[https://gerrit.cloudera.org/#/c/15666/] but moving the conversation here.
So, originally I thought IMPALA-2990 fixed this, but unfortunately it looks like
the situation is more complicated. There is at least one situation where
killing a coordinator does not cause executors to kill their orphaned fragments:
the fragments only get killed after the status report RPC has been failing for 10 minutes.
I ran the following query:
{code:sql}
select * from tpch.lineitem t1, tpch.lineitem t2, tpch.lineitem t3 where
t1.l_orderkey = t2.l_orderkey and t1.l_orderkey = t3.l_orderkey and
t3.l_orderkey = t2.l_orderkey order by t1.l_orderkey, t2.l_orderkey,
t3.l_orderkey limit 100;
{code}
I ran it on a cluster started via {{./bin/start-impala-cluster.py}} (oddly, it looks like
if I use a slightly different cluster topology, things are a bit different, so
perhaps there is a race condition somewhere).
I waited for the query to run for a bit (the progress bar said it was about 50%
complete), killed the coordinator, waited for a bit, and then looked at the
/memz page for one of the executors, which showed this:
{code}
Process: Limit=7.28 GB Total=1.27 GB Peak=1.44 GB
Buffer Pool: Free Buffers: Total=0
Buffer Pool: Clean Pages: Total=0
Buffer Pool: Unused Reservation: Total=-18.30 MB
Control Service Queue: Limit=50.00 MB Total=0 Peak=15.24 KB
Data Stream Service Queue: Limit=372.92 MB Total=0 Peak=2.01 MB
Data Stream Manager Early RPCs: Total=0 Peak=0
TCMalloc Overhead: Total=30.94 MB
RequestPool=default-pool: Total=1.12 GB Peak=1.18 GB
Query(3e42b7e4a9f9b58b:72759e5d00000000): Reservation=1.10 GB
ReservationLimit=5.83 GB OtherMemory=17.17 MB Total=1.12 GB Peak=1.18 GB
Runtime Filter Bank: Reservation=10.00 MB ReservationLimit=10.00 MB
OtherMemory=0 Total=10.00 MB Peak=10.00 MB
Fragment 3e42b7e4a9f9b58b:72759e5d00000008: Reservation=0 OtherMemory=0
Total=0 Peak=65.57 MB
HDFS_SCAN_NODE (id=2): Reservation=0 OtherMemory=0 Total=0 Peak=65.42 MB
KrpcDataStreamSender (dst_id=8): Total=0 Peak=150.41 KB
Fragment 3e42b7e4a9f9b58b:72759e5d00000005: Reservation=0 OtherMemory=0
Total=0 Peak=65.57 MB
HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=65.42 MB
KrpcDataStreamSender (dst_id=7): Total=0 Peak=150.41 KB
Fragment 3e42b7e4a9f9b58b:72759e5d00000002: Reservation=0 OtherMemory=0
Total=0 Peak=65.91 MB
HDFS_SCAN_NODE (id=0): Reservation=0 OtherMemory=0 Total=0 Peak=65.91 MB
KrpcDataStreamSender (dst_id=6): Total=0 Peak=150.41 KB
Fragment 3e42b7e4a9f9b58b:72759e5d0000000b: Reservation=1.09 GB
OtherMemory=17.06 MB Total=1.11 GB Peak=1.11 GB
SORT_NODE (id=5): Total=148.00 KB Peak=148.00 KB
HASH_JOIN_NODE (id=4): Reservation=558.00 MB OtherMemory=42.25 KB
Total=558.04 MB Peak=558.06 MB
Exprs: Total=13.12 KB Peak=13.12 KB
Hash Join Builder (join_node_id=4): Total=13.12 KB Peak=21.12 KB
Hash Join Builder (join_node_id=4) Exprs: Total=13.12 KB Peak=13.12
KB
HASH_JOIN_NODE (id=3): Reservation=558.00 MB OtherMemory=34.25 KB
Total=558.03 MB Peak=558.05 MB
Exprs: Total=13.12 KB Peak=13.12 KB
Hash Join Builder (join_node_id=3): Total=13.12 KB Peak=21.12 KB
Hash Join Builder (join_node_id=3) Exprs: Total=13.12 KB Peak=13.12
KB
EXCHANGE_NODE (id=6): Reservation=16.84 MB OtherMemory=0 Total=16.84 MB
Peak=16.85 MB
KrpcDeferredRpcs: Total=0 Peak=37.36 KB
EXCHANGE_NODE (id=7): Reservation=0 OtherMemory=0 Total=0 Peak=2.54 MB
KrpcDeferredRpcs: Total=0 Peak=0
EXCHANGE_NODE (id=8): Reservation=0 OtherMemory=0 Total=0 Peak=16.69 MB
KrpcDeferredRpcs: Total=0 Peak=37.56 KB
KrpcDataStreamSender (dst_id=9): Total=272.00 B Peak=272.00 B
CodeGen: Total=12.64 KB Peak=696.50 KB
CodeGen: Total=12.64 KB Peak=696.50 KB
CodeGen: Total=12.64 KB Peak=696.50 KB
CodeGen: Total=75.92 KB Peak=5.00 MB
Untracked Memory: Total=147.84 MB
{code}
The logs of the Impala executor show:
{code}
I0409 14:44:01.852174 28903 kudu-status-util.h:55]
3e42b7e4a9f9b58b:72759e5d00000000] ReportExecStatus() RPC failed: Network
error: Client connection negotiation failed: client connection to
127.0.0.1:27000: connect: Connection refused (error 111)
W0409 14:44:01.852253 28903 query-state.cc:498]
3e42b7e4a9f9b58b:72759e5d00000000] Failed to send ReportExecStatus() RPC for
query 3e42b7e4a9f9b58b:72759e5d00000000. Consecutive failed reports = 9. Time
spent retrying = 220034ms.
I0409 14:44:04.862691 8833 krpc-data-stream-mgr.cc:422] Reduced stream ID
cache from 3 items, to 2, eviction took: 0
I0409 14:44:51.856971 8752 connection.cc:445] Transfer of RPC call RPC call
impala.ControlService.ReportExecStatus -> {remote=127.0.0.1:27000
(stakiar-desktop), user_credentials={real_user=impala}, network_plane=control}
aborted: Runtime error: RPC transfer destroyed before it finished sending
{code}
This repeats in a loop until the 10-minute timeout is hit, and then the fragment cancels itself:
{code}
E0409 14:51:36.889616 28903 query-state.cc:523]
3e42b7e4a9f9b58b:72759e5d00000000] Cancelling fragment instances due to failure
to reach the coordinator. (ReportExecStatus() RPC failed: Network error: Client
connection negotiation failed: client connection to 127.0.0.1:27000: connect:
Connection refused (error 111)
).
I0409 14:51:36.889686 28903 query-state.cc:751]
3e42b7e4a9f9b58b:72759e5d00000000] Cancel:
query_id=3e42b7e4a9f9b58b:72759e5d00000000
I0409 14:51:36.889746 28903 krpc-data-stream-mgr.cc:337]
3e42b7e4a9f9b58b:72759e5d00000000] cancelling active streams for
fragment_instance_id=3e42b7e4a9f9b58b:72759e5d0000000b
{code}
The /memz page does not change until the 10-minute timeout is hit. After that,
the /memz page shows no running queries.
So I guess the situation is a lot better than before (10 minutes vs. 30+
minutes, and the timeout is configurable), but 10 minutes is probably still too
long.
I'm not positive I understand exactly what is going on here. It seems that some
fragments do release their resources, and others don't. Maybe this just has to
do with timing, but {{impala-server.num-fragments-in-flight}} is 1, which means
there is only one fragment still running, and the value doesn't drop to 0 until
the 10-minute timeout is hit.
> Remote fragments continue to hold onto memory after stopping the coordinator
> daemon
> -----------------------------------------------------------------------------------
>
> Key: IMPALA-5746
> URL: https://issues.apache.org/jira/browse/IMPALA-5746
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 2.10.0
> Reporter: Mostafa Mokhtar
> Assignee: Sahil Takiar
> Priority: Critical
> Attachments: remote_fragments_holding_memory.txt
>
>
> Repro
> # Start running queries
> # Kill the coordinator node
> # On the running Impalad, check the memz tab: remote fragments continue to run
> and hold on to resources
> Remote fragments held on to memory for 30+ minutes after stopping the coordinator
> service.
> Attached is a thread dump from an Impalad running remote fragments.
> Snapshot of memz tab 30 minutes after killing the coordinator
> {code}
> Process: Limit=201.73 GB Total=5.32 GB Peak=179.36 GB
> Free Disk IO Buffers: Total=1.87 GB Peak=1.87 GB
> RequestPool=root.default: Total=1.35 GB Peak=178.51 GB
> Query(f64169d4bb3c901c:3a21d8ae00000000): Total=2.64 MB Peak=104.73 MB
> Fragment f64169d4bb3c901c:3a21d8ae00000051: Total=2.64 MB Peak=2.67 MB
> AGGREGATION_NODE (id=15): Total=2.54 MB Peak=2.57 MB
> Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=12.29 KB
> DataStreamSender (dst_id=17): Total=85.31 KB Peak=85.31 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
> Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.54 MB
> Query(2a4f12b3b4b1dc8c:db7e8cf200000000): Total=258.29 MB Peak=412.98 MB
> Fragment 2a4f12b3b4b1dc8c:db7e8cf20000008c: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
> Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
> Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.66 MB
> Query(68421d2a5dea0775:83f5d97200000000): Total=282.77 MB Peak=443.53 MB
> Fragment 68421d2a5dea0775:83f5d9720000004a: Total=26.77 MB Peak=26.92 MB
> SORT_NODE (id=8): Total=8.00 KB Peak=8.00 KB
> Exprs: Total=4.00 KB Peak=4.00 KB
> ANALYTIC_EVAL_NODE (id=7): Total=4.00 KB Peak=4.00 KB
> Exprs: Total=4.00 KB Peak=4.00 KB
> SORT_NODE (id=6): Total=24.00 MB Peak=24.00 MB
> AGGREGATION_NODE (id=12): Total=2.72 MB Peak=2.83 MB
> Exprs: Total=85.12 KB Peak=85.12 KB
> EXCHANGE_NODE (id=11): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=84.80 KB
> DataStreamSender (dst_id=13): Total=1.27 KB Peak=1.27 KB
> CodeGen: Total=24.80 KB Peak=4.13 MB
> Block Manager: Limit=161.39 GB Total=280.50 MB Peak=286.52 MB
> Query(e94c89fa89a74d27:82812bf900000000): Total=258.29 MB Peak=436.85 MB
> Fragment e94c89fa89a74d27:82812bf90000008e: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
> Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
> Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.62 MB
> Query(4e43dad3bdc935d8:938b8b7e00000000): Total=2.65 MB Peak=105.60 MB
> Fragment 4e43dad3bdc935d8:938b8b7e00000052: Total=2.65 MB Peak=2.68 MB
> AGGREGATION_NODE (id=15): Total=2.55 MB Peak=2.57 MB
> Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=13.68 KB
> DataStreamSender (dst_id=17): Total=91.41 KB Peak=91.41 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
> Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.30 MB
> Query(b34bdd65f1ed017e:5a0291bd00000000): Total=2.37 MB Peak=106.56 MB
> Fragment b34bdd65f1ed017e:5a0291bd0000004b: Total=2.37 MB Peak=2.37 MB
> SORT_NODE (id=6): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=10): Total=2.35 MB Peak=2.35 MB
> Exprs: Total=34.12 KB Peak=34.12 KB
> EXCHANGE_NODE (id=9): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.23 KB
> DataStreamSender (dst_id=11): Total=3.45 KB Peak=3.45 KB
> CodeGen: Total=4.51 KB Peak=1.11 MB
> Block Manager: Limit=161.39 GB Total=256.00 KB Peak=912.81 KB
> Query(b74ba58d53b6c45f:3e8228600000000): Total=190.41 MB Peak=425.09 MB
> Fragment b74ba58d53b6c45f:3e822860000009f: Total=67.90 KB Peak=2.34 MB
> SORT_NODE (id=14): Total=4.00 KB Peak=4.00 KB
> HASH_JOIN_NODE (id=13): Total=42.25 KB Peak=42.25 KB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13): Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13) Exprs: Total=9.12 KB
> Peak=9.12 KB
> HDFS_SCAN_NODE (id=11): Total=0 Peak=0
> EXCHANGE_NODE (id=24): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=25): Total=1.05 KB Peak=1.05 KB
> CodeGen: Total=12.59 KB Peak=2.29 MB
> Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
> Fragment b74ba58d53b6c45f:3e8228600000085: Total=2.32 MB Peak=2.32 MB
> AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> EXCHANGE_NODE (id=20): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=23): Total=22.09 KB Peak=22.09 KB
> CodeGen: Total=2.37 KB Peak=546.00 KB
> Fragment b74ba58d53b6c45f:3e8228600000060: Total=188.02 MB Peak=188.34
> MB
> Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
> AGGREGATION_NODE (id=9): Total=1.67 MB Peak=1.67 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> HASH_JOIN_NODE (id=8): Total=1.13 MB Peak=1.15 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
> Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12
> KB
> HASH_JOIN_NODE (id=7): Total=169.14 MB Peak=169.14 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
> Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12
> KB
> EXCHANGE_NODE (id=17): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=587.50 KB
> EXCHANGE_NODE (id=18): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=316.11 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.70 KB
> DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
> CodeGen: Total=16.80 KB Peak=2.83 MB
> Query(cb4c14997ad6add2:c8f120100000000): Total=190.36 MB Peak=443.00 MB
> Fragment cb4c14997ad6add2:c8f1201000000a4: Total=67.90 KB Peak=2.34 MB
> SORT_NODE (id=14): Total=4.00 KB Peak=4.00 KB
> HASH_JOIN_NODE (id=13): Total=42.25 KB Peak=42.25 KB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13): Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13) Exprs: Total=9.12 KB
> Peak=9.12 KB
> HDFS_SCAN_NODE (id=11): Total=0 Peak=0
> EXCHANGE_NODE (id=24): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=25): Total=1.05 KB Peak=1.05 KB
> CodeGen: Total=12.59 KB Peak=2.29 MB
> Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
> Fragment cb4c14997ad6add2:c8f120100000088: Total=2.33 MB Peak=2.33 MB
> AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> EXCHANGE_NODE (id=20): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=23): Total=26.83 KB Peak=26.83 KB
> CodeGen: Total=2.37 KB Peak=546.00 KB
> Fragment cb4c14997ad6add2:c8f120100000063: Total=187.97 MB Peak=188.08
> MB
> Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
> AGGREGATION_NODE (id=9): Total=1.67 MB Peak=1.67 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> HASH_JOIN_NODE (id=8): Total=1.14 MB Peak=1.15 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
> Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12
> KB
> HASH_JOIN_NODE (id=7): Total=169.07 MB Peak=169.14 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
> Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12
> KB
> EXCHANGE_NODE (id=17): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=314.15 KB
> EXCHANGE_NODE (id=18): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=861.18 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.70 KB
> DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
> CodeGen: Total=16.80 KB Peak=2.83 MB
> Query(f04a57ce97102dd7:c2a1081700000000): Total=190.31 MB Peak=419.11 MB
> Fragment f04a57ce97102dd7:c2a1081700000085: Total=2.33 MB Peak=2.33 MB
> AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> EXCHANGE_NODE (id=20): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=23): Total=23.67 KB Peak=23.67 KB
> CodeGen: Total=2.37 KB Peak=546.00 KB
> Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
> Fragment f04a57ce97102dd7:c2a1081700000060: Total=187.99 MB Peak=188.07
> MB
> Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
> AGGREGATION_NODE (id=9): Total=1.68 MB Peak=1.68 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> HASH_JOIN_NODE (id=8): Total=1.14 MB Peak=1.15 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
> Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12
> KB
> HASH_JOIN_NODE (id=7): Total=169.09 MB Peak=169.14 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
> Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12
> KB
> EXCHANGE_NODE (id=17): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=156.71 KB
> EXCHANGE_NODE (id=18): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=1.32 MB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.70 KB
> DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
> CodeGen: Total=16.80 KB Peak=2.83 MB
> Untracked Memory: Total=2.10 GB
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)