pengdou1990 created IMPALA-11473:
------------------------------------
Summary: Data skew occurs when execute HDFS table insert operation
Key: IMPALA-11473
URL: https://issues.apache.org/jira/browse/IMPALA-11473
Project: IMPALA
Issue Type: Improvement
Components: Backend, Frontend
Affects Versions: Impala 3.2.0
Reporter: pengdou1990
Assignee: pengdou1990
When a large amount of data is inserted into a non-partitioned HDFS table, if
the fragment instances of SELECT STATEMENT differ greatly in output rows, some
fragment instances may very slow, while others may much faster. This causes the
entire insert operation to take a long time to run.
For example, when I execute the following Insert SQL in a 10 nodes IMPALA 3.2
cluster, the duration was almost 67 min.
{code:java}
create table user_behavior_export_2022070111 stored as parquet as SELECT * from
user_behavior_table where time between '2022-07-01 11:00:00.000' and
'2022-07-01 11:59:59.999'; {code}
Tips: user_behavior_table is an day partitioned table with 1000 columns.
{code:java}
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem
Est. Peak Mem Detail
-----------------------------------------------------------------------------------------------------------------------------
F00:HDFS WRITER 10 21m50s 1h6m 346.88 MB
1.00 GB
00:SCAN HDFS 10 3s401ms 3s217ms 137.49M 94.49M 12.61 GB
2.75 GB dafault.user_behavior_table {code}
The fastest HdfsTableSink executed 2 minutes, while the slowest HdfsTableSink
executed 66 minutes. Different HdfsTableSink execution times vary greatly due
to the amount of data inserted. For Each HDFS scanner, the amount of ReadBytes
and ReadRows are not significantly skewed, but the amount of data returned
makes a performance difference of tens of times.
{code:java}
HdfsTableSink:(Total: 1h6m, non-child: 1h6m, % non-child: 100.00%)
HdfsTableSink:(Total: 34m44s, non-child: 34m44s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m20s, non-child: 25m20s, % non-child: 100.00%)
HdfsTableSink:(Total: 20m36s, non-child: 20m36s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m46s, non-child: 25m46s, % non-child: 100.00%)
HdfsTableSink:(Total: 12m56s, non-child: 12m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m56s, non-child: 10m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m43s, non-child: 10m43s, % non-child: 100.00%)
HdfsTableSink:(Total: 4m7s, non-child: 4m7s, % non-child: 100.00%)
HdfsTableSink:(Total: 6m19s, non-child: 6m19s, % non-child: 100.00%)
- RowsInserted: 44.49M (44487661)
- RowsInserted: 21.42M (21417405)
- RowsInserted: 18.34M (18340886)
- RowsInserted: 14.42M (14422287)
- RowsInserted: 14.13M (14128973)
- RowsInserted: 8.96M (8955664)
- RowsInserted: 7.08M (7084182)
- RowsInserted: 6.97M (6969609)
- RowsInserted: 2.45M (2448096)
- RowsInserted: 2.84M (2837459)
- BytesRead: 2.32 GB (2485981168)
- BytesRead: 2.22 GB (2385372978)
- BytesRead: 2.66 GB (2858818300)
- BytesRead: 2.34 GB (2509528520)
- BytesRead: 2.71 GB (2906316201)
- BytesRead: 2.65 GB (2844324158)
- BytesRead: 2.36 GB (2531865904)
- BytesRead: 2.24 GB (2402854679)
- BytesRead: 2.78 GB (2979667784)
- BytesRead: 2.66 GB (2856271181)
- RowsRead: 101.56M (101561956)
- RowsRead: 82.91M (82914833)
- RowsRead: 97.30M (97297545)
- RowsRead: 71.74M (71736869)
- RowsRead: 100.67M (100669817)
- RowsRead: 101.85M (101851844)
- RowsRead: 83.43M (83432722)
- RowsRead: 90.33M (90331924)
- RowsRead: 93.34M (93341164)
- RowsRead: 92.51M (92509166)
- RowsReturned: 13.75M (13748752)
- RowsReturned: 44.49M (44487661)
- RowsReturned: 21.42M (21417405)
- RowsReturned: 18.34M (18340886)
- RowsReturned: 14.42M (14422287)
- RowsReturned: 14.13M (14128973)
- RowsReturned: 8.96M (8955664)
- RowsReturned: 7.08M (7084182)
- RowsReturned: 6.97M (6969609)
- RowsReturned: 2.45M (2448096)
- RowsReturned: 2.84M (2837459)
{code}
I've tried all insert hints on IMPALA 3.2 and IMPALA 4.0, none of them solve
this kind of data skew problem.
It seems that we should perform some kind of shuffle to ensure that the amount
of rows inserted on each nodes are not significantly skewed, for exampling
random shuffle or round-robin shuffle.
Do you have any other solutions?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]