[
https://issues.apache.org/jira/browse/IMPALA-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
pengdou1990 updated IMPALA-11473:
---------------------------------
Description:
When a large amount of data is inserted into a non-partitioned HDFS table, if
the fragment instances of SELECT STATEMENT differ greatly in output rows, some
fragment instances may very slow, while others may much faster. This causes the
entire insert operation to take a long time to run.
For example, when I execute the following Insert SQL in a 10 nodes IMPALA 3.2
cluster, the duration was almost 67 min.
{code:java}
create table user_behavior_export_2022070111 stored as parquet as SELECT * from
user_behavior_table where time between '2022-07-01 11:00:00.000' and
'2022-07-01 11:59:59.999' and day = 20220701; {code}
_Tips: user_behavior_table is an day partitioned table with more than 1000
columns._
Execute summary is
{code:java}
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem
Est. Peak Mem Detail
-----------------------------------------------------------------------------------------------------------------------------
F00:HDFS WRITER 10 21m50s 1h6m 346.88 MB
1.00 GB
00:SCAN HDFS 10 3s401ms 3s217ms 137.49M 94.49M 12.61 GB
2.75 GB dafault.user_behavior_table {code}
The fastest HdfsTableSink executed 4 minutes, while the slowest HdfsTableSink
executed 66 minutes.
Different HdfsTableSink execution times vary greatly due to the amount of data
inserted. For Each HDFS scanner, the amount of ReadBytes and ReadRows are not
significantly skewed, but the amount of data returned makes a performance
difference of tens of times.
{code:java}
HdfsTableSink:(Total: 1h6m, non-child: 1h6m, % non-child: 100.00%)
HdfsTableSink:(Total: 34m44s, non-child: 34m44s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m20s, non-child: 25m20s, % non-child: 100.00%)
HdfsTableSink:(Total: 20m36s, non-child: 20m36s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m46s, non-child: 25m46s, % non-child: 100.00%)
HdfsTableSink:(Total: 12m56s, non-child: 12m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m56s, non-child: 10m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m43s, non-child: 10m43s, % non-child: 100.00%)
HdfsTableSink:(Total: 4m7s, non-child: 4m7s, % non-child: 100.00%)
HdfsTableSink:(Total: 6m19s, non-child: 6m19s, % non-child: 100.00%)
- RowsInserted: 44.49M (44487661)
- RowsInserted: 21.42M (21417405)
- RowsInserted: 18.34M (18340886)
- RowsInserted: 14.42M (14422287)
- RowsInserted: 14.13M (14128973)
- RowsInserted: 8.96M (8955664)
- RowsInserted: 7.08M (7084182)
- RowsInserted: 6.97M (6969609)
- RowsInserted: 2.45M (2448096)
- RowsInserted: 2.84M (2837459)
- BytesRead: 2.32 GB (2485981168)
- BytesRead: 2.22 GB (2385372978)
- BytesRead: 2.66 GB (2858818300)
- BytesRead: 2.34 GB (2509528520)
- BytesRead: 2.71 GB (2906316201)
- BytesRead: 2.65 GB (2844324158)
- BytesRead: 2.36 GB (2531865904)
- BytesRead: 2.24 GB (2402854679)
- BytesRead: 2.78 GB (2979667784)
- BytesRead: 2.66 GB (2856271181)
- RowsRead: 101.56M (101561956)
- RowsRead: 82.91M (82914833)
- RowsRead: 97.30M (97297545)
- RowsRead: 71.74M (71736869)
- RowsRead: 100.67M (100669817)
- RowsRead: 101.85M (101851844)
- RowsRead: 83.43M (83432722)
- RowsRead: 90.33M (90331924)
- RowsRead: 93.34M (93341164)
- RowsRead: 92.51M (92509166)
- RowsReturned: 13.75M (13748752)
- RowsReturned: 44.49M (44487661)
- RowsReturned: 21.42M (21417405)
- RowsReturned: 18.34M (18340886)
- RowsReturned: 14.42M (14422287)
- RowsReturned: 14.13M (14128973)
- RowsReturned: 8.96M (8955664)
- RowsReturned: 7.08M (7084182)
- RowsReturned: 6.97M (6969609)
- RowsReturned: 2.45M (2448096)
- RowsReturned: 2.84M (2837459)
{code}
I've tried all insert hints on IMPALA 3.2 and IMPALA 4.0, none of them solve
this kind of data skew problem.
It seems that we should perform some kind of shuffle to ensure that the amount
of rows inserted on each nodes are not significantly skewed, for exampling
random shuffle or round-robin shuffle.
Do you have any other solutions?
was:
When a large amount of data is inserted into a non-partitioned HDFS table, if
the fragment instances of SELECT STATEMENT differ greatly in output rows, some
fragment instances may very slow, while others may much faster. This causes the
entire insert operation to take a long time to run.
For example, when I execute the following Insert SQL in a 10 nodes IMPALA 3.2
cluster, the duration was almost 67 min.
{code:java}
create table user_behavior_export_2022070111 stored as parquet as SELECT * from
user_behavior_table where time between '2022-07-01 11:00:00.000' and
'2022-07-01 11:59:59.999'; {code}
_Tips: user_behavior_table is an day partitioned table with more than 1000
columns._
Execute summary is
{code:java}
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem
Est. Peak Mem Detail
-----------------------------------------------------------------------------------------------------------------------------
F00:HDFS WRITER 10 21m50s 1h6m 346.88 MB
1.00 GB
00:SCAN HDFS 10 3s401ms 3s217ms 137.49M 94.49M 12.61 GB
2.75 GB dafault.user_behavior_table {code}
The fastest HdfsTableSink executed 4 minutes, while the slowest HdfsTableSink
executed 66 minutes.
Different HdfsTableSink execution times vary greatly due to the amount of data
inserted. For Each HDFS scanner, the amount of ReadBytes and ReadRows are not
significantly skewed, but the amount of data returned makes a performance
difference of tens of times.
{code:java}
HdfsTableSink:(Total: 1h6m, non-child: 1h6m, % non-child: 100.00%)
HdfsTableSink:(Total: 34m44s, non-child: 34m44s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m20s, non-child: 25m20s, % non-child: 100.00%)
HdfsTableSink:(Total: 20m36s, non-child: 20m36s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m46s, non-child: 25m46s, % non-child: 100.00%)
HdfsTableSink:(Total: 12m56s, non-child: 12m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m56s, non-child: 10m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m43s, non-child: 10m43s, % non-child: 100.00%)
HdfsTableSink:(Total: 4m7s, non-child: 4m7s, % non-child: 100.00%)
HdfsTableSink:(Total: 6m19s, non-child: 6m19s, % non-child: 100.00%)
- RowsInserted: 44.49M (44487661)
- RowsInserted: 21.42M (21417405)
- RowsInserted: 18.34M (18340886)
- RowsInserted: 14.42M (14422287)
- RowsInserted: 14.13M (14128973)
- RowsInserted: 8.96M (8955664)
- RowsInserted: 7.08M (7084182)
- RowsInserted: 6.97M (6969609)
- RowsInserted: 2.45M (2448096)
- RowsInserted: 2.84M (2837459)
- BytesRead: 2.32 GB (2485981168)
- BytesRead: 2.22 GB (2385372978)
- BytesRead: 2.66 GB (2858818300)
- BytesRead: 2.34 GB (2509528520)
- BytesRead: 2.71 GB (2906316201)
- BytesRead: 2.65 GB (2844324158)
- BytesRead: 2.36 GB (2531865904)
- BytesRead: 2.24 GB (2402854679)
- BytesRead: 2.78 GB (2979667784)
- BytesRead: 2.66 GB (2856271181)
- RowsRead: 101.56M (101561956)
- RowsRead: 82.91M (82914833)
- RowsRead: 97.30M (97297545)
- RowsRead: 71.74M (71736869)
- RowsRead: 100.67M (100669817)
- RowsRead: 101.85M (101851844)
- RowsRead: 83.43M (83432722)
- RowsRead: 90.33M (90331924)
- RowsRead: 93.34M (93341164)
- RowsRead: 92.51M (92509166)
- RowsReturned: 13.75M (13748752)
- RowsReturned: 44.49M (44487661)
- RowsReturned: 21.42M (21417405)
- RowsReturned: 18.34M (18340886)
- RowsReturned: 14.42M (14422287)
- RowsReturned: 14.13M (14128973)
- RowsReturned: 8.96M (8955664)
- RowsReturned: 7.08M (7084182)
- RowsReturned: 6.97M (6969609)
- RowsReturned: 2.45M (2448096)
- RowsReturned: 2.84M (2837459)
{code}
I've tried all insert hints on IMPALA 3.2 and IMPALA 4.0, none of them solve
this kind of data skew problem.
It seems that we should perform some kind of shuffle to ensure that the amount
of rows inserted on each nodes are not significantly skewed, for exampling
random shuffle or round-robin shuffle.
Do you have any other solutions?
> Data skew occurs when execute HDFS table insert operation
> ---------------------------------------------------------
>
> Key: IMPALA-11473
> URL: https://issues.apache.org/jira/browse/IMPALA-11473
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Affects Versions: Impala 3.2.0, Impala 4.0.0
> Reporter: pengdou1990
> Assignee: pengdou1990
> Priority: Major
>
> When a large amount of data is inserted into a non-partitioned HDFS table, if
> the fragment instances of SELECT STATEMENT differ greatly in output rows,
> some fragment instances may very slow, while others may much faster. This
> causes the entire insert operation to take a long time to run.
> For example, when I execute the following Insert SQL in a 10 nodes IMPALA 3.2
> cluster, the duration was almost 67 min.
> {code:java}
> create table user_behavior_export_2022070111 stored as parquet as SELECT *
> from user_behavior_table where time between '2022-07-01 11:00:00.000' and
> '2022-07-01 11:59:59.999' and day = 20220701; {code}
> _Tips: user_behavior_table is an day partitioned table with more than 1000
> columns._
> Execute summary is
> {code:java}
> Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak
> Mem Est. Peak Mem Detail
> -----------------------------------------------------------------------------------------------------------------------------
> F00:HDFS WRITER 10 21m50s 1h6m 346.88 MB
> 1.00 GB
> 00:SCAN HDFS 10 3s401ms 3s217ms 137.49M 94.49M 12.61 GB
> 2.75 GB dafault.user_behavior_table {code}
> The fastest HdfsTableSink executed 4 minutes, while the slowest HdfsTableSink
> executed 66 minutes.
> Different HdfsTableSink execution times vary greatly due to the amount of
> data inserted. For Each HDFS scanner, the amount of ReadBytes and ReadRows
> are not significantly skewed, but the amount of data returned makes a
> performance difference of tens of times.
> {code:java}
> HdfsTableSink:(Total: 1h6m, non-child: 1h6m, % non-child: 100.00%)
> HdfsTableSink:(Total: 34m44s, non-child: 34m44s, % non-child: 100.00%)
> HdfsTableSink:(Total: 25m20s, non-child: 25m20s, % non-child: 100.00%)
> HdfsTableSink:(Total: 20m36s, non-child: 20m36s, % non-child: 100.00%)
> HdfsTableSink:(Total: 25m46s, non-child: 25m46s, % non-child: 100.00%)
> HdfsTableSink:(Total: 12m56s, non-child: 12m56s, % non-child: 100.00%)
> HdfsTableSink:(Total: 10m56s, non-child: 10m56s, % non-child: 100.00%)
> HdfsTableSink:(Total: 10m43s, non-child: 10m43s, % non-child: 100.00%)
> HdfsTableSink:(Total: 4m7s, non-child: 4m7s, % non-child: 100.00%)
> HdfsTableSink:(Total: 6m19s, non-child: 6m19s, % non-child: 100.00%)
> - RowsInserted: 44.49M (44487661)
> - RowsInserted: 21.42M (21417405)
> - RowsInserted: 18.34M (18340886)
> - RowsInserted: 14.42M (14422287)
> - RowsInserted: 14.13M (14128973)
> - RowsInserted: 8.96M (8955664)
> - RowsInserted: 7.08M (7084182)
> - RowsInserted: 6.97M (6969609)
> - RowsInserted: 2.45M (2448096)
> - RowsInserted: 2.84M (2837459)
> - BytesRead: 2.32 GB (2485981168)
> - BytesRead: 2.22 GB (2385372978)
> - BytesRead: 2.66 GB (2858818300)
> - BytesRead: 2.34 GB (2509528520)
> - BytesRead: 2.71 GB (2906316201)
> - BytesRead: 2.65 GB (2844324158)
> - BytesRead: 2.36 GB (2531865904)
> - BytesRead: 2.24 GB (2402854679)
> - BytesRead: 2.78 GB (2979667784)
> - BytesRead: 2.66 GB (2856271181)
> - RowsRead: 101.56M (101561956)
> - RowsRead: 82.91M (82914833)
> - RowsRead: 97.30M (97297545)
> - RowsRead: 71.74M (71736869)
> - RowsRead: 100.67M (100669817)
> - RowsRead: 101.85M (101851844)
> - RowsRead: 83.43M (83432722)
> - RowsRead: 90.33M (90331924)
> - RowsRead: 93.34M (93341164)
> - RowsRead: 92.51M (92509166)
> - RowsReturned: 13.75M (13748752)
> - RowsReturned: 44.49M (44487661)
> - RowsReturned: 21.42M (21417405)
> - RowsReturned: 18.34M (18340886)
> - RowsReturned: 14.42M (14422287)
> - RowsReturned: 14.13M (14128973)
> - RowsReturned: 8.96M (8955664)
> - RowsReturned: 7.08M (7084182)
> - RowsReturned: 6.97M (6969609)
> - RowsReturned: 2.45M (2448096)
> - RowsReturned: 2.84M (2837459)
> {code}
> I've tried all insert hints on IMPALA 3.2 and IMPALA 4.0, none of them solve
> this kind of data skew problem.
>
> It seems that we should perform some kind of shuffle to ensure that the
> amount of rows inserted on each nodes are not significantly skewed, for
> exampling random shuffle or round-robin shuffle.
>
> Do you have any other solutions?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]