[ 
https://issues.apache.org/jira/browse/IMPALA-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengdou1990 updated IMPALA-11473:
---------------------------------
    Description: 
When a large amount of data is inserted into a non-partitioned HDFS table, if 
the fragment instances of SELECT STATEMENT differ greatly in output rows, some 
fragment instances may very slow, while others may much faster. This causes the 
entire insert operation to take a long time to run.

For example, when I execute the following Insert SQL in a 10 nodes IMPALA 3.2 
cluster, the duration was almost 67 min.
{code:java}
create table user_behavior_export_2022070111 stored as parquet as SELECT * from 
user_behavior_table where time between '2022-07-01 11:00:00.000' and 
'2022-07-01 11:59:59.999'; {code}
_Tips: user_behavior_table is an day partitioned table with more than 1000 
columns._


Execute summary is
{code:java}
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------------
F00:HDFS WRITER       10     21m50s      1h6m                   346.88 MB       
 1.00 GB
00:SCAN HDFS         10    3s401ms    3s217ms  137.49M      94.49M   12.61 GB   
     2.75 GB  dafault.user_behavior_table {code}
The fastest HdfsTableSink executed 4 minutes, while the slowest HdfsTableSink 
executed 66 minutes.

Different HdfsTableSink execution times vary greatly due to the amount of data 
inserted. For Each HDFS scanner, the amount of ReadBytes and ReadRows are not 
significantly skewed, but the amount of data returned makes a performance 
difference of tens of times.
{code:java}
HdfsTableSink:(Total: 1h6m, non-child: 1h6m, % non-child: 100.00%)
HdfsTableSink:(Total: 34m44s, non-child: 34m44s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m20s, non-child: 25m20s, % non-child: 100.00%)
HdfsTableSink:(Total: 20m36s, non-child: 20m36s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m46s, non-child: 25m46s, % non-child: 100.00%)
HdfsTableSink:(Total: 12m56s, non-child: 12m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m56s, non-child: 10m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m43s, non-child: 10m43s, % non-child: 100.00%)
HdfsTableSink:(Total: 4m7s, non-child: 4m7s, % non-child: 100.00%)
HdfsTableSink:(Total: 6m19s, non-child: 6m19s, % non-child: 100.00%)

- RowsInserted: 44.49M (44487661)
- RowsInserted: 21.42M (21417405)
- RowsInserted: 18.34M (18340886)
- RowsInserted: 14.42M (14422287)
- RowsInserted: 14.13M (14128973)
- RowsInserted: 8.96M (8955664)
- RowsInserted: 7.08M (7084182)
- RowsInserted: 6.97M (6969609)
- RowsInserted: 2.45M (2448096)
- RowsInserted: 2.84M (2837459)

- BytesRead: 2.32 GB (2485981168)
- BytesRead: 2.22 GB (2385372978)
- BytesRead: 2.66 GB (2858818300)
- BytesRead: 2.34 GB (2509528520)
- BytesRead: 2.71 GB (2906316201)
- BytesRead: 2.65 GB (2844324158)
- BytesRead: 2.36 GB (2531865904)
- BytesRead: 2.24 GB (2402854679)
- BytesRead: 2.78 GB (2979667784)
- BytesRead: 2.66 GB (2856271181)

- RowsRead: 101.56M (101561956)
- RowsRead: 82.91M (82914833)
- RowsRead: 97.30M (97297545)
- RowsRead: 71.74M (71736869)
- RowsRead: 100.67M (100669817)
- RowsRead: 101.85M (101851844)
- RowsRead: 83.43M (83432722)
- RowsRead: 90.33M (90331924)
- RowsRead: 93.34M (93341164)
- RowsRead: 92.51M (92509166)

- RowsReturned: 13.75M (13748752)
- RowsReturned: 44.49M (44487661)
- RowsReturned: 21.42M (21417405)
- RowsReturned: 18.34M (18340886)
- RowsReturned: 14.42M (14422287)
- RowsReturned: 14.13M (14128973)
- RowsReturned: 8.96M (8955664)
- RowsReturned: 7.08M (7084182)
- RowsReturned: 6.97M (6969609)
- RowsReturned: 2.45M (2448096)
- RowsReturned: 2.84M (2837459) 

{code}
I've tried all insert hints on IMPALA 3.2 and IMPALA 4.0, none of them solve 
this kind of data skew problem.
 
It seems that we should perform some kind of shuffle to ensure that the amount 
of rows inserted on each nodes are not significantly skewed, for exampling 
random shuffle or round-robin shuffle.
 
Do you have any other solutions?

  was:
When a large amount of data is inserted into a non-partitioned HDFS table, if 
the fragment instances of SELECT STATEMENT differ greatly in output rows, some 
fragment instances may very slow, while others may much faster. This causes the 
entire insert operation to take a long time to run.

For example, when I execute the following Insert SQL in a 10 nodes IMPALA 3.2 
cluster, the duration was almost 67 min.
{code:java}
create table user_behavior_export_2022070111 stored as parquet as SELECT * from 
user_behavior_table where time between '2022-07-01 11:00:00.000' and 
'2022-07-01 11:59:59.999'; {code}
Tips: user_behavior_table is an day partitioned table with 1000 columns.
 
{code:java}
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------------
F00:HDFS WRITER       10     21m50s       1h6m                       346.88 MB  
      1.00 GB
00:SCAN HDFS          10    3s401ms    3s217ms  137.49M      94.49M   12.61 GB  
      2.75 GB  dafault.user_behavior_table {code}
The fastest HdfsTableSink executed 2 minutes, while the slowest HdfsTableSink 
executed 66 minutes. Different HdfsTableSink execution times vary greatly due 
to the amount of data inserted. For Each HDFS scanner, the amount of ReadBytes 
and ReadRows are not significantly skewed, but the amount of data returned 
makes a performance difference of tens of times.
{code:java}
HdfsTableSink:(Total: 1h6m, non-child: 1h6m, % non-child: 100.00%)
HdfsTableSink:(Total: 34m44s, non-child: 34m44s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m20s, non-child: 25m20s, % non-child: 100.00%)
HdfsTableSink:(Total: 20m36s, non-child: 20m36s, % non-child: 100.00%)
HdfsTableSink:(Total: 25m46s, non-child: 25m46s, % non-child: 100.00%)
HdfsTableSink:(Total: 12m56s, non-child: 12m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m56s, non-child: 10m56s, % non-child: 100.00%)
HdfsTableSink:(Total: 10m43s, non-child: 10m43s, % non-child: 100.00%)
HdfsTableSink:(Total: 4m7s, non-child: 4m7s, % non-child: 100.00%)
HdfsTableSink:(Total: 6m19s, non-child: 6m19s, % non-child: 100.00%)

- RowsInserted: 44.49M (44487661)
- RowsInserted: 21.42M (21417405)
- RowsInserted: 18.34M (18340886)
- RowsInserted: 14.42M (14422287)
- RowsInserted: 14.13M (14128973)
- RowsInserted: 8.96M (8955664)
- RowsInserted: 7.08M (7084182)
- RowsInserted: 6.97M (6969609)
- RowsInserted: 2.45M (2448096)
- RowsInserted: 2.84M (2837459)

- BytesRead: 2.32 GB (2485981168)
- BytesRead: 2.22 GB (2385372978)
- BytesRead: 2.66 GB (2858818300)
- BytesRead: 2.34 GB (2509528520)
- BytesRead: 2.71 GB (2906316201)
- BytesRead: 2.65 GB (2844324158)
- BytesRead: 2.36 GB (2531865904)
- BytesRead: 2.24 GB (2402854679)
- BytesRead: 2.78 GB (2979667784)
- BytesRead: 2.66 GB (2856271181)

- RowsRead: 101.56M (101561956)
- RowsRead: 82.91M (82914833)
- RowsRead: 97.30M (97297545)
- RowsRead: 71.74M (71736869)
- RowsRead: 100.67M (100669817)
- RowsRead: 101.85M (101851844)
- RowsRead: 83.43M (83432722)
- RowsRead: 90.33M (90331924)
- RowsRead: 93.34M (93341164)
- RowsRead: 92.51M (92509166)

- RowsReturned: 13.75M (13748752)
- RowsReturned: 44.49M (44487661)
- RowsReturned: 21.42M (21417405)
- RowsReturned: 18.34M (18340886)
- RowsReturned: 14.42M (14422287)
- RowsReturned: 14.13M (14128973)
- RowsReturned: 8.96M (8955664)
- RowsReturned: 7.08M (7084182)
- RowsReturned: 6.97M (6969609)
- RowsReturned: 2.45M (2448096)
- RowsReturned: 2.84M (2837459) 

{code}
I've tried all insert hints on IMPALA 3.2 and IMPALA 4.0, none of them solve 
this kind of data skew problem.
 
It seems that we should perform some kind of shuffle to ensure that the amount 
of rows inserted on each nodes are not significantly skewed, for exampling 
random shuffle or round-robin shuffle.
 
Do you have any other solutions?


> Data skew occurs when execute HDFS table insert operation
> ---------------------------------------------------------
>
>                 Key: IMPALA-11473
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11473
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>    Affects Versions: Impala 3.2.0, Impala 4.0.0
>            Reporter: pengdou1990
>            Assignee: pengdou1990
>            Priority: Major
>
> When a large amount of data is inserted into a non-partitioned HDFS table, if 
> the fragment instances of SELECT STATEMENT differ greatly in output rows, 
> some fragment instances may very slow, while others may much faster. This 
> causes the entire insert operation to take a long time to run.
> For example, when I execute the following Insert SQL in a 10 nodes IMPALA 3.2 
> cluster, the duration was almost 67 min.
> {code:java}
> create table user_behavior_export_2022070111 stored as parquet as SELECT * 
> from user_behavior_table where time between '2022-07-01 11:00:00.000' and 
> '2022-07-01 11:59:59.999'; {code}
> _Tips: user_behavior_table is an day partitioned table with more than 1000 
> columns._
> Execute summary is
> {code:java}
> Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak 
> Mem  Est. Peak Mem  Detail
> -----------------------------------------------------------------------------------------------------------------------------
> F00:HDFS WRITER       10     21m50s      1h6m                   346.88 MB     
>    1.00 GB
> 00:SCAN HDFS         10    3s401ms    3s217ms  137.49M      94.49M   12.61 GB 
>        2.75 GB  dafault.user_behavior_table {code}
> The fastest HdfsTableSink executed 4 minutes, while the slowest HdfsTableSink 
> executed 66 minutes.
> Different HdfsTableSink execution times vary greatly due to the amount of 
> data inserted. For Each HDFS scanner, the amount of ReadBytes and ReadRows 
> are not significantly skewed, but the amount of data returned makes a 
> performance difference of tens of times.
> {code:java}
> HdfsTableSink:(Total: 1h6m, non-child: 1h6m, % non-child: 100.00%)
> HdfsTableSink:(Total: 34m44s, non-child: 34m44s, % non-child: 100.00%)
> HdfsTableSink:(Total: 25m20s, non-child: 25m20s, % non-child: 100.00%)
> HdfsTableSink:(Total: 20m36s, non-child: 20m36s, % non-child: 100.00%)
> HdfsTableSink:(Total: 25m46s, non-child: 25m46s, % non-child: 100.00%)
> HdfsTableSink:(Total: 12m56s, non-child: 12m56s, % non-child: 100.00%)
> HdfsTableSink:(Total: 10m56s, non-child: 10m56s, % non-child: 100.00%)
> HdfsTableSink:(Total: 10m43s, non-child: 10m43s, % non-child: 100.00%)
> HdfsTableSink:(Total: 4m7s, non-child: 4m7s, % non-child: 100.00%)
> HdfsTableSink:(Total: 6m19s, non-child: 6m19s, % non-child: 100.00%)
> - RowsInserted: 44.49M (44487661)
> - RowsInserted: 21.42M (21417405)
> - RowsInserted: 18.34M (18340886)
> - RowsInserted: 14.42M (14422287)
> - RowsInserted: 14.13M (14128973)
> - RowsInserted: 8.96M (8955664)
> - RowsInserted: 7.08M (7084182)
> - RowsInserted: 6.97M (6969609)
> - RowsInserted: 2.45M (2448096)
> - RowsInserted: 2.84M (2837459)
> - BytesRead: 2.32 GB (2485981168)
> - BytesRead: 2.22 GB (2385372978)
> - BytesRead: 2.66 GB (2858818300)
> - BytesRead: 2.34 GB (2509528520)
> - BytesRead: 2.71 GB (2906316201)
> - BytesRead: 2.65 GB (2844324158)
> - BytesRead: 2.36 GB (2531865904)
> - BytesRead: 2.24 GB (2402854679)
> - BytesRead: 2.78 GB (2979667784)
> - BytesRead: 2.66 GB (2856271181)
> - RowsRead: 101.56M (101561956)
> - RowsRead: 82.91M (82914833)
> - RowsRead: 97.30M (97297545)
> - RowsRead: 71.74M (71736869)
> - RowsRead: 100.67M (100669817)
> - RowsRead: 101.85M (101851844)
> - RowsRead: 83.43M (83432722)
> - RowsRead: 90.33M (90331924)
> - RowsRead: 93.34M (93341164)
> - RowsRead: 92.51M (92509166)
> - RowsReturned: 13.75M (13748752)
> - RowsReturned: 44.49M (44487661)
> - RowsReturned: 21.42M (21417405)
> - RowsReturned: 18.34M (18340886)
> - RowsReturned: 14.42M (14422287)
> - RowsReturned: 14.13M (14128973)
> - RowsReturned: 8.96M (8955664)
> - RowsReturned: 7.08M (7084182)
> - RowsReturned: 6.97M (6969609)
> - RowsReturned: 2.45M (2448096)
> - RowsReturned: 2.84M (2837459) 
> {code}
> I've tried all insert hints on IMPALA 3.2 and IMPALA 4.0, none of them solve 
> this kind of data skew problem.
>  
> It seems that we should perform some kind of shuffle to ensure that the 
> amount of rows inserted on each nodes are not significantly skewed, for 
> exampling random shuffle or round-robin shuffle.
>  
> Do you have any other solutions?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to