Dandandan commented on a change in pull request #1459:
URL: https://github.com/apache/arrow-datafusion/pull/1459#discussion_r771090481
##########
File path: datafusion/src/physical_plan/repartition.rs
##########
@@ -326,6 +326,11 @@ impl RepartitionExec {
Partitioning::Hash(exprs, _) => {
let timer = r_metrics.repart_time.timer();
let input_batch = result?;
+ //avoid send empty batch to next plan
+ if input_batch.num_rows() == 0 {
Review comment:
Sorry if my issue description wasn't clear enough.
I think a much more common case is that one of the output batches is empty
*after partitioning*, which happens a bit further down in this method.
So after hashing the rows and dividing them into partitions, we could avoid creating
and sending empty batches.
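To illustrate why empty outputs are common even when the input batch is not empty, here is a minimal, std-only sketch. It assumes that hash repartitioning assigns each row to `hash(key) % num_partitions`. `hash_partition_indices` is a hypothetical helper and is not DataFusion's API. With 3 rows and 8 output partitions, at least 5 partitions must receive no rows.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper (not DataFusion's API): assign each row index to an
/// output partition by hashing its key, roughly as hash repartitioning does.
fn hash_partition_indices(keys: &[i64], num_partitions: usize) -> Vec<Vec<u32>> {
    let mut indices = vec![Vec::new(); num_partitions];
    for (row, key) in keys.iter().enumerate() {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let partition = (hasher.finish() % num_partitions as u64) as usize;
        indices[partition].push(row as u32);
    }
    indices
}

fn main() {
    // A non-empty 3-row input spread over 8 output partitions:
    // by pigeonhole, at least 5 partitions receive no rows at all.
    let keys = [10, 20, 30];
    let indices = hash_partition_indices(&keys, 8);
    let empty = indices.iter().filter(|p| p.is_empty()).count();
    println!("{} of {} output partitions are empty", empty, indices.len());
}
```

The early `num_rows() == 0` check on the *input* batch does not catch these cases, because each of these per-partition index lists comes from a non-empty input.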
Review comment:
So, right after the line
`for (num_output_partition, partition_indices) in ...`
we can add a check and `continue` when `partition_indices` is empty.
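A simplified, std-only sketch of where such a check could go follows. The names are assumptions. `Batch` is a stand-in for a record batch, `take` is a simplified gather rather than Arrow's kernel, and one `mpsc` channel per output partition stands in for DataFusion's output channels. Only the loop header mirrors the line quoted above.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a record batch: a single column of values.
type Batch = Vec<i64>;

/// Gather the rows of `batch` selected by `indices` (a simplified `take`).
fn take(batch: &Batch, indices: &[u32]) -> Batch {
    indices.iter().map(|&i| batch[i as usize]).collect()
}

/// Send one sub-batch per output partition, skipping partitions that
/// received no rows so that no empty batch reaches downstream operators.
/// Returns the number of batches actually sent.
fn send_partitioned(batch: &Batch, indices: Vec<Vec<u32>>, senders: &[Sender<Batch>]) -> usize {
    let mut sent = 0;
    for (num_output_partition, partition_indices) in indices.into_iter().enumerate() {
        // The suggested check: nothing to create or send for this partition.
        if partition_indices.is_empty() {
            continue;
        }
        let output_batch = take(batch, &partition_indices);
        senders[num_output_partition]
            .send(output_batch)
            .expect("receiver dropped");
        sent += 1;
    }
    sent
}

fn main() {
    let (senders, receivers): (Vec<Sender<Batch>>, Vec<Receiver<Batch>>) =
        (0..4).map(|_| channel()).unzip();
    let batch: Batch = vec![7, 8, 9];
    // Rows 0 and 2 go to partition 1, row 1 to partition 3; partitions 0 and 2 stay empty.
    let indices = vec![vec![], vec![0, 2], vec![], vec![1]];
    let sent = send_partitioned(&batch, indices, &senders);
    println!("sent {} batches", sent); // sent 2 batches
    for (i, rx) in receivers.iter().enumerate() {
        println!("partition {}: {:?}", i, rx.try_recv().ok());
    }
}
```

Skipping at this point avoids both the `take` work and the channel send for the empty partitions. The downstream consumer for those partitions then simply receives nothing from this input batch.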
--
This is an automated message from the Apache Git Service.