edrevo commented on a change in pull request #543:
URL: https://github.com/apache/arrow-datafusion/pull/543#discussion_r650881738
##########
File path: ballista/rust/core/src/execution_plans/query_stage.rs
##########
@@ -150,32 +159,150 @@ impl ExecutionPlan for QueryStageExec {
stats
);
- let schema = Arc::new(Schema::new(vec![
- Field::new("path", DataType::Utf8, false),
- stats.arrow_struct_repr(),
- ]));
+ let schema = result_schema();
// build result set with summary of the partition execution
status
- let mut c0 = StringBuilder::new(1);
- c0.append_value(&path).unwrap();
- let path: ArrayRef = Arc::new(c0.finish());
+ let mut part_builder = UInt32Builder::new(1);
+ part_builder.append_value(partition as u32)?;
+ let part: ArrayRef = Arc::new(part_builder.finish());
+
+ let mut path_builder = StringBuilder::new(1);
+ path_builder.append_value(&path)?;
+ let path: ArrayRef = Arc::new(path_builder.finish());
let stats: ArrayRef = stats
.to_arrow_arrayref()
.map_err(|e| DataFusionError::Execution(format!("{:?}",
e)))?;
- let batch = RecordBatch::try_new(schema.clone(), vec![path,
stats])
+ let batch = RecordBatch::try_new(schema.clone(), vec![part,
path, stats])
.map_err(DataFusionError::ArrowError)?;
Ok(Box::pin(MemoryStream::try_new(vec![batch], schema, None)?))
}
- Some(Partitioning::Hash(_, _)) => {
- //TODO re-use code from RepartitionExec to split each batch
into
- // partitions and write to one IPC file per partition
- // See https://github.com/apache/arrow-datafusion/issues/456
- Err(DataFusionError::NotImplemented(
- "Shuffle partitioning not implemented yet".to_owned(),
- ))
+ Some(Partitioning::Hash(exprs, n)) => {
+ let num_output_partitions = *n;
+
+ // we won't necessary produce output for every possible
partition, so we
+ // create writers on demand
+ let mut writers: Vec<Option<Arc<Mutex<ShuffleWriter>>>> =
vec![];
+ for _ in 0..num_output_partitions {
+ writers.push(None);
+ }
+
+ let hashes_buf = &mut vec![];
+ let random_state = ahash::RandomState::with_seeds(0, 0, 0, 0);
+ while let Some(result) = stream.next().await {
+ let input_batch = result?;
+ let arrays = exprs
+ .iter()
+ .map(|expr| {
+ Ok(expr
+ .evaluate(&input_batch)?
+ .into_array(input_batch.num_rows()))
+ })
+ .collect::<Result<Vec<_>>>()?;
+ hashes_buf.clear();
+ hashes_buf.resize(arrays[0].len(), 0);
Review comment:
Noob question: is there a guarantee that all RecordBatches have at least
one element?
##########
File path: ballista/rust/core/src/execution_plans/query_stage.rs
##########
@@ -150,32 +159,150 @@ impl ExecutionPlan for QueryStageExec {
stats
);
- let schema = Arc::new(Schema::new(vec![
- Field::new("path", DataType::Utf8, false),
- stats.arrow_struct_repr(),
- ]));
+ let schema = result_schema();
// build result set with summary of the partition execution
status
- let mut c0 = StringBuilder::new(1);
- c0.append_value(&path).unwrap();
- let path: ArrayRef = Arc::new(c0.finish());
+ let mut part_builder = UInt32Builder::new(1);
+ part_builder.append_value(partition as u32)?;
+ let part: ArrayRef = Arc::new(part_builder.finish());
+
+ let mut path_builder = StringBuilder::new(1);
+ path_builder.append_value(&path)?;
+ let path: ArrayRef = Arc::new(path_builder.finish());
let stats: ArrayRef = stats
.to_arrow_arrayref()
.map_err(|e| DataFusionError::Execution(format!("{:?}",
e)))?;
- let batch = RecordBatch::try_new(schema.clone(), vec![path,
stats])
+ let batch = RecordBatch::try_new(schema.clone(), vec![part,
path, stats])
.map_err(DataFusionError::ArrowError)?;
Ok(Box::pin(MemoryStream::try_new(vec![batch], schema, None)?))
}
- Some(Partitioning::Hash(_, _)) => {
- //TODO re-use code from RepartitionExec to split each batch
into
- // partitions and write to one IPC file per partition
- // See https://github.com/apache/arrow-datafusion/issues/456
- Err(DataFusionError::NotImplemented(
- "Shuffle partitioning not implemented yet".to_owned(),
- ))
+ Some(Partitioning::Hash(exprs, n)) => {
+ let num_output_partitions = *n;
+
+ // we won't necessary produce output for every possible
partition, so we
+ // create writers on demand
+ let mut writers: Vec<Option<Arc<Mutex<ShuffleWriter>>>> =
vec![];
Review comment:
Looks like `Arc` + `Mutex` is unnecessary here if you use `.iter_mut()`
where mutable access is needed.
##########
File path: ballista/rust/core/src/execution_plans/query_stage.rs
##########
@@ -150,32 +159,150 @@ impl ExecutionPlan for QueryStageExec {
stats
);
- let schema = Arc::new(Schema::new(vec![
- Field::new("path", DataType::Utf8, false),
- stats.arrow_struct_repr(),
- ]));
+ let schema = result_schema();
// build result set with summary of the partition execution
status
- let mut c0 = StringBuilder::new(1);
- c0.append_value(&path).unwrap();
- let path: ArrayRef = Arc::new(c0.finish());
+ let mut part_builder = UInt32Builder::new(1);
+ part_builder.append_value(partition as u32)?;
+ let part: ArrayRef = Arc::new(part_builder.finish());
+
+ let mut path_builder = StringBuilder::new(1);
+ path_builder.append_value(&path)?;
+ let path: ArrayRef = Arc::new(path_builder.finish());
let stats: ArrayRef = stats
.to_arrow_arrayref()
.map_err(|e| DataFusionError::Execution(format!("{:?}",
e)))?;
- let batch = RecordBatch::try_new(schema.clone(), vec![path,
stats])
+ let batch = RecordBatch::try_new(schema.clone(), vec![part,
path, stats])
.map_err(DataFusionError::ArrowError)?;
Ok(Box::pin(MemoryStream::try_new(vec![batch], schema, None)?))
}
- Some(Partitioning::Hash(_, _)) => {
- //TODO re-use code from RepartitionExec to split each batch
into
- // partitions and write to one IPC file per partition
- // See https://github.com/apache/arrow-datafusion/issues/456
- Err(DataFusionError::NotImplemented(
- "Shuffle partitioning not implemented yet".to_owned(),
- ))
+ Some(Partitioning::Hash(exprs, n)) => {
Review comment:
Just thinking out loud without any data to back me up, but maybe it is
worth special-casing when n == 1, so we don't actually compute the hash of
everything, since all of the data is going to end up in the same partition
anyway.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]