mingmwang commented on a change in pull request #1987:
URL: https://github.com/apache/arrow-datafusion/pull/1987#discussion_r825562403



##########
File path: datafusion/src/physical_plan/coalesce_batches.rs
##########
@@ -124,10 +124,10 @@ impl ExecutionPlan for CoalesceBatchesExec {
     async fn execute(
         &self,
         partition: usize,
-        runtime: Arc<RuntimeEnv>,
+        context: Arc<TaskContext>,
     ) -> Result<SendableRecordBatchStream> {
         Ok(Box::pin(CoalesceBatchesStream {
-            input: self.input.execute(partition, runtime).await?,
+            input: self.input.execute(partition, context).await?,

Review comment:
       Agree. That's why I'm using a string task_id in TaskContext instead of a struct.
   Maybe we should use a separate uuid to uniquely identify a Ballista task.
   Currently in the system, task_id and partition_id are used interchangeably.
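   To make that concrete, here is a minimal sketch (not the exact definition in this PR; the field and method names are illustrative only) of a TaskContext keyed by a plain string id, so DataFusion stays agnostic of how Ballista builds that id:
   
   ````
   /// Sketch only: the context passed to ExecutionPlan::execute() carries a
   /// plain string task id instead of a structured PartitionId.
   pub struct TaskContext {
       /// e.g. "job_id/stage_id/partition_id" when running under Ballista,
       /// or a session/uuid string when running standalone DataFusion.
       task_id: String,
   }
   
   impl TaskContext {
       pub fn new(task_id: impl Into<String>) -> Self {
           Self {
               task_id: task_id.into(),
           }
       }
   
       pub fn task_id(&self) -> &str {
           &self.task_id
       }
   }
   ````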
   
   In Ballista proto
   
   ````
   #[derive(Clone, PartialEq, ::prost::Message)]
   pub struct TaskDefinition {
       #[prost(message, optional, tag = "1")]
       pub task_id: ::core::option::Option<PartitionId>,
       #[prost(bytes = "vec", tag = "2")]
       pub plan: ::prost::alloc::vec::Vec<u8>,
       /// Output partition for shuffle writer
       #[prost(message, optional, tag = "3")]
       pub output_partitioning: ::core::option::Option<PhysicalHashRepartition>,
   }
   ````
   
   In scheduler/mod.rs
   
   ````
   /// Unique identifier for the output partition of an operator.
   #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
   pub struct PartitionId {
       pub job_id: String,
       pub stage_id: usize,
       pub partition_id: usize,
   }
   ````
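   
   One way (just a sketch, not part of this PR) to bridge the two would be to derive the string task id handed to TaskContext from the PartitionId above, optionally suffixed with a uuid if we need to distinguish retries of the same partition:
   
   ````
   /// Sketch only: build the string task id for TaskContext from the fields of
   /// the PartitionId shown above, so the id stays unique per
   /// (job, stage, partition) without leaking the Ballista struct into DataFusion.
   fn task_id_string(job_id: &str, stage_id: usize, partition_id: usize) -> String {
       format!("{}/{}/{}", job_id, stage_id, partition_id)
   }
   
   // e.g. task_id_string("query-1", 2, 7) == "query-1/2/7"
   ````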
   
   And agreed that "Task" is a vague term in DataFusion. As @yahoNanJing mentioned earlier, the TaskContext is really the context for the execute() method. To avoid confusion with the existing ExecutionContext, I chose the name TaskContext rather than ExecutionContext. DuckDB actually calls it ExecutionContext. I'm open on the naming: if everyone agrees to use ExecutionContext, I can change it to avoid introducing a vague "Task" term into DataFusion.
   
   DuckDB code
   ````
   OperatorResultType Execute(ExecutionContext &context, DataChunk &input, DataChunk &chunk,
                              OperatorState &state) const override;
   ````



