milenkovicm commented on issue #17297:
URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3508815981

   Hi @jizezhang, thanks for your interest.
   
   > Hi [@milenkovicm](https://github.com/milenkovicm) , I am interested in 
this task and would like to understand the proposal better. Is the idea that
   > 
   > * `DataFrame::cache` will create a new logical plan with a `TableScan` 
node as the root and current logical plan of the dataframe (via `self.plan`) as 
its child?
   
   Yes, a new logical plan node would be created. The open question is which kind of node to create.
   
   > * To capture the lineage, `TableScan` would be constructed with a custom 
`InMemoryTableSource` that overrides the `get_logical_plan` method from the 
trait `TableSource`?
   
   I have tried that, but it did not work as expected. I don't remember the reason why.
   
   > * Would `DataFrame::cache` returns `ctx.execute_logical_plan(new_plan)`?
   
   It does not necessarily need to execute the logical plan right then. It should just return the logical plan node and let the user decide when to run the plan.
   
   I believe the simplest approach would be to create a factory method which the user can configure.
   
   Something like:
   
   ```rust
   pub async fn cache(self) -> Result<DataFrame> {
       // Proposed: `cache_producer` is a user-configurable hook that does not exist yet.
       if let Some(cache_producer) = self.state().cache_producer() {
           cache_producer(&self)
       } else {
           // This is the current behaviour.
           let context = SessionContext::new_with_state((*self.session_state).clone());
           // The schema is consistent with the output
           let plan = self.clone().create_physical_plan().await?;
           let schema = plan.schema();
           let task_ctx = Arc::new(self.task_ctx());
           let partitions = collect_partitioned(plan, task_ctx).await?;
           let mem_table = MemTable::try_new(schema, partitions)?;
           context.read_table(Arc::new(mem_table))
       }
   }
   ```
   
   `cache_producer` would be a function accepting a DataFrame and returning a DataFrame, probably one whose plan is a logical plan extension node.
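
To make the dispatch concrete, here is a minimal, self-contained sketch of the idea using stand-in types only. None of `Plan`, `CacheProducer`, `SessionState`, `DataFrame`, `execute`, or `wrap_in_cache_node` below is the real DataFusion API; they are hypothetical names, and the fake `execute` just returns a row count so the eager and lazy paths can be told apart.

```rust
use std::sync::Arc;

// Stand-in for a logical plan. This is NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    // Hypothetical lazy "cache" extension node wrapping its input plan.
    CacheNode(Box<Plan>),
    // Result of the eager fallback: data already materialized in memory.
    MemTable { rows: usize },
}

// Hypothetical hook type: accepts a DataFrame, returns a new DataFrame.
type CacheProducer = Arc<dyn Fn(&DataFrame) -> Result<DataFrame, String> + Send + Sync>;

// Stand-in session state that optionally carries the user-configured hook.
#[derive(Clone, Default)]
struct SessionState {
    cache_producer: Option<CacheProducer>,
}

#[derive(Clone)]
struct DataFrame {
    state: SessionState,
    plan: Plan,
}

// Fake execution: pretend every scan yields 3 rows.
fn execute(plan: &Plan) -> Result<usize, String> {
    match plan {
        Plan::Scan(_) => Ok(3),
        Plan::CacheNode(input) => execute(input),
        Plan::MemTable { rows } => Ok(*rows),
    }
}

impl DataFrame {
    fn cache(self) -> Result<DataFrame, String> {
        if let Some(producer) = self.state.cache_producer.clone() {
            // Configured path: delegate to the hook, nothing is executed here.
            producer(&self)
        } else {
            // Fallback path: execute now and keep the result in memory.
            let rows = execute(&self.plan)?;
            Ok(DataFrame { state: self.state, plan: Plan::MemTable { rows } })
        }
    }
}

// One possible producer: wrap the current plan in a cache node, lazily.
fn wrap_in_cache_node(df: &DataFrame) -> Result<DataFrame, String> {
    Ok(DataFrame {
        state: df.state.clone(),
        plan: Plan::CacheNode(Box::new(df.plan.clone())),
    })
}
```

With no producer configured, `cache()` in this sketch materializes immediately, mirroring the current behaviour. With `wrap_in_cache_node` installed as the producer, it only rewrites the plan and leaves execution to the caller.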
   
   Does it make sense? Let me know what you think 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

