tustvold commented on issue #6339:
URL: 
https://github.com/apache/arrow-datafusion/issues/6339#issuecomment-1545867063

   > And should the methods be async
   
   The return type is `BoxFuture`, which allows for asynchronous completion.
   
   > For that it would be necessary to have some kind of flush method in the 
DataSink trait that allows to update the state of the table
   
   As the returned type is `BoxFuture`, the implementation could update the 
state once the stream has been exhausted, i.e. something like:
   
   ```
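   use futures::{future::BoxFuture, FutureExt, StreamExt};

   impl DataSink for MyTable {
       fn write_stream(
           &self,
           partition: usize,
           mut input: SendableRecordBatchStream,
       ) -> BoxFuture<'_, Result<u64>> {
           async move {
               let mut rows = 0;
               while let Some(batch) = input.next().await.transpose()? {
                   // Process batch
                   rows += batch.num_rows() as u64;
               }
               // Input exhausted: commit the transaction here
               Ok(rows)
           }
           .boxed()
       }
   }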
   ```
   
   > return the number of rows written
   
   I personally would return `()` and leave metrics accounting as a 
`TableProvider` detail.
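
A rough sketch of what the `()`-returning variant might look like, written against the standard library only so that it compiles on its own. Everything here except the names `DataSink`, `MyTable` and `write_stream` is an assumption made for illustration: the input is a plain `Vec` rather than a `SendableRecordBatchStream`, `RecordBatch` and `Result` are local stand-ins, the `committed` and `rows_written` fields are hypothetical, and `block_on` is a toy executor, not something DataFusion provides.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Local stand-in for `futures::future::BoxFuture`.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Hypothetical stand-ins for DataFusion's `RecordBatch` and `Result`.
type RecordBatch = Vec<i64>;
type Result<T> = std::result::Result<T, String>;

// Simplified form of the proposed trait: the future resolves to `()`.
trait DataSink {
    fn write_stream(
        &self,
        partition: usize,
        input: Vec<Result<RecordBatch>>,
    ) -> BoxFuture<'_, Result<()>>;
}

#[derive(Default)]
struct MyTable {
    committed: Mutex<Vec<RecordBatch>>, // state visible to readers
    rows_written: AtomicU64,            // metric kept as an internal detail
}

impl DataSink for MyTable {
    fn write_stream(
        &self,
        _partition: usize,
        input: Vec<Result<RecordBatch>>,
    ) -> BoxFuture<'_, Result<()>> {
        Box::pin(async move {
            let mut staged = Vec::new();
            let mut rows = 0u64;
            for next in input {
                let batch = next?; // an error aborts before anything is committed
                rows += batch.len() as u64;
                staged.push(batch);
            }
            // Input exhausted: commit the staged batches and record the metric.
            self.committed.lock().unwrap().extend(staged);
            self.rows_written.fetch_add(rows, Ordering::Relaxed);
            Ok(())
        })
    }
}

// Toy executor: polls in a loop with a waker that does nothing.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Callers only learn whether the write succeeded; anything that wants the row count would read it from the table itself (here via `rows_written`), which keeps the trait signature free of accounting concerns.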
   

