metesynnada commented on code in PR #7452:
URL: https://github.com/apache/arrow-datafusion/pull/7452#discussion_r1314561329


##########
datafusion/core/tests/fifo.rs:
##########
@@ -336,6 +336,7 @@ mod unix_test {
 
     /// It tests the INSERT INTO functionality.
     #[tokio::test]
+    #[ignore]

Review Comment:
   The issue is not how the threads are started, but the assumption that once a 
FIFO file receives 10 rows (with a batch size of 10) a record batch will be 
produced. The current implementation of 
`serialize_rb_stream_to_object_store` requires access to all of the data before 
it can proceed.
   
   ```rust 
   // This loop drains the *entire* input stream, spawning one serialization
   // task per batch; nothing is written out until the stream ends. With a
   // FIFO source that never closes, this never makes progress.
   while let Some(maybe_batch) = data_stream.next().await {
       let mut serializer_clone = match serializer.duplicate() {
           Ok(s) => s,
           Err(_) => {
               return Err((
                   writer,
                   DataFusionError::Internal(
                       "Unknown error writing to object store".into(),
                   ),
               ))
           }
       };
       serialize_tasks.push(task::spawn(async move {
           let batch = maybe_batch?;
           let num_rows = batch.num_rows();
           let bytes = serializer_clone.serialize(batch).await?;
           Ok((num_rows, bytes))
       }));
   }
   ```
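   For contrast, here is a minimal standalone sketch (plain `std`, not DataFusion's actual API; the channel, batch type, and "serialize" step are stand-ins) of per-batch incremental handling: each batch is processed as it arrives, so the consumer makes progress even though the producer may keep the stream open indefinitely, the way a FIFO file does.
   
   ```rust
   use std::sync::mpsc;
   use std::thread;
   
   fn main() {
       // Channel stands in for the record-batch stream from a FIFO source.
       let (tx, rx) = mpsc::channel::<Vec<u8>>();
   
       // Producer: sends a few "record batches" of 10 "rows" each. In a real
       // FIFO the sender might never close the stream; here we drop `tx` at
       // the end only so the example terminates.
       let producer = thread::spawn(move || {
           for i in 0..3u8 {
               tx.send(vec![i; 10]).unwrap();
           }
       });
   
       // Incremental consumer: "serialize" (here, just count rows) and emit
       // each batch immediately instead of draining the whole stream first.
       let mut total_rows = 0;
       for batch in rx.iter() {
           total_rows += batch.len();
           println!("wrote batch of {} rows", batch.len());
       }
       producer.join().unwrap();
       println!("total rows: {}", total_rows);
   }
   ```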



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
