ShiKaiWi commented on issue #1269:
URL: https://github.com/apache/arrow-rs/issues/1269#issuecomment-1469600914

   @tustvold Thanks for your quick response. Here are some of my thoughts on 
your proposal:
   > This would then allow providing a Vec<u8> as the writer, and then 
periodically gaining access to it and flushing its contents asynchronously. We 
could provide an AsyncWriter that encapsulates this logic, but we could also 
just provide a code example in a doc comment.
   
   I think **periodically gaining access** is not an elegant approach (although 
it does solve the problem we encountered), because the polling frequency is hard 
to choose: polling too often introduces overhead, while polling too rarely lets 
the inner writer `Vec<u8>` grow large and leads to high memory usage.
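
To make the trade-off concrete, here is a minimal sketch of one alternative to timer-based polling: check the buffer after each write and drain it once it reaches a size threshold. It uses only the standard library. `SharedBuffer`, `take_if_at_least`, `take_all`, and `pump` are hypothetical names for illustration and are not part of the parquet crate, and a plain `Vec<u8>` stands in for the asynchronous sink.

```rust
use std::io::{self, Write};
use std::sync::{Arc, Mutex};

/// Hypothetical shared in-memory buffer: a synchronous writer appends to it,
/// and the owner of another handle drains it.
#[derive(Clone, Default)]
struct SharedBuffer {
    inner: Arc<Mutex<Vec<u8>>>,
}

impl SharedBuffer {
    /// Take the pending bytes only if at least `threshold` bytes are buffered.
    fn take_if_at_least(&self, threshold: usize) -> Option<Vec<u8>> {
        let mut buf = self.inner.lock().unwrap();
        if buf.len() >= threshold {
            Some(std::mem::take(&mut *buf))
        } else {
            None
        }
    }

    /// Take whatever is pending, e.g. once the writer has finished.
    fn take_all(&self) -> Vec<u8> {
        std::mem::take(&mut *self.inner.lock().unwrap())
    }
}

impl Write for SharedBuffer {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.inner.lock().unwrap().extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Write each chunk, draining the buffer into `sink` whenever it reaches
/// `threshold`. Returns the sink contents and the number of threshold drains.
fn pump(chunks: &[&[u8]], threshold: usize) -> io::Result<(Vec<u8>, usize)> {
    let buffer = SharedBuffer::default();
    let mut writer = buffer.clone(); // stands in for the writer given to the encoder
    let mut sink = Vec::new(); // stands in for an asynchronous sink
    let mut drains = 0;

    for chunk in chunks {
        writer.write_all(chunk)?;
        if let Some(bytes) = buffer.take_if_at_least(threshold) {
            sink.extend_from_slice(&bytes);
            drains += 1;
        }
    }
    // Final drain for any bytes left below the threshold.
    sink.extend_from_slice(&buffer.take_all());
    Ok((sink, drains))
}

fn main() -> io::Result<()> {
    let (sink, drains) = pump(&[&b"abc"[..], &b"defgh"[..], &b"ij"[..]], 8)?;
    println!("{} bytes written, {} threshold drains", sink.len(), drains);
    Ok(())
}
```

With a size threshold, memory held by the buffer is bounded by the threshold plus the largest single write, rather than by how long the poller sleeps. In a real asynchronous writer the drain step would await the sink instead of copying into a `Vec<u8>`.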
   
   > The nature of parquet is that an entire row group is buffered up and 
written in one shot, as data for different columns cannot be interleaved, so 
I'm not sure it is possible to do better than this
   
   For now, I don't know the details of the writer very well, but I guess the 
buffering logic could still be kept inside the async writer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
