asfimport opened a new issue, #30179:
URL: https://github.com/apache/arrow/issues/30179

   The dataset writer now correctly applies backpressure.  However, that 
backpressure is only applied when the write calls slow down.  This only happens 
when the OS disk cache fills up.
   
   However, filling up the OS disk cache is undesirable.  It can cause other 
running processes to be swapped out (assuming the system has any swap 
configured) and can make the system unusable for anything else.
   
   This typically has no actual benefit to the dataset write.  The marginal 
performance boost provided by the extra RAM is often not worth the cost.
   
   One way to avoid filling the cache would be to use direct I/O (although that 
comes with a plethora of warnings).  Another way might be to flag the output as 
WONTNEED (e.g. `POSIX_FADV_DONTNEED` via `posix_fadvise`), but I don't know for 
sure if this works (the OS might still cache it so that it can satisfy the write 
call quickly).  Another way might be to somehow track how much disk cache is 
being used for writes, but that would get complex.  I'm sure there are other 
ways I'm just not aware of yet.
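
As a rough illustration of the WONTNEED idea above, here is a minimal sketch in Python rather than in Arrow's C++ writer. It assumes a POSIX system where `os.posix_fadvise` is available (it is skipped otherwise). The names `write_without_caching` and `_sync_and_drop` and the 8 MiB threshold are hypothetical and are not Arrow APIs. The advice is only a hint, so the kernel may still keep the pages cached.

```python
import os


def _sync_and_drop(fd, start, end):
    """Flush the file, then hint that bytes [start, end) are no longer needed."""
    if end > start:
        # Dirty pages are generally not dropped, so flush them to disk first.
        os.fsync(fd)
        # posix_fadvise is not available on every platform (assumption: POSIX).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, start, end - start, os.POSIX_FADV_DONTNEED)
    return end


def write_without_caching(path, chunks, flush_every=8 * 1024 * 1024):
    """Write an iterable of bytes-like chunks to path.

    Roughly every `flush_every` bytes, sync the file and advise the kernel
    to drop the pages just written. Returns the total bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0  # total bytes written so far
    flushed = 0  # offset up to which pages were synced and advised away
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:
                # os.write may write fewer bytes than requested.
                n = os.write(fd, view)
                view = view[n:]
                written += n
            if written - flushed >= flush_every:
                flushed = _sync_and_drop(fd, flushed, written)
        # Handle whatever is left after the last chunk.
        _sync_and_drop(fd, flushed, written)
    finally:
        os.close(fd)
    return written
```

Because `os.fsync` blocks until the data reaches the disk, this approach also slows the writer to disk speed, which is where the backpressure would come from. Whether that trade-off is acceptable for the dataset writer is the open question here.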
   
   **Reporter**: [Weston 
Pace](https://issues.apache.org/jira/browse/ARROW-14635) / @westonpace
   **Assignee**: [Ziheng 
Wang](https://issues.apache.org/jira/browse/ARROW-14635) / @marsupialtail
   #### Related issues:
   - [[Python][C++] O_DIRECT write support](https://github.com/apache/arrow/issues/32418) (supersedes)
   - [[C++][R]Opening a multi-file dataset and writing a re-partitioned version 
of it fails](https://github.com/apache/arrow/issues/18944) (is depended upon by)
   #### PRs and other links:
   - [GitHub Pull Request #13640](https://github.com/apache/arrow/pull/13640)
   - [GitHub Pull Request #13662](https://github.com/apache/arrow/pull/13662)
   
   <sub>**Note**: *This issue was originally created as 
[ARROW-14635](https://issues.apache.org/jira/browse/ARROW-14635). Please see 
the [migration documentation](https://github.com/apache/arrow/issues/14542) for 
further details.*</sub>

