[GitHub] [arrow] marsupialtail commented on a diff in pull request #13640: ARROW-14635: [Python][C++] add O_DIRECT support to writes

GitBox Tue, 19 Jul 2022 17:25:11 -0700


marsupialtail commented on code in PR #13640:
URL: https://github.com/apache/arrow/pull/13640#discussion_r925059114



##########
cpp/src/arrow/io/file.cc:
##########
@@ -378,6 +378,77 @@ Status FileOutputStream::Write(const void* data, int64_t length) {
 
 int FileOutputStream::file_descriptor() const { return impl_->fd(); }
 
+// ----------------------------------------------------------------------
+// DirectFileOutputStream, change the Open, Write and Close methods from FileOutputStream
+// Uses DirectIO for writes. Will only write out things in 4096 byte blocks. Buffers leftover bytes
+// in an internal data structure, which will be padded to 4096 bytes and flushed upon call to close.
+
+class DirectFileOutputStream::DirectFileOutputStreamImpl : public OSFile {
+ public:
+  Status Open(const std::string& path, bool append) {
+    const bool truncate = !append;
+    return OpenWritable(path, truncate, append, true /* write_only */, true);
+  }
+  Status Open(int fd) { return OpenWritable(fd); }
+};
+
+DirectFileOutputStream::DirectFileOutputStream() { 
+  uintptr_t mask = (uintptr_t)(4095);
+  uint8_t *mem = static_cast<uint8_t *>(malloc(4096 + 4095));
+  cached_data = reinterpret_cast<uint8_t *>( reinterpret_cast<uintptr_t>(mem+4095) & ~(mask));

Review Comment:
   [directio.zip](https://github.com/apache/arrow/files/9145458/directio.zip)
   Please disregard the previously uploaded code. @westonpace helped me get
   these new benchmarks. I got fadvise working: you have to call fadvise after
   the write syscall. Aligned versus unaligned input doesn't make much of a
   difference for O_DIRECT, since memcpy is cheap compared to NVMe. Doing
   nothing (nothing.cpp) is about the same speed as O_DIRECT if you call sync
   at the end. fadvise is marginally slower (by about 15%).
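
   For readers who don't want to open the zip, here is a minimal sketch of the
   fadvise variant described above. It is not the benchmark code itself:
   `kBlockSize`, `RoundUpToBlock`, `WriteAll` and `WriteAndDropCache` are
   hypothetical names chosen for illustration. The sketch assumes a POSIX
   system with `posix_fadvise`, and it assumes an `fdatasync` before the
   advice, since `POSIX_FADV_DONTNEED` generally leaves dirty pages in place.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <cstdint>

// Hypothetical constant: the 4096-byte block size used in the diff above.
constexpr std::size_t kBlockSize = 4096;

// Round a byte count up to the next multiple of kBlockSize. This is the
// padding an O_DIRECT writer needs for its final partial block.
constexpr std::size_t RoundUpToBlock(std::size_t n) {
  return (n + kBlockSize - 1) & ~(kBlockSize - 1);
}

// Write the whole buffer with ordinary buffered write(2), retrying on short
// writes and EINTR. Returns 0 on success or an errno value on failure.
int WriteAll(int fd, const std::uint8_t* data, std::size_t length) {
  while (length > 0) {
    ssize_t n = ::write(fd, data, length);
    if (n < 0) {
      if (errno == EINTR) continue;
      return errno;
    }
    data += n;
    length -= static_cast<std::size_t>(n);
  }
  return 0;
}

// Buffered write followed by fadvise. The advice comes after the write,
// because there is nothing in the page cache to drop before it.
int WriteAndDropCache(int fd, const std::uint8_t* data, std::size_t length) {
  int err = WriteAll(fd, data, length);
  if (err != 0) return err;
  // Assumption of this sketch: flush first so the pages are clean and can
  // actually be evicted.
  if (::fdatasync(fd) != 0) return errno;
  // offset 0 with len 0 covers the whole file. posix_fadvise returns the
  // error number directly instead of setting errno.
  return ::posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

   Calling `WriteAndDropCache(fd, buf, len)` on a descriptor opened without
   `O_DIRECT` gives the "buffered write plus fadvise" path; it needs no
   aligned buffer and no padding, which is the main design difference from
   the `O_DIRECT` stream in the diff, where `RoundUpToBlock` style padding is
   required.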



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
