marsupialtail commented on code in PR #13640:
URL: https://github.com/apache/arrow/pull/13640#discussion_r924779403
##########
cpp/src/arrow/io/file.cc:
##########
@@ -378,6 +378,77 @@ Status FileOutputStream::Write(const void* data, int64_t length) {
int FileOutputStream::file_descriptor() const { return impl_->fd(); }
+// ----------------------------------------------------------------------
+// DirectFileOutputStream, change the Open, Write and Close methods from FileOutputStream
+// Uses DirectIO for writes. Will only write out things in 4096 byte blocks. Buffers leftover bytes
+// in an internal data structure, which will be padded to 4096 bytes and flushed upon call to close.
+
+class DirectFileOutputStream::DirectFileOutputStreamImpl : public OSFile {
+ public:
+ Status Open(const std::string& path, bool append) {
+ const bool truncate = !append;
+ return OpenWritable(path, truncate, append, true /* write_only */, true);
+ }
+ Status Open(int fd) { return OpenWritable(fd); }
+};
+
+DirectFileOutputStream::DirectFileOutputStream() {
+ uintptr_t mask = (uintptr_t)(4095);
+ uint8_t *mem = static_cast<uint8_t *>(malloc(4096 + 4095));
+ cached_data = reinterpret_cast<uint8_t *>(reinterpret_cast<uintptr_t>(mem+4095) & ~(mask));
Review Comment:
[directio.zip](https://github.com/apache/arrow/files/9143325/directio.zip)
Here is an experiment comparing three ways of writing: O_DIRECT, O_SYNC +
fadvise, and a normal write. I ran each on a 10 GB write to an NVMe SSD on an
AWS i3.2xlarge. With O_DIRECT, page cache memory did not increase, but the
other two ways of writing each caused 10 GB of page cache usage. O_SYNC +
fadvise showed no visible difference from a normal write in terms of page cache
usage. Perhaps I am not using the fadvise API properly? If so, please let me
know.
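
For reference, here is a minimal sketch of one common way to pair buffered
writes with fadvise on Linux. It is not the code from the attached zip and not
part of Arrow; `WriteThenDropCache` is a hypothetical helper name. It assumes
that POSIX_FADV_DONTNEED leaves pages alone if they have not yet been written
back, so it calls fdatasync first. The advice is only a hint, so the kernel may
still keep some pages cached.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Arrow): write `len` bytes to `path` through
// the page cache, then ask the kernel to drop the cached pages for that file.
// Returns 0 on success or an errno-style error number on failure.
int WriteThenDropCache(const std::string& path, const char* data, size_t len) {
  int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return errno;

  size_t written = 0;
  while (written < len) {
    ssize_t n = write(fd, data + written, len - written);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal, retry
      int err = errno;
      close(fd);
      return err;
    }
    written += static_cast<size_t>(n);
  }

  // Flush dirty pages to the device first; pages that are still dirty are
  // not released by POSIX_FADV_DONTNEED.
  if (fdatasync(fd) != 0) {
    int err = errno;
    close(fd);
    return err;
  }

  // offset = 0 and len = 0 cover the whole file. posix_fadvise returns the
  // error number directly instead of setting errno.
  int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

  if (close(fd) != 0 && rc == 0) rc = errno;
  return rc;
}
```

For a large write such as the 10 GB case, the same flush-then-advise pair would
presumably be issued per chunk, for example every few hundred MB, passing that
chunk's offset and length, so the cache is trimmed as the write proceeds
instead of only at the end.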
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]