marsupialtail commented on code in PR #13640:
URL: https://github.com/apache/arrow/pull/13640#discussion_r959864363


##########
cpp/src/arrow/io/file.cc:
##########
@@ -378,6 +378,101 @@ Status FileOutputStream::Write(const void* data, int64_t 
length) {
 
 int FileOutputStream::file_descriptor() const { return impl_->fd(); }
 
+// ----------------------------------------------------------------------
+// DirectFileOutputStream, change the Open, Write and Close methods from 
FileOutputStream
+// Uses DirectIO for writes. Will only write out things in 4096 byte blocks. 
Buffers
+// leftover bytes in an internal data structure, which will be padded to 4096 
bytes and
+// flushed upon call to close.
+
+class DirectFileOutputStream::DirectFileOutputStreamImpl : public OSFile {
+ public:
+  Status Open(const std::string& path, bool append) {
+    const bool truncate = !append;
+    return OpenWritable(path, truncate, append, true /* write_only */, true);
+  }
+  Status Open(int fd) { return OpenWritable(fd); }
+};
+
+DirectFileOutputStream::DirectFileOutputStream() {
+  uintptr_t mask = (uintptr_t)(4095);

Review Comment:
   Spent a while trying not to hard code 4096 actually at the way beginning. I 
think hard coding it to 4096 is the best solution. Firstly this is not the page 
size. This is the physical disk sector size. Getting this size in C is a bit 
tricky. The best way is to do something like this 
   ```
   #include <linux/fs.h>
   #include <iostream>
   #include <stdio.h>
   #include <fcntl.h>
   #include <sys/ioctl.h>
   #include <errno.h>
   #include <string.h>
   int main(int argc, char **argv)
   {
     long logicalsectsize = 0;
     int retval;
     int fd = open("/dev/nvme0n1p2", O_RDONLY);
     std::cout << fd << std::endl;
     std::cout << strerror(errno) << std::endl;
     /*retval=ioctl(fd, BLKSSZGET, &logicalsectsize);*/
     if (ioctl(fd, BLKSSZGET, &logicalsectsize) < 0)
       {
         printf("Can't get logical sector size. Return code: %d\n",retval);
         logicalsectsize = 0;
       }
     std::cout << logicalsectsize << std::endl;
   
   }
   ```
   However, this could run into permissions issues and on my machine at least 
can only be run under sudo. 
   
   4096 I think is a good hard coded number. Current disk sector sizes are 
either 512 or 4096, with more and more disks expected to become 4096 over the 
next decade. Please refer to this: 
https://en.wikipedia.org/wiki/Disk_sector#:~:text=In%20computer%20disk%20storage%2C%20a,%2DROMs%20and%20DVD%2DROMs.
   
   If it's 512, setting the number to 4096 will still work. It just needs to be 
a multiple of the disk sector size.
   
   There has been no discussion at all in making the disk sector size even 
larger than 4096, so I think the choice is safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to