marsupialtail commented on code in PR #13640:
URL: https://github.com/apache/arrow/pull/13640#discussion_r985396712


##########
cpp/src/arrow/io/file.h:
##########
@@ -80,6 +80,56 @@ class ARROW_EXPORT FileOutputStream : public OutputStream {
   std::unique_ptr<FileOutputStreamImpl> impl_;
 };
 
+#if defined(__linux__)
+/// \brief An operating system file open in write-only and O_DIRECT mode, only 
works on
+/// Linux.
+class ARROW_EXPORT DirectFileOutputStream : public OutputStream {
+ public:
+  ~DirectFileOutputStream() override;
+
+  /// \brief Open a local file for writing, truncating any existing file
+  /// \param[in] path with UTF8 encoding
+  ///
+  /// When opening a new file, any existing file with the indicated path is
+  /// truncated to 0 bytes, deleting any existing data
+  static Result<std::shared_ptr<DirectFileOutputStream>> Open(const 
std::string& path);
+
+  /// \brief Open a file descriptor for writing.  The underlying file isn't
+  /// truncated.
+  /// \param[in] fd file descriptor
+  /// \param[in] sector_size the physical disk sector size hosting this fd
+  /// \return an open FileOutputStream
+  ///
+  /// The file descriptor becomes owned by the OutputStream, and will be closed
+  /// on Close() or destruction.
+  static Result<std::shared_ptr<DirectFileOutputStream>> Open(int fd,
+                                                              int64_t 
sector_size);
+
+  // OutputStream interface
+  Status Close() override;
+  bool closed() const override;
+  Result<int64_t> Tell() const override;
+
+  // Write bytes to the stream. Thread-safe
+  Status Write(const void* data, int64_t nbytes) override;
+  /// \cond FALSE
+  using Writable::Write;
+  /// \endcond
+
+  int file_descriptor() const;
+
+ private:
+  explicit DirectFileOutputStream(int64_t sector_size);
+
+  class ARROW_NO_EXPORT DirectFileOutputStreamImpl;
+  std::unique_ptr<DirectFileOutputStreamImpl> impl_;
+  int64_t sector_size_ = 0;
+  std::vector<uint8_t> cached_data_;
+  uint8_t* aligned_cached_data_;
+  int64_t cached_length_ = 0;

Review Comment:
   What's the problem with having these attributes here instead of inside the 
private impl class?
   
   I don't think it makes sense to have them inside the private impl class, 
because they are used by the DirectFileOutputStream class methods not the impl_ 
class methods. The impl_ class methods just override basic OSFile functionality 
and should not have to know about cached_data etc. 



##########
cpp/src/arrow/io/file.cc:
##########
@@ -378,6 +378,106 @@ Status FileOutputStream::Write(const void* data, int64_t 
length) {
 
 int FileOutputStream::file_descriptor() const { return impl_->fd(); }
 
+// ----------------------------------------------------------------------
+// DirectFileOutputStream, change the Open, Write and Close methods from 
FileOutputStream
+// Uses DirectIO for writes. Will only write out things in disk-sector-size 
byte blocks.
+// Buffers leftover bytes in an internal data structure, which will be padded 
to
+// disk-sector-size bytes and written upon call to close.
+
+#if defined(__linux__)
+class DirectFileOutputStream::DirectFileOutputStreamImpl : public OSFile {
+ public:
+  Status Open(const std::string& path) {
+    return OpenWritable(path, true, false, true /* write_only */, true);
+  }
+  Status Open(int fd) { return OpenWritable(fd); }
+};
+
+DirectFileOutputStream::DirectFileOutputStream(int64_t sector_size) {
+  sector_size_ = sector_size;
+  uintptr_t mask = (uintptr_t)(sector_size - 1);

Review Comment:
   I can check if the sector_size is a power of two. But a constructor can't 
return a Result right, what's the right way to do this in Arrow? I don't think 
just adding this code:
   ```
   if !(sector_size != 0 && (sector_size & (sector_size-1)) == 0)
     {
       return Status::IOError("Length must be non-negative");
     }
   ```
   is going to work here because return Status shouldn't be allowed in 
constructor.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to