alamb commented on code in PR #7714:
URL: https://github.com/apache/arrow-rs/pull/7714#discussion_r2158897830


##########
parquet/src/file/writer.rs:
##########
@@ -390,13 +390,33 @@ impl<W: Write + Send> SerializedFileWriter<W> {
     }
 
     /// Returns a reference to the underlying writer.
+    ///
+    /// **Warning**: if you write directly to this writer, you will skip

Review Comment:
   Likewise here since it is not possible to write to an immutable buffer we 
can probably skip this warning



##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -320,14 +320,21 @@ impl<W: Write + Send> ArrowWriter<W> {
     }
 
     /// Returns a reference to the underlying writer.
+    ///
+    /// **Warning**: if you write directly to this writer, you will skip

Review Comment:
   Given this is a read only reference (immutable) the warning here is probably 
not necessary. 



##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -320,14 +320,21 @@ impl<W: Write + Send> ArrowWriter<W> {
     }
 
     /// Returns a reference to the underlying writer.
+    ///
+    /// **Warning**: if you write directly to this writer, you will skip
+    /// the `TrackedWrite` buffering and byte‐counting layers. That’ll cause
+    /// the file footer’s recorded offsets and sizes to diverge from reality,
+    /// resulting in an unreadable or corrupted Parquet file.
     pub fn inner(&self) -> &W {
         self.writer.inner()
     }
 
     /// Returns a mutable reference to the underlying writer.
     ///
-    /// It is inadvisable to directly write to the underlying writer, doing so
-    /// will likely result in a corrupt parquet file
+    /// **Warning**: if you write directly to this writer, you will skip
+    /// the `TrackedWrite` buffering and byte‐counting layers. That’ll cause
+    /// the file footer’s recorded offsets and sizes to diverge from reality,
+    /// resulting in an unreadable or corrupted Parquet file.
     pub fn inner_mut(&mut self) -> &mut W {
         self.writer.inner_mut()
     }

Review Comment:
   I wonder should we add `write_all` to the ArrowWriter too?



##########
parquet/src/file/writer.rs:
##########
@@ -390,13 +390,33 @@ impl<W: Write + Send> SerializedFileWriter<W> {
     }
 
     /// Returns a reference to the underlying writer.
+    ///
+    /// **Warning**: if you write directly to this writer, you will skip
+    /// the `TrackedWrite` buffering and byte‐counting layers. That’ll cause
+    /// the file footer’s recorded offsets and sizes to diverge from reality,
+    /// resulting in an unreadable or corrupted Parquet file.
+    ///
+    /// If you want to write safely to the underlying writer, use 
[`Self::write_all`].
     pub fn inner(&self) -> &W {
         self.buf.inner()
     }
 
+    /// Writes the given buf bytes to the internal buffer.

Review Comment:
   It might be nice to add a note here about why you might do this
   
   For example, something like
   
   ```suggestion
       /// Writes the given buf bytes to the internal buffer.
       ///
       /// This can be used to write raw data to an in-progress parquet file, 
for 
       /// example custom index structures or other payloads. Other parquet 
readers 
       /// will skip this data when reading the files.
   ```



##########
parquet/src/file/writer.rs:
##########
@@ -390,13 +390,33 @@ impl<W: Write + Send> SerializedFileWriter<W> {
     }
 
     /// Returns a reference to the underlying writer.
+    ///
+    /// **Warning**: if you write directly to this writer, you will skip
+    /// the `TrackedWrite` buffering and byte‐counting layers. That’ll cause
+    /// the file footer’s recorded offsets and sizes to diverge from reality,
+    /// resulting in an unreadable or corrupted Parquet file.
+    ///
+    /// If you want to write safely to the underlying writer, use 
[`Self::write_all`].
     pub fn inner(&self) -> &W {
         self.buf.inner()
     }
 
+    /// Writes the given buf bytes to the internal buffer.
+    ///
+    /// It's safe to use this method to write data to the underlying writer,
+    /// because it will ensure that the buffering and byte‐counting layers are 
used.
+    pub fn write_all(&mut self, buf: &[u8]) -> std::io::Result<()> {
+        self.buf.write_all(buf)
+    }
+
     /// Returns a mutable reference to the underlying writer.
     ///
-    /// It is inadvisable to directly write to the underlying writer.
+    /// **Warning**: if you write directly to this writer, you will skip
+    /// the `TrackedWrite` buffering and byte‐counting layers. That’ll cause
+    /// the file footer’s recorded offsets and sizes to diverge from reality,
+    /// resulting in an unreadable or corrupted Parquet file.
+    ///
+    /// If you want to write safely to the underlying writer, use 
[`Self::write_all`].

Review Comment:
   ❤️ 



##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -320,14 +320,21 @@ impl<W: Write + Send> ArrowWriter<W> {
     }
 
     /// Returns a reference to the underlying writer.
+    ///
+    /// **Warning**: if you write directly to this writer, you will skip
+    /// the `TrackedWrite` buffering and byte‐counting layers. That’ll cause
+    /// the file footer’s recorded offsets and sizes to diverge from reality,
+    /// resulting in an unreadable or corrupted Parquet file.
     pub fn inner(&self) -> &W {
         self.writer.inner()
     }
 
     /// Returns a mutable reference to the underlying writer.
     ///
-    /// It is inadvisable to directly write to the underlying writer, doing so
-    /// will likely result in a corrupt parquet file
+    /// **Warning**: if you write directly to this writer, you will skip

Review Comment:
   Thank you



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to