wgtmac commented on code in PR #33897:
URL: https://github.com/apache/arrow/pull/33897#discussion_r1089750172
##########
cpp/src/parquet/column_writer.h:
##########
@@ -103,6 +103,8 @@ class PARQUET_EXPORT PageWriter {
// Return the number of uncompressed bytes written (including header size)
virtual int64_t WriteDictionaryPage(const DictionaryPage& page) = 0;
+ virtual int64_t total_compressed_bytes_written() const = 0;
Review Comment:
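Could you add a doc comment here describing what the returned value covers (for example, whether it includes the page header size, as the comment on WriteDictionaryPage does)?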
##########
cpp/src/parquet/column_writer.h:
##########
@@ -136,8 +138,15 @@ class PARQUET_EXPORT ColumnWriter {
/// \brief The total number of bytes written as serialized data and
/// dictionary pages to the ColumnChunk so far
+ /// These bytes are uncompressed bytes.
virtual int64_t total_bytes_written() const = 0;
+ /// \brief The total number of bytes written as serialized data and
+ /// dictionary pages to the ColumnChunk so far.
+ /// If the column is uncompressed, the value would be equal to
Review Comment:
Is this true? Will total_compressed_bytes_written() be greater than
total_bytes_written() because of page headers and column chunk metadata?
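To make the question concrete, here is a minimal sketch under assumed accounting rules, not the actual Arrow implementation. `PageSizes`, `ChunkTotals`, and `UncompressedPage` are hypothetical names invented for illustration, and the sketch assumes both counters include the page header size. Under that assumption the two totals match for an uncompressed column; if only one counter included headers, they would differ by the sum of the header sizes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-page sizes; these are not Arrow's actual types.
struct PageSizes {
  int64_t header_size;        // serialized page header bytes
  int64_t uncompressed_size;  // page body before compression
  int64_t compressed_size;    // page body after compression
};

// Hypothetical running totals for one column chunk.
// Assumption: both counters include the page header size.
struct ChunkTotals {
  int64_t total_bytes_written = 0;             // uncompressed body + header
  int64_t total_compressed_bytes_written = 0;  // compressed body + header

  void Add(const PageSizes& page) {
    assert(page.header_size >= 0 && page.uncompressed_size >= 0 &&
           page.compressed_size >= 0);
    total_bytes_written += page.uncompressed_size + page.header_size;
    total_compressed_bytes_written += page.compressed_size + page.header_size;
  }
};

// With no codec the body is stored as-is, so both body sizes are equal.
inline PageSizes UncompressedPage(int64_t header_size, int64_t body_size) {
  return PageSizes{header_size, body_size, body_size};
}
```

If the real counters follow a different rule (for example, headers counted on only one side), the doc comment should say so explicitly rather than claim equality.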
##########
cpp/src/parquet/column_writer.h:
##########
@@ -136,8 +138,15 @@ class PARQUET_EXPORT ColumnWriter {
/// \brief The total number of bytes written as serialized data and
Review Comment:
```suggestion
/// \brief The total number of uncompressed bytes written as serialized data and
```
##########
cpp/src/parquet/column_writer.h:
##########
@@ -136,8 +138,15 @@ class PARQUET_EXPORT ColumnWriter {
/// \brief The total number of bytes written as serialized data and
Review Comment:
Is it enough?
##########
cpp/src/parquet/file_writer.h:
##########
@@ -90,8 +92,12 @@ class PARQUET_EXPORT RowGroupWriter {
*/
int64_t num_rows() const;
+ /// \brief total uncompressed bytes written by the page writer
int64_t total_bytes_written() const;
+ /// \brief total bytes still compressed but not written
Review Comment:
Should we explain that these bytes can only exist in a buffered row group?