tschaub commented on code in PR #38507:
URL: https://github.com/apache/arrow/pull/38507#discussion_r1376611345
##########
go/parquet/pqarrow/file_writer.go:
##########
@@ -134,6 +134,15 @@ func (fw *FileWriter) RowGroupTotalBytesWritten() int64 {
return 0
}
+// RowGroupNumRows returns the number of rows written to the current row group.
+// Returns an error if they are unequal between columns that have been written so far.
+func (fw *FileWriter) RowGroupNumRows() (int, error) {
+	if fw.rgw != nil {
+		return fw.rgw.NumRows()
+	}
+	return 0, nil
+}
+
Review Comment:
Sounds good, @zeroshade. Adding tests for a `NumRows` method revealed that
the `file.Writer` doesn't increment the number of rows written when appending a
new row group writer.
~I've added a commit that updates the `file.Writer` so it tracks the number
of rows written across multiple row groups and adds a `NumRows` method on
`pqarrow.FileWriter` that accesses this.~
~Let me know if it would be preferable to ticket the `file.Writer` num rows
issue separately and create a dedicated fix for that. The assertion added here
fails on the default branch:
https://github.com/apache/arrow/pull/38507/commits/c30d6dd4ae443f86f5835fae62f163ffa373cdf1#diff-f4e1b3867fa5f621a41484668fbce7194ecdb289f68fb6ed0e112e8e3a00bde8R100~
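
To illustrate the behavior under discussion, here is a minimal, self-contained sketch of a writer that keeps a running row total across row groups and reports an error when columns in the current row group disagree. It is a toy model only: `toyRowGroup`, `toyWriter`, `AppendRowGroup`, and the `colRows` and `closedRows` fields are hypothetical names and do not reflect the actual `file.Writer` or `pqarrow.FileWriter` implementation.

```go
package main

import "fmt"

// toyRowGroup is a hypothetical stand-in for a row group writer.
// colRows holds the number of rows written so far to each column.
type toyRowGroup struct {
	colRows []int
}

// NumRows returns the row count shared by every column written so far,
// or an error if the columns disagree.
func (rg *toyRowGroup) NumRows() (int, error) {
	if len(rg.colRows) == 0 {
		return 0, nil
	}
	n := rg.colRows[0]
	for i, c := range rg.colRows[1:] {
		if c != n {
			return 0, fmt.Errorf("column %d has %d rows, expected %d", i+1, c, n)
		}
	}
	return n, nil
}

// toyWriter is a hypothetical stand-in for a file writer that tracks
// rows across multiple row groups.
type toyWriter struct {
	current    *toyRowGroup
	closedRows int // rows in row groups that are no longer current
}

// AppendRowGroup folds the current row group's count into the running
// total before starting a new row group (assumed bookkeeping, not the
// library's actual code).
func (w *toyWriter) AppendRowGroup() (*toyRowGroup, error) {
	if w.current != nil {
		n, err := w.current.NumRows()
		if err != nil {
			return nil, err
		}
		w.closedRows += n
	}
	w.current = &toyRowGroup{}
	return w.current, nil
}

// RowGroupNumRows mirrors the nil check in the diff above: with no
// current row group it reports zero rows and no error.
func (w *toyWriter) RowGroupNumRows() (int, error) {
	if w.current != nil {
		return w.current.NumRows()
	}
	return 0, nil
}

// NumRows returns rows in earlier row groups plus the current one.
func (w *toyWriter) NumRows() (int, error) {
	n, err := w.RowGroupNumRows()
	if err != nil {
		return 0, err
	}
	return w.closedRows + n, nil
}
```

In this sketch, the total would stay stale if `AppendRowGroup` skipped the `closedRows` update, which is roughly the shape of the problem described above.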
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.