tschaub commented on code in PR #38507:
URL: https://github.com/apache/arrow/pull/38507#discussion_r1376611345


##########
go/parquet/pqarrow/file_writer.go:
##########
@@ -134,6 +134,15 @@ func (fw *FileWriter) RowGroupTotalBytesWritten() int64 {
        return 0
 }
 
+// RowGroupNumRows returns the number of rows written to the current row group.
+// Returns an error if they are unequal between columns that have been written so far.
+func (fw *FileWriter) RowGroupNumRows() (int, error) {
+       if fw.rgw != nil {
+               return fw.rgw.NumRows()
+       }
+       return 0, nil
+}
+

Review Comment:
   Sounds good, @zeroshade.  Adding tests for a `NumRows` method revealed that 
the `file.Writer` doesn't increment the number of rows written when appending a 
new row group writer.
   
   I've added a commit that updates the `file.Writer` so it tracks the number 
of rows written across multiple row groups and adds a `NumRows` method on 
`pqarrow.FileWriter` that accesses this.
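
   As a rough illustration of the idea (not the code in the commit), the
sketch below folds the finished row group's count into a running total each
time a new row group is appended. The `writer` and `rowGroup` types and their
fields are hypothetical stand-ins, not the real `file.Writer` API.

```go
package main

import "fmt"

// rowGroup is a hypothetical stand-in for a row group writer.
type rowGroup struct {
	numRows int // rows written to this row group so far
}

// writer is a hypothetical stand-in for file.Writer.
type writer struct {
	closedRows int       // rows in row groups that are already finished
	current    *rowGroup // row group currently being written, if any
}

// AppendRowGroup finishes the current row group, adds its rows to the
// running total, and starts a new row group.
func (w *writer) AppendRowGroup() *rowGroup {
	if w.current != nil {
		w.closedRows += w.current.numRows
	}
	w.current = &rowGroup{}
	return w.current
}

// NumRows returns the rows written across all row groups, including the
// one that is still open.
func (w *writer) NumRows() int {
	n := w.closedRows
	if w.current != nil {
		n += w.current.numRows
	}
	return n
}

func main() {
	w := &writer{}

	rg := w.AppendRowGroup()
	rg.numRows = 3 // pretend three rows were written

	rg = w.AppendRowGroup()
	rg.numRows = 2 // pretend two more rows were written

	fmt.Println(w.NumRows())
}
```

   Without the accumulation step in `AppendRowGroup`, the total would only
reflect the row group that is currently open.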
   
   Let me know if you would prefer that I open a separate issue for the 
`file.Writer` row-count bug and fix it in a dedicated PR.  The assertion added 
here fails on the default branch: 
https://github.com/apache/arrow/pull/38507/commits/c30d6dd4ae443f86f5835fae62f163ffa373cdf1#diff-f4e1b3867fa5f621a41484668fbce7194ecdb289f68fb6ed0e112e8e3a00bde8R100



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
