matthewmcnew opened a new issue, #39870: URL: https://github.com/apache/arrow/issues/39870
### Describe the bug, including details regarding any error messages, version, and platform.

There does not appear to be an accurate way to identify or estimate the size of the current row group with `pqarrow.FileWriter`.

`RowGroupTotalCompressedBytes()` provides the total bytes from [created data pages](https://github.com/apache/arrow/blob/main/go/parquet/file/column_writer.go#L334), but when the [dictionary page size limit is reached](https://github.com/apache/arrow/blob/main/go/parquet/file/column_writer_types.gen.go.tmpl#L240-L242), the buffered data pages are flushed and the [total size is reset to "0"](https://github.com/apache/arrow/blob/main/go/parquet/file/column_writer.go#L400). This means `RowGroupTotalCompressedBytes()` only reports the size of pages created after the dictionary page size limit was reached. Ideally, the total compressed bytes should include all created data pages.

`RowGroupTotalBytesWritten()` provides the total bytes of data pages when [they are written](https://github.com/apache/arrow/blob/main/go/parquet/file/column_writer.go#L461), but not while a page is buffered because the [dictionary page is still being created](https://github.com/apache/arrow/blob/main/go/parquet/file/column_writer.go#L330). As a result, `RowGroupTotalBytesWritten()` inaccurately reports an estimate of "0" bytes until the dictionary page size limit is reached.

Perhaps related to: https://github.com/apache/arrow/issues/39789.

### Component(s)

Go
