DuanWeiFan commented on issue #511:
URL: https://github.com/apache/arrow-go/issues/511#issuecomment-3357616720
After some investigation, I realized that `RowGroupTotalCompressedBytes()`
gets populated as soon as one of the data pages is flushed.
Would it make sense to add `totalCompressedBytes` and `totalBytesWritten`
fields to FileWriter, so that it tracks the total number of bytes across
every row group instead of just the current one?
Two new methods, `TotalBytesWritten()` and `TotalCompressedBytes()`, would
then report the total bytes written by the FileWriter.
```
type FileWriter struct {
	...
	totalCompressedBytes int64
	totalBytesWritten    int64
}

// NewRowGroup does what it says on the tin, creates a new row group in the
// underlying file.
// Equivalent to `AppendRowGroup` on a file.Writer
func (fw *FileWriter) NewRowGroup() {
	if fw.rgw != nil {
		fw.totalCompressedBytes += fw.rgw.TotalCompressedBytes()
		fw.totalBytesWritten += fw.rgw.TotalBytesWritten()
		fw.rgw.Close()
	}
	fw.rgw = fw.wr.AppendRowGroup()
	fw.colIdx = 0
}

// NewBufferedRowGroup starts a new memory Buffered Row Group to allow
// writing columns / records without immediately flushing them to disk.
// This allows using WriteBuffered to write records and decide where to
// break your row group based on the TotalBytesWritten rather than on the
// max row group len. If using Records, this should be paired with
// WriteBuffered, while Write will always write a new record as a row group
// in and of itself.
func (fw *FileWriter) NewBufferedRowGroup() {
	if fw.rgw != nil {
		fw.totalCompressedBytes += fw.rgw.TotalCompressedBytes()
		fw.totalBytesWritten += fw.rgw.TotalBytesWritten()
		fw.rgw.Close()
	}
	fw.rgw = fw.wr.AppendBufferedRowGroup()
	fw.colIdx = 0
}

// TotalCompressedBytes returns the total number of bytes after compression
// that have been written to the file so far. It includes all the closed row
// groups and the current row group.
func (fw *FileWriter) TotalCompressedBytes() int64 {
	return fw.totalCompressedBytes + fw.RowGroupTotalCompressedBytes()
}

// TotalBytesWritten returns the total number of bytes that have been
// written to the file so far. It includes all the closed row groups and the
// current row group.
func (fw *FileWriter) TotalBytesWritten() int64 {
	return fw.totalBytesWritten + fw.RowGroupTotalBytesWritten()
}
```
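To illustrate the bookkeeping on its own, here is a minimal, self-contained sketch. It does not use the real arrow-go writers: `rowGroup`, `fileWriter`, `newRowGroup`, and `flushPage` are hypothetical stand-ins that only track counters, and the byte values are made up. The point is just that totals from finished row groups are folded into the file-level counters before the next row group starts, and the current row group is added on read.

```go
package main

import "fmt"

// rowGroup is a hypothetical stand-in for a row group writer.
// It only tracks the two byte counters discussed above.
type rowGroup struct {
	compressed int64
	written    int64
}

// fileWriter is a hypothetical model of the proposed FileWriter bookkeeping.
type fileWriter struct {
	rgw                  *rowGroup
	totalCompressedBytes int64 // sum over finished row groups
	totalBytesWritten    int64 // sum over finished row groups
}

// newRowGroup folds the current row group's counters into the running
// totals, then starts a fresh row group.
func (fw *fileWriter) newRowGroup() {
	if fw.rgw != nil {
		fw.totalCompressedBytes += fw.rgw.compressed
		fw.totalBytesWritten += fw.rgw.written
	}
	fw.rgw = &rowGroup{}
}

// flushPage simulates a data page flush updating the current row group.
func (fw *fileWriter) flushPage(compressed, written int64) {
	fw.rgw.compressed += compressed
	fw.rgw.written += written
}

// TotalCompressedBytes covers finished row groups plus the current one.
func (fw *fileWriter) TotalCompressedBytes() int64 {
	total := fw.totalCompressedBytes
	if fw.rgw != nil {
		total += fw.rgw.compressed
	}
	return total
}

// TotalBytesWritten covers finished row groups plus the current one.
func (fw *fileWriter) TotalBytesWritten() int64 {
	total := fw.totalBytesWritten
	if fw.rgw != nil {
		total += fw.rgw.written
	}
	return total
}

func main() {
	fw := &fileWriter{}
	fw.newRowGroup()
	fw.flushPage(100, 250) // first row group
	fw.newRowGroup()
	fw.flushPage(40, 90) // second row group, still open
	fmt.Println(fw.TotalCompressedBytes(), fw.TotalBytesWritten())
}
```

With these made-up numbers the program prints `140 340`: the first row group's counters survive the switch to the second one, which is what a caller would need in order to decide when to roll over to a new file.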