HuaHuaY opened a new issue, #47664:
URL: https://github.com/apache/arrow/issues/47664
### Describe the bug, including details regarding any error messages,
version, and platform.
`nums_rows()` method is inherited from `ParquetFileWriter`. It seems that
the method is designed to count the number of rows of RowGroups which are
closed.
```cpp
/// Number of rows in the yet started RowGroups.
///
/// Changes on the addition of a new RowGroup.
int64_t num_rows() const;
```
But in the implementation of `class FileSerializer : public
ParquetFileWriter::Contents`, the `num_rows_` variable is only modified in
`Close` method. It means that `num_rows()` method always returns `0` before
`Close` is called.
And after `Close` of `ParquetFileWriter` is called we can no longer call
`nums_rows()` method, because `contents_` is reset in `Close` and we will get a
null pointer exception if we call `nums_rows()`.
So, in the current implementation of `FileSerializer`, `num_rows()` will
only return `0` or throw exception.
```cpp
int ParquetFileWriter::num_columns() const { return
contents_->num_columns(); }
void ParquetFileWriter::Close() {
if (contents_) {
contents_->Close();
file_metadata_ = contents_->metadata();
contents_.reset();
}
}
// The following is the code of `FileSerializer`.
int64_t num_rows() const override { return num_rows_; }
void Close() override {
if (is_open_) {
if (row_group_writer_) {
num_rows_ += row_group_writer_->num_rows(); // <- num_rows is
only modified here and can never be read.
...
}
...
}
RowGroupWriter* AppendRowGroup(bool buffered_row_group) {
if (row_group_writer_) {
// <- A possible missing `num_rows_ += row_group_writer_->num_rows()`
here.
row_group_writer_->Close();
}
...
}
```
Should we mark the virtual method in `ParquetFileWriter` deprecated and
remove the method in next release? I guess the method which is used to count
the row number during writing row groups is useless and our default
implementation always returns 0 now. Otherwise we can fix the issue by
following the comments above.
### Component(s)
C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]