mapleFU commented on PR #38581:
URL: https://github.com/apache/arrow/pull/38581#issuecomment-1809504173
> So while I agree that the changes here improve the name of that last
argument and make it easier to get the leaf index with the newly exported
arrowColumnWriter.LeafCount() method, I'm not sure this makes for the most
intuitive API. Since the function is called with an Arrow column chunk and the
number of Arrow rows, wouldn't it make sense to call it with the Arrow column
index as well (as done in the tests above)?
This is not the most intuitive API. Generally `parquet/arrow` module need to
convert arrow record batch to a parquet leaves, this need to be done in
`parquet/arrow`
```
func (fw *FileWriter) WriteColumnChunked(data *arrow.Chunked, offset, size
int64) error {
acw, err := NewArrowColumnWriter(data, offset, size, fw.manifest,
fw.rgw, fw.colIdx)
if err != nil {
return err
}
fw.colIdx += acw.leafCount
return acw.Write(fw.ctx)
}
```
This API is much more easily, it's used in `WriteTable`. Also, C++ has a
`WriteRecordBatch`, we use it as in-house parquet writer. I think it's easy to
porting and use it.
`ArrowColumnWriter` is neccessary for writing parquet, but I think it's a
bit hack to using it as a export API. User need to always maintaining the
offsets, array types etc.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]