tschaub commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1809281102
@zeroshade - I agree that the implementation of the `pqarrow.NewArrowColumnWriter` function expects the last argument to be the physical/Parquet column index. What was confusing to me and caused uncertainty was that the `pqarrow` package is mostly about working with Arrow, and the name of the function `NewArrowColumnWriter` suggests this is about Arrow columns. Then I looked at the tests for confirmation and saw that they use the Arrow table column index in calling this function: https://github.com/apache/arrow/blob/bff5fb95ac987354ecd7d69466e11b9f33a67d08/go/parquet/pqarrow/encode_arrow_test.go#L148-L152 So while I agree that the changes here improve the name of that last argument and make it easier to get the leaf index with the newly exported `arrowColumnWriter.LeafCount()` method, I'm not sure this makes for the most intuitive API. Since the function is called with an Arrow column chunk and the number of Arrow rows, wouldn't it make sense to call it with the Arrow column index as well? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
