tschaub commented on PR #38581:
URL: https://github.com/apache/arrow/pull/38581#issuecomment-1809281102

   @zeroshade - I agree that the implementation of the 
`pqarrow.NewArrowColumnWriter` function expects the last argument to be the 
physical/Parquet column index.
   
   What was confusing to me and caused uncertainty was that the `pqarrow` 
package is mostly about working with Arrow, and the name of the function 
`NewArrowColumnWriter` suggests this is about Arrow columns.  Then I looked at 
the tests for confirmation and saw that they use the Arrow table column index 
in calling this function: 
https://github.com/apache/arrow/blob/bff5fb95ac987354ecd7d69466e11b9f33a67d08/go/parquet/pqarrow/encode_arrow_test.go#L148-L152
   
   So while I agree that the changes here improve the name of that last 
argument and make it easier to get the leaf index with the newly exported 
`arrowColumnWriter.LeafCount()` method, I'm not sure this makes for the most 
intuitive API.  Since the function is called with an Arrow column chunk and the 
number of Arrow rows, wouldn't it make sense to call it with the Arrow column 
index as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to