zeroshade commented on PR #38581:
URL: https://github.com/apache/arrow/pull/38581#issuecomment-1808584411

   Sorry for the delay here; I was away at a conference all last week and have
been catching up on my notifications.
   
   After reading through the code a bunch, it looks like the *intent* is that the
passed-in column index (originally named `col`, renamed to `leafColIdx` in this
PR) should be the index of the *first* physical/parquet column that will be
used by the writer. For example, the `ArrowColumnWriter.Write` method uses
`colIdx + leafIdx` to tell the `BufferedRowGroupWriter` which physical/parquet
column to write. `FileWriter.WriteColumnChunked` confirms this: it contains
`fw.colIdx += acw.leafCount`, showing that we advance the column index we
construct the next column writer with by the number of leaf columns we found.
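   To make that convention concrete, here's a small self-contained Go model of the
bookkeeping. It is only a sketch: the `column` type and the `startLeafIndices` /
`physicalColumns` helpers are hypothetical and not part of the pqarrow API. They
just mimic the `colIdx + leafIdx` and `fw.colIdx += acw.leafCount` arithmetic,
assuming each top-level column flattens into a known number of leaf columns.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a top-level Arrow column: its name and
// how many leaf (physical Parquet) columns it flattens into.
type column struct {
	name      string
	leafCount int
}

// startLeafIndices returns, for each top-level column, the index of the first
// physical Parquet column it writes. It mirrors the running
// `fw.colIdx += acw.leafCount` update: each writer starts where the previous
// one's leaves ended.
func startLeafIndices(cols []column) []int {
	starts := make([]int, len(cols))
	colIdx := 0
	for i, c := range cols {
		starts[i] = colIdx
		colIdx += c.leafCount
	}
	return starts
}

// physicalColumns lists the physical column indices that a writer constructed
// with starting index colIdx targets, i.e. colIdx + leafIdx for each leaf.
func physicalColumns(colIdx, leafCount int) []int {
	out := make([]int, leafCount)
	for leafIdx := 0; leafIdx < leafCount; leafIdx++ {
		out[leafIdx] = colIdx + leafIdx
	}
	return out
}

func main() {
	// Example schema: a primitive column, a struct with three primitive
	// fields (three leaves), and a list of strings (one leaf).
	cols := []column{{"id", 1}, {"point", 3}, {"tags", 1}}
	starts := startLeafIndices(cols)
	for i, c := range cols {
		fmt.Printf("%s: leafColIdx=%d -> physical columns %v\n",
			c.name, starts[i], physicalColumns(starts[i], c.leafCount))
	}
}
```

With that schema, `point` gets `leafColIdx=1` and writes physical columns 1 to 3,
so `tags` starts at 4 rather than at its top-level position 2.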
   
   So I believe you've got the right interpretation here.
   


-- 
This is an automated message from the Apache Git Service.