zeroshade commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1808584411
Sorry for the delay here. I was away at a conference all last week and I'm still catching up on my notifications.

After reading through the code a fair bit, it looks like the *intent* is that the column index passed in (originally named `col`, renamed to `leafColIdx` in this PR) is the index of the *first* physical (parquet) column that the writer will use. For example, inside the `ArrowColumnWriter.Write` method, `colIdx + leafIdx` is what tells the `BufferedRowGroupWriter` which physical column to write. `FileWriter.WriteColumnChunked` confirms this: it contains `fw.colIdx += acw.leafCount`, so the column index used to construct the column writer is bumped by the number of leaf columns found. So I believe you've got the right interpretation here.
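To make the bookkeeping concrete, here is a minimal standalone sketch of how the indices line up under that interpretation. The `arrowField` type and `leafColumnIndices` function are hypothetical illustrations, not part of the `pqarrow` API. The sketch only assumes what is described above: each writer starts at its first leaf column, leaf `leafIdx` goes to physical column `colIdx + leafIdx`, and the running index is then advanced by the leaf count.

```go
package main

import "fmt"

// arrowField is a hypothetical stand-in for a top-level Arrow column:
// leafCount is how many physical (parquet) leaf columns it flattens to.
type arrowField struct {
	name      string
	leafCount int
}

// leafColumnIndices returns, for each top-level field, the physical column
// indices its leaves are written to. colIdx always holds the index of the
// *first* leaf column of the next field (the role played by the writer's
// column index), each leaf writes to colIdx+leafIdx, and colIdx is then
// bumped by the field's leaf count (like `fw.colIdx += acw.leafCount`).
func leafColumnIndices(fields []arrowField) [][]int {
	out := make([][]int, 0, len(fields))
	colIdx := 0
	for _, f := range fields {
		idxs := make([]int, f.leafCount)
		for leafIdx := 0; leafIdx < f.leafCount; leafIdx++ {
			idxs[leafIdx] = colIdx + leafIdx
		}
		out = append(out, idxs)
		colIdx += f.leafCount
	}
	return out
}

func main() {
	fields := []arrowField{
		{"id", 1},
		{"point", 2}, // e.g. struct<x, y> flattens to two leaves
		{"tags", 1},  // e.g. list<string> flattens to one leaf
	}
	for i, idxs := range leafColumnIndices(fields) {
		fmt.Printf("%s -> %v\n", fields[i].name, idxs)
	}
}
```

Running it prints `id -> [0]`, `point -> [1 2]` and `tags -> [3]`. The struct column's writer starts at physical column 1 (its first leaf), not at its top-level position.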
