nevi-me commented on pull request #511:
URL: https://github.com/apache/arrow-rs/pull/511#issuecomment-871751724
Hi @hohav, I've looked at all 3 versions, and I see that while this PR fixes
the repetition issue, it doesn't address your specific issue because of the
writer bug that I indicated in #385.
I see that you also tested against
```toml
arrow = { git = "https://github.com/apache/arrow-rs.git" }
parquet = { git = "https://github.com/apache/arrow-rs.git" }
```
I might not have been clear enough; I had meant for you to test against
```toml
arrow = { git = "https://github.com/nevi-me/arrow-rs.git", branch = "parquet-fix-levels" }
parquet = { git = "https://github.com/nevi-me/arrow-rs.git", branch = "parquet-fix-levels" }
```
It's not an issue though, as I've cloned your repro repo, and installed
parquet-mr tools to check what you're seeing.
The solution is two-fold: this PR, and then computing stats in #512 so that
we avoid the column writer computing these stats.
I don't have enough bandwidth to drive #512. Would you be able to help with
it? I think the main tasks would be:
- checking that the stats for lists make sense (they should, but we probably
need tests)
- adding stats for fixed-len binary where it makes sense.
It would be a good exercise in digging into the arrow compute code.
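As a rough illustration of what "stats for lists" could mean, here is a minimal sketch in plain Rust that computes min/max over the leaf values of a nullable list column, skipping null lists, empty lists, and null elements. It does not use the arrow or parquet crates; `leaf_min_max` and the nested `Option`/`Vec` representation are hypothetical stand-ins, and whether #512 should treat nulls this way is an assumption on my part.

```rust
/// Hypothetical helper, not part of arrow-rs: returns (min, max) over the
/// non-null leaf values of a list column, or None if there are no such values.
///
/// Outer `Option` = list slot (None is a null list).
/// Inner `Option` = list element (None is a null element).
fn leaf_min_max(lists: &[Option<Vec<Option<i32>>>]) -> Option<(i32, i32)> {
    let mut bounds: Option<(i32, i32)> = None;
    // Three flattens: drop null lists, walk each list, drop null elements.
    for &v in lists.iter().flatten().flatten().flatten() {
        bounds = Some(match bounds {
            None => (v, v),
            Some((lo, hi)) => (lo.min(v), hi.max(v)),
        });
    }
    bounds
}
```

A test for the real implementation would presumably mix null lists, empty lists, and null elements in one column, and check that only the non-null leaves contribute to the min and max.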
CC @alamb @jorgecarleitao for a review.