nevi-me commented on pull request #511: URL: https://github.com/apache/arrow-rs/pull/511#issuecomment-871751724
Hi @hohav, I've looked at all 3 versions, and I see that while this PR fixes the repetition issue, it doesn't address your specific issue because of the writer bug that I indicated in #385.

I see that you also tested against

```toml
arrow = { git = "https://github.com/apache/arrow-rs.git" }
parquet = { git = "https://github.com/apache/arrow-rs.git" }
```

I might not have been clear enough; I had meant for you to test against

```toml
arrow = { git = "https://github.com/nevi-me/arrow-rs.git", branch = "parquet-fix-levels" }
parquet = { git = "https://github.com/nevi-me/arrow-rs.git", branch = "parquet-fix-levels" }
```

It's not an issue though, as I've cloned your repro repo and installed the parquet-mr tools to check what you're seeing.

The solution is two-fold: this PR, and then computing stats in #512 so that the column writer no longer has to compute these stats itself.

I don't have enough bandwidth to drive #512; would you be able to help with it? I think the main tasks would be:

- checking that the stats for lists make sense (they should, but we probably need tests)
- adding stats for fixed-len binary where it makes sense.

It would be a good exercise in digging into the arrow compute code.

CC @alamb @jorgecarleitao for a review.
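For the fixed-len binary task, the core of the work is choosing an ordering and tracking min, max and null count per column chunk. The sketch below is plain Rust with no arrow or parquet crate APIs; `FixedLenStats`, `fixed_len_stats` and the `signed` flag are hypothetical names for illustration only. It assumes unsigned byte-wise comparison by default, which is how the Parquet format orders FIXED_LEN_BYTE_ARRAY without a logical type, and signed comparison for big-endian two's-complement DECIMAL values. Some logical types (e.g. INTERVAL) have no defined order, which is one reason stats only make sense for some fixed-len columns.

```rust
use std::cmp::Ordering;

/// Hypothetical min/max statistics for one chunk of fixed-length binary values.
#[derive(Debug, PartialEq)]
pub struct FixedLenStats {
    pub min: Option<Vec<u8>>,
    pub max: Option<Vec<u8>>,
    pub null_count: usize,
}

/// Computes min/max over fixed-length byte values, counting (and skipping) nulls.
/// `signed = false`: unsigned lexicographic byte comparison.
/// `signed = true`: big-endian two's-complement comparison (assumed for decimals).
pub fn fixed_len_stats(values: &[Option<&[u8]>], signed: bool) -> FixedLenStats {
    let mut min: Option<&[u8]> = None;
    let mut max: Option<&[u8]> = None;
    let mut null_count = 0;

    for &v in values {
        match v {
            None => null_count += 1,
            Some(bytes) => {
                if min.map_or(true, |m| compare(bytes, m, signed) == Ordering::Less) {
                    min = Some(bytes);
                }
                if max.map_or(true, |m| compare(bytes, m, signed) == Ordering::Greater) {
                    max = Some(bytes);
                }
            }
        }
    }

    FixedLenStats {
        min: min.map(|b| b.to_vec()),
        max: max.map(|b| b.to_vec()),
        null_count,
    }
}

/// Compares two values of the same fixed length.
fn compare(a: &[u8], b: &[u8], signed: bool) -> Ordering {
    if !signed {
        // Slice ordering over u8 is lexicographic and unsigned.
        return a.cmp(b);
    }
    match (a.split_first(), b.split_first()) {
        // The sign lives in the first byte: compare it as i8,
        // then the remaining bytes as unsigned.
        (Some((a0, a_rest)), Some((b0, b_rest))) => (*a0 as i8)
            .cmp(&(*b0 as i8))
            .then_with(|| a_rest.cmp(b_rest)),
        // Empty values: fall back to plain slice comparison.
        _ => a.cmp(b),
    }
}
```

For example, with the values `[0x01, 0x00]`, `null`, `[0xFF, 0x00]` and `[0x00, 0x05]`, unsigned ordering gives min `[0x00, 0x05]` and max `[0xFF, 0x00]`, while signed ordering treats `[0xFF, 0x00]` as -256 and makes it the min. Tests along these lines, run against the real writer, would probably also help with the lists task.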