nevi-me commented on pull request #511:
URL: https://github.com/apache/arrow-rs/pull/511#issuecomment-871751724


   Hi @hohav, I've looked at all 3 versions, and I see that while this PR fixes 
the repetition issue, it doesn't address your specific issue because of the 
writer bug that I indicated in #385.
   
   I see that you also tested against
   
   ```toml
   arrow = { git = "https://github.com/apache/arrow-rs.git" }
   parquet = { git = "https://github.com/apache/arrow-rs.git" }
   ```
   
   I might not have been clear enough; I had meant for you to test against
   
   ```toml
   arrow = { git = "https://github.com/nevi-me/arrow-rs.git", branch = "parquet-fix-levels" }
   parquet = { git = "https://github.com/nevi-me/arrow-rs.git", branch = "parquet-fix-levels" }
   ```
   
   It's not an issue though, as I've cloned your repro repo, and installed 
parquet-mr tools to check what you're seeing.
   
   The solution is two-fold: this PR, and then computing stats in #512 so that 
the column writer does not have to compute them itself.
   
   I don't have enough bandwidth to drive #512. Would you be able to help with 
it? I think the main tasks would be:
   
   - checking that the stats for lists make sense (they should, but we probably 
need tests)
   - adding stats for fixed-len binary where it makes sense.
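
To make the second task concrete, here is a minimal sketch of a min/max computation over nullable fixed-length binary values. It uses only the standard library and plain byte slices rather than arrow arrays. The function name `fixed_len_min_max` is hypothetical and not part of arrow-rs or #512, and the sketch assumes unsigned lexicographic byte ordering, which may not be the right ordering for every logical type.

```rust
/// Hypothetical helper (not an arrow-rs API): returns the (min, max) of the
/// non-null values, or `None` if every value is null or the input is empty.
/// Assumes unsigned lexicographic byte order, which is how Rust compares `&[u8]`.
fn fixed_len_min_max<'a>(values: &[Option<&'a [u8]>]) -> Option<(&'a [u8], &'a [u8])> {
    // Skip nulls; `flatten` yields a reference to each `Some` payload.
    let mut non_null = values.iter().flatten().copied();
    // With no non-null values there are no stats to report.
    let first = non_null.next()?;
    Some(non_null.fold((first, first), |(min, max), v| {
        (
            if v < min { v } else { min },
            if v > max { v } else { max },
        )
    }))
}

fn main() {
    // Four slots of 2-byte values, one of them null.
    let values: [Option<&[u8]>; 4] = [
        Some(&[2u8, 0][..]),
        None,
        Some(&[1u8, 255][..]),
        Some(&[2u8, 1][..]),
    ];
    let (min, max) = fixed_len_min_max(&values).expect("at least one non-null value");
    assert_eq!(min, &[1u8, 255][..]);
    assert_eq!(max, &[2u8, 1][..]);
    println!("min = {:?}, max = {:?}", min, max);
}
```

A test for the real implementation would likely also want the all-null case, where no min/max should be written.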
   
   It would be a good exercise in digging into the arrow compute code.
   
   CC @alamb @jorgecarleitao for a review.

