corneliusroemer opened a new issue #1064:
URL: https://github.com/apache/arrow-rs/issues/1064


   **Describe the bug**
   In the parquet documentation, the following word choice (using *any*) is 
common:
   > Sets max statistics for **any** column
   
   
https://github.com/apache/arrow-rs/blob/99b7d01103495607932343146c973b6fba0eb8d5/parquet/src/file/properties.rs#L347
   
   I don't think this is the clearest, most unambiguous word to use here.
   
   I think the intended meaning is: "Sets max statistics for _all_ columns 
(or _every_ column)"
   
   _Any_ is imprecise: it could be read as "more than one" and does not convey 
"all", even though the setting is guaranteed to apply to all columns.
   
   This usage of _any_ appears a couple of times in parquet. I think every 
occurrence should be edited.
   
   
https://github.com/apache/arrow-rs/blob/99b7d01103495607932343146c973b6fba0eb8d5/parquet/src/file/properties.rs#L311-L320
   
   
https://github.com/apache/arrow-rs/blob/99b7d01103495607932343146c973b6fba0eb8d5/parquet/src/file/properties.rs#L326
   
   
https://github.com/apache/arrow-rs/blob/99b7d01103495607932343146c973b6fba0eb8d5/parquet/src/file/properties.rs#L332
   
   
https://github.com/apache/arrow-rs/blob/99b7d01103495607932343146c973b6fba0eb8d5/parquet/src/file/properties.rs#L341
   
   
https://github.com/apache/arrow-rs/blob/99b7d01103495607932343146c973b6fba0eb8d5/parquet/src/file/properties.rs#L347
   
   I noticed this confusing wording in the docs for csv2parquet CLI and opened 
an issue there: https://github.com/domoritz/csv2parquet/issues/42
   
   @domoritz then pointed me here, since he had just copied the wording from upstream.

