lidavidm commented on code in PR #197:
URL: https://github.com/apache/arrow-go/pull/197#discussion_r1855658535
##########
parquet/pqarrow/properties.go:
##########
@@ -165,6 +165,11 @@ type ArrowReadProperties struct {
 	Parallel bool
 	// BatchSize is the size used for calls to NextBatch when reading whole columns
 	BatchSize int64
+	// Setting ForceLarge to true will force the reader to use LargeString/LargeBinary
+	// for string and binary columns respectively, instead of the default variants. This
+	// can be necessary if you know that there are columns which contain more than 2GB of
+	// data, which would prevent use of int32 offsets.
+	ForceLarge bool
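For context, the new field would be set alongside the existing read properties. A minimal sketch of how a caller might opt in, assuming the PR as diffed above (not yet merged, so the field may change):

```go
// Sketch only: ForceLarge is the option proposed in this PR and is not
// available in released arrow-go versions.
props := pqarrow.ArrowReadProperties{
	Parallel:   true,
	BatchSize:  1024,
	ForceLarge: true, // read string/binary columns as LargeString/LargeBinary
}
```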
Review Comment:
I agree with the original poster that it's rather weird that we can generate
Parquet files that we then can't read back. However, I also agree with Antoine
that it might be good to make this option per-column if possible.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]