joellubi commented on code in PR #197:
URL: https://github.com/apache/arrow-go/pull/197#discussion_r1861252624


##########
parquet/pqarrow/properties.go:
##########
@@ -165,6 +165,11 @@ type ArrowReadProperties struct {
        Parallel bool
        // BatchSize is the size used for calls to NextBatch when reading whole columns
        BatchSize int64
+       // Setting ForceLarge to true will force the reader to use LargeString/LargeBinary
+       // for string and binary columns respectively, instead of the default variants. This
+       // can be necessary if you know that there are columns which contain more than 2GB of
+       // data, which would prevent use of int32 offsets.
+       ForceLarge bool

Review Comment:
   I also agree that automatically changing the type could be confusing. Regardless of the approach used to convert types, automatically reducing the batch size rather than exceeding the maximum offset of a variable-width type would be very nice IMO.
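The batch-size reduction the comment suggests could be sketched roughly as follows. `capBatchSize` is a hypothetical helper, not part of arrow-go: given a requested batch size and an (assumed known) average value length, it shrinks the batch so the accumulated byte length stays within int32 offsets instead of forcing the column to LargeString/LargeBinary:

```go
package main

import (
	"fmt"
	"math"
)

// capBatchSize is a hypothetical sketch of the reviewer's idea: limit the
// number of rows per batch so that rows * avgValueLen never exceeds the
// int32 offset limit of the default String/Binary layouts.
func capBatchSize(requested, avgValueLen int64) int64 {
	if avgValueLen <= 0 {
		return requested
	}
	maxRows := int64(math.MaxInt32) / avgValueLen
	if requested > maxRows {
		return maxRows
	}
	return requested
}

func main() {
	// With ~1 KiB values, any batch over ~2M rows would overflow int32 offsets.
	fmt.Println(capBatchSize(3_000_000, 1024)) // 2097151
	fmt.Println(capBatchSize(100, 1024))       // 100: small requests pass through
}
```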


