zeroshade commented on code in PR #13120:
URL: https://github.com/apache/arrow/pull/13120#discussion_r873058055


##########
go/parquet/pqarrow/column_readers.go:
##########
@@ -216,12 +221,28 @@ func (sr *structReader) GetRepLevels() ([]int16, error) {
 }
 
 func (sr *structReader) LoadBatch(nrecords int64) error {
+       var (
+               np int = 1 // default to serial
+       )
+       // Load batches in parallel
+       // When reading structs with large numbers of columns, the serial load 
is very slow.
+       // This is especially true when reading Cloud Storage. Loading 
concurrently
+       // greatly improves performance.
+       if sr.props.Parallel {
+               np = len(sr.children)
+       }
+       g := new(errgroup.Group)
+       g.SetLimit(np)
        for _, rdr := range sr.children {
-               if err := rdr.LoadBatch(nrecords); err != nil {
-                       return err
-               }
+               func(r *ColumnReader) {
+                       g.Go(func() error {
+                               err := r.LoadBatch(nrecords)
+                               return err
+                       })
+               }(rdr)

Review Comment:
   you don't need the wrapping function, you can just do:
   `r := rdr` before the call to `g.Go` to have the same effect by creating an 
iteration-local variable. It's cleaner and cheaper than the enclosing function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to