zeroshade commented on code in PR #13120:
URL: https://github.com/apache/arrow/pull/13120#discussion_r873058184


##########
go/parquet/pqarrow/file_reader.go:
##########
@@ -202,14 +206,33 @@ func (fr *FileReader) GetFieldReaders(ctx context.Context, colIndices, rowGroups
 
        out := make([]*ColumnReader, len(fieldIndices))
        outFields := make([]arrow.Field, len(fieldIndices))
-       for idx, fidx := range fieldIndices {
-               rdr, err := fr.GetFieldReader(ctx, fidx, includedLeaves, rowGroups)
-               if err != nil {
-                       return nil, nil, err
-               }
 
-               outFields[idx] = *rdr.Field()
-               out[idx] = rdr
+       // Load batches in parallel
+       // When reading structs with large numbers of columns, the serial load is very slow.
+       // This is especially true when reading Cloud Storage. Loading concurrently
+       // greatly improves performance.
+       // GetFieldReader causes read operations, when issued serially on large numbers of columns,
+       // this is super time consuming. Get field readers concurrently.
+       if fr.Props.Parallel {
+               np = len(fieldIndices)
+       }
+       g := new(errgroup.Group)
+       g.SetLimit(np)
+       for idx, fidx := range fieldIndices {
+               func(idx, fidx int) {
+                       g.Go(func() error {
+                               rdr, err := fr.GetFieldReader(ctx, fidx, includedLeaves, rowGroups)
+                               if err != nil {
+                                       return err
+                               }
+                               outFields[idx] = *rdr.Field()
+                               out[idx] = rdr
+                               return nil
+                       })
+               }(idx, fidx)

Review Comment:
   Same as above: rather than wrapping the call in a function to capture the
loop variables safely, you can write `idx, fidx := idx, fidx` just before
calling `g.Go` to create iteration-local copies of the variables. Personally,
I find that cleaner than enclosing the whole thing in a function.
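
A minimal sketch of the shadowing idiom suggested above, offered as an illustration rather than the PR's actual code. To keep it standard-library only, it assumes a plain `sync.WaitGroup` in place of `errgroup.Group`, and `loadField` and `loadAll` are hypothetical stand-ins for `fr.GetFieldReader` and the surrounding loop.

```go
package main

import (
	"fmt"
	"sync"
)

// loadField is a hypothetical stand-in for fr.GetFieldReader;
// it only builds a label from the field index.
func loadField(fidx int) string {
	return fmt.Sprintf("field-%d", fidx)
}

// loadAll starts one goroutine per field index and stores each
// result in the output slot that matches its loop position.
func loadAll(fieldIndices []int) []string {
	out := make([]string, len(fieldIndices))
	var wg sync.WaitGroup
	for idx, fidx := range fieldIndices {
		// Shadow the loop variables so the closure below captures
		// copies that belong to this iteration only.
		idx, fidx := idx, fidx
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each goroutine writes to its own slot, so the writes
			// do not race with one another.
			out[idx] = loadField(fidx)
		}()
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(loadAll([]int{7, 3, 9})) // prints [field-7 field-3 field-9]
}
```

With `errgroup`, the same two-name shadowing line would sit directly before `g.Go(func() error { ... })`. Since Go 1.22, `for` loop variables are scoped per iteration, so the explicit copy only matters on older toolchains.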



