Weston Pace created ARROW-15784:
-----------------------------------

             Summary: [C++][Python] Parallel parquet file reading disabled with single file reads
                 Key: ARROW-15784
                 URL: https://issues.apache.org/jira/browse/ARROW-15784
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
    Affects Versions: 7.0.0
            Reporter: Weston Pace
            Assignee: Weston Pace
             Fix For: 7.0.1


There is a flag, {{enable_parallel_column_conversion}}, that was passed down
from Python to C++ when reading parquet datasets and controlled whether we
would read columns in parallel.  This was allowed for single files but not for
reading multiple files.  It was an old check to help prevent nested deadlock.
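
As a rough illustration only, the old check described above can be modeled as
follows.  The function name and signature are hypothetical and are not taken
from the Arrow code base; the sketch only encodes the rule that per-column
parallelism was allowed for a single file and not for multiple files.

```python
def allow_parallel_column_conversion(num_files: int) -> bool:
    """Toy model of the old check (hypothetical helper, not Arrow's API).

    Per-column parallelism was allowed only when reading a single file;
    with several files it was turned off to avoid nested deadlock.
    """
    return num_files == 1


# A single-file read was allowed to convert columns in parallel...
single = allow_parallel_column_conversion(1)
# ...while a multi-file read was not.
multi = allow_parallel_column_conversion(3)
```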

Nested deadlock is no longer an issue and the flag was mostly inert once we 
removed the synchronous scanner.

Unfortunately, when we removed the synchronous scanner we forgot to remove this
flag, and as a result a single-file read ended up disabling parallelism.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
