Felipe created SPARK-56871:
------------------------------

             Summary: Split parquet reads more granularly
                 Key: SPARK-56871
                 URL: https://issues.apache.org/jira/browse/SPARK-56871
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.1.1
            Reporter: Felipe


Spark can only split Parquet reads at row-group granularity (a row group is never divided across tasks), which creates scalability issues:

[Is Spark limited to split the Parquet read granularity by Row Group level only? · Issue #55747 · apache/spark|https://github.com/apache/spark/issues/55747]

The workaround suggested in that issue, [Row count limit for each row group · Issue #3235 · apache/parquet-java|https://github.com/apache/parquet-java/issues/3235], is write-side only: it requires writing files with smaller row groups and does not help when reading existing files.
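To make the limitation concrete, here is a hypothetical, simplified Python model (not Spark or parquet-java code). It assumes each byte-range split reads exactly the row groups whose midpoint falls inside that range, loosely following the midpoint rule parquet-java uses when filtering row groups for a split, and it ignores the file header/footer and Spark's actual split-size settings. `RowGroup`, `plan_splits`, `assign_row_groups` and `busy_tasks` are illustrative names only.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass(frozen=True)
class RowGroup:
    start: int   # byte offset of the row group within the file
    length: int  # size of the row group in bytes


def plan_splits(file_size, max_split_bytes):
    """Cut the file into contiguous byte ranges of at most max_split_bytes."""
    return [(off, min(off + max_split_bytes, file_size))
            for off in range(0, file_size, max_split_bytes)]


def assign_row_groups(row_groups, splits):
    """Give each row group to the single split whose range contains its midpoint."""
    assignment = {split: [] for split in splits}
    for rg in row_groups:
        mid = rg.start + rg.length // 2
        for lo, hi in splits:
            if lo <= mid < hi:
                assignment[(lo, hi)].append(rg)
                break
    return assignment


def busy_tasks(row_groups, file_size, max_split_bytes):
    """Count the splits (tasks) that actually receive rows to read."""
    splits = plan_splits(file_size, max_split_bytes)
    assignment = assign_row_groups(row_groups, splits)
    return sum(1 for rgs in assignment.values() if rgs)


if __name__ == "__main__":
    # A 1 GiB file holding one huge row group: 8 splits, but only 1 does work.
    print(busy_tasks([RowGroup(0, 1024 * MiB)], 1024 * MiB, 128 * MiB))  # 1
    # The same file rewritten with 128 MiB row groups: all 8 splits do work.
    small = [RowGroup(i * 128 * MiB, 128 * MiB) for i in range(8)]
    print(busy_tasks(small, 1024 * MiB, 128 * MiB))  # 8
```

The second case is what the write-side workaround achieves; the issue asks for the same read parallelism without rewriting the data.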



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
