Felipe created SPARK-56871:
------------------------------
Summary: Split parquet reads more granularly
Key: SPARK-56871
URL: https://issues.apache.org/jira/browse/SPARK-56871
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.1.1
Reporter: Felipe
Spark can split Parquet reads only at row-group granularity, which limits read parallelism to the number of row groups and creates scalability issues:
[Is Spark limited to split the Parquet read granularity by Row Group level only? · Issue #55747 · apache/spark|https://github.com/apache/spark/issues/55747]

A workaround was shared in that issue: [Row count limit for each row group · Issue #3235 · apache/parquet-java|https://github.com/apache/parquet-java/issues/3235]. However, it is a write-side workaround, so it does not help when reading files that have already been written.
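
To illustrate the limitation, here is a minimal Python sketch, not Spark or parquet-java code. It assumes a row group is read by whichever byte-range split contains the row group's midpoint, which is my understanding of how the Parquet reader assigns row groups to splits. The names `RowGroup`, `assign_row_groups`, and `busy_splits` are hypothetical, and sizes are in MB for readability.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    start: int  # offset of the row group within the file (MB, illustrative)
    size: int   # size of the row group (MB, illustrative)


def assign_row_groups(row_groups, file_size, split_size):
    """Map each byte-range split to the row groups whose midpoint it contains.

    Assumption: midpoint-based assignment, so a row group is never divided
    between two splits.
    """
    splits = [(lo, min(lo + split_size, file_size))
              for lo in range(0, file_size, split_size)]
    assignment = {split: [] for split in splits}
    for index, rg in enumerate(row_groups):
        midpoint = rg.start + rg.size // 2
        for lo, hi in splits:
            if lo <= midpoint < hi:
                assignment[(lo, hi)].append(index)
                break
    return assignment


def busy_splits(assignment):
    """Count the splits that actually have a row group to read."""
    return sum(1 for groups in assignment.values() if groups)


# A 1024 MB file cut into 128 MB splits gives 8 read tasks.
one_big = [RowGroup(start=0, size=1024)]
eight_small = [RowGroup(start=i * 128, size=128) for i in range(8)]

print(busy_splits(assign_row_groups(one_big, 1024, 128)))      # 1
print(busy_splits(assign_row_groups(eight_small, 1024, 128)))  # 8
```

With a single large row group, seven of the eight tasks have nothing to read. Smaller row groups restore parallelism, but only if the writer produced them, which is why the workaround above is write-side only.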