[
https://issues.apache.org/jira/browse/SPARK-55047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Herman van Hövell resolved SPARK-55047.
---------------------------------------
Fix Version/s: 4.2.0
Resolution: Fixed
Issue resolved by pull request 53815
[https://github.com/apache/spark/pull/53815]
> [CONNECT] Add client-side limit for local relation size
> -------------------------------------------------------
>
> Key: SPARK-55047
> URL: https://issues.apache.org/jira/browse/SPARK-55047
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.1.1
> Reporter: Alex Khakhlyuk
> Assignee: Alex Khakhlyuk
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.2.0
>
>
> Currently, local relation sizes are limited by the
> {{spark.sql.session.localRelationSizeLimit}} conf (3 GB by default), but the
> limit is only checked on the server. The client can upload arbitrarily large
> relations, e.g. 100 GB, as artifacts; the server will store them, try to
> materialize the local relation on the driver, and only then throw an error
> if the relation exceeds the limit.
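> A minimal sketch of how a client can hit this today, assuming the Spark
> Connect Scala client; the endpoint, dataset shape, and sizes are purely
> illustrative:
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> // Connect to a Spark Connect server (endpoint is illustrative).
> val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()
>
> // Build a large in-memory dataset on the client. Without a client-side check,
> // the whole relation is serialized and uploaded as artifacts before the server
> // validates it against spark.sql.session.localRelationSizeLimit.
> val rows = Seq.tabulate(1000000)(i => (i.toLong, "x" * 1024))
> val df = spark.createDataFrame(rows).toDF("id", "payload")
>
> // The server only rejects the relation after it has stored the artifacts and
> // tried to materialize the local relation on the driver.
> df.count()
> {code}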
> This current behavior is bad for several reasons:
> # The driver has to store an arbitrary amount of data and can run out of
> memory or disk space, causing a driver failure.
> # The client wastes a lot of time uploading the data to the server only to
> fail later, which is bad UX.
> This should be solved by enforcing the limit on the client: if the client
> sees that the local relation it is serializing exceeds the limit, it throws
> an {{AnalysisException}} with error class
> {{LOCAL_RELATION_SIZE_LIMIT_EXCEEDED}} and SQL state {{54000}} (program limit
> exceeded).
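> A minimal sketch of such a client-side check, assuming the limit is known on
> the client; the object and method names are hypothetical and this is not the
> code from the pull request:
> {code:scala}
> // Hypothetical helper illustrating the client-side check; names are not from the PR.
> object LocalRelationSizeCheck {
>   // Mirrors the 3 GB default of spark.sql.session.localRelationSizeLimit.
>   val DefaultLimitBytes: Long = 3L * 1024 * 1024 * 1024
>
>   /** Fails fast if the serialized local relation exceeds the configured limit. */
>   def check(serializedSize: Long, limitBytes: Long = DefaultLimitBytes): Unit = {
>     if (serializedSize > limitBytes) {
>       // The real change raises an AnalysisException with error class
>       // LOCAL_RELATION_SIZE_LIMIT_EXCEEDED and SQL state 54000 (program limit exceeded);
>       // a plain exception keeps this sketch self-contained.
>       throw new IllegalStateException(
>         s"[LOCAL_RELATION_SIZE_LIMIT_EXCEEDED] Local relation is $serializedSize bytes, " +
>           s"which exceeds the limit of $limitBytes bytes. SQLSTATE: 54000")
>     }
>   }
> }
> {code}
> With a check like this running before any artifact upload, the oversized
> relation never leaves the client.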