Alex Khakhlyuk created SPARK-55047:
--------------------------------------
Summary: [CONNECT] Add client-side limit for local relation size
Key: SPARK-55047
URL: https://issues.apache.org/jira/browse/SPARK-55047
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.1.1
Reporter: Alex Khakhlyuk
Currently, local relation sizes are limited by the
{{spark.sql.session.localRelationSizeLimit}} conf (3GB by default).
This limit is only checked on the server. The client can upload arbitrarily
large relations, e.g. 100 GB, as artifacts; the server will store them, try to
materialize the local relation on the driver, and only then throw an error if
the relation exceeds the limit.
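For illustration, a repro of the current failure mode from the Scala Connect client might look like the sketch below; the connect string and data sizes are placeholders, and the exact point of failure depends on the deployment.

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical repro: the connect string and sizes are placeholders.
val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()

// Roughly 100 GB of local data. The client serializes and uploads all of it
// as artifacts; spark.sql.session.localRelationSizeLimit is only checked once
// the server tries to materialize the relation on the driver.
val rows = Seq.fill(100000000)(Tuple1("x" * 1000))
val df = spark.createDataFrame(rows).toDF("payload")
df.count() // fails only after the full upload and server-side materialization
{code}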
This is bad for several reasons:
# The driver has to store an arbitrary amount of data and can run out of
memory or disk space, causing a driver failure.
# The client wastes a lot of time uploading the data to the server only to
fail later, which is bad UX.
This should be solved by enforcing the limit on the client as well.
If the client sees that the local relation it is serializing exceeds the
limit, it should throw an {{AnalysisException}} with error class
{{LOCAL_RELATION_SIZE_LIMIT_EXCEEDED}} and SQL state {{54000}} (program limit
exceeded) before any data is uploaded.
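A minimal sketch of what the client-side check could look like follows. {{checkSizeLimit}}, {{sizeLimitBytes}}, and {{serializedSize}} are hypothetical names, and the exact {{AnalysisException}} constructor signature as well as how the limit value is plumbed to the client are open questions.

{code:scala}
// Hedged sketch, not the final implementation.
import org.apache.spark.sql.AnalysisException

object LocalRelationSizeCheck {
  // Assumed to mirror spark.sql.session.localRelationSizeLimit (3GB default);
  // how the server's configured value reaches the client is an open question.
  val sizeLimitBytes: Long = 3L * 1024 * 1024 * 1024

  def checkSizeLimit(serializedSize: Long): Unit = {
    if (serializedSize > sizeLimitBytes) {
      // Fail before the first artifact chunk is uploaded.
      throw new AnalysisException(
        errorClass = "LOCAL_RELATION_SIZE_LIMIT_EXCEEDED",
        messageParameters = Map(
          "size" -> serializedSize.toString,
          "limit" -> sizeLimitBytes.toString))
    }
  }
}
{code}

Failing before the first artifact is sent addresses both problems above: the driver never stores the oversized data, and the client does not waste time uploading it.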