Venkata Sai Akhil Gudesa created SPARK-44657:
------------------------------------------------
Summary: Incorrect limit handling and config parsing in Arrow collect
Key: SPARK-44657
URL: https://issues.apache.org/jira/browse/SPARK-44657
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.5.0
Reporter: Venkata Sai Akhil Gudesa
In the arrow writer
[code|https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L154-L163],
the loop condition does not do what its comment documents ("For maxBatchSize
and maxRecordsPerBatch, respect whatever smaller"): because the size check and
the record-count check are chained with the {{||}} operator, a batch is only
cut once both limits are exceeded, so the writer actually respects whichever
conf is "larger" (i.e. less restrictive).
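
To make that concrete, here is a minimal, self-contained Scala sketch (illustrative names, not the actual ArrowConverters code) contrasting the current {{||}} chaining with {{&&}}: with {{||}}, rows keep being written until both limits are exhausted, so the less restrictive limit wins.

{code:scala}
// Illustrative sketch, not the actual ArrowConverters implementation.
// Chaining the two limit checks with || only stops the batch once BOTH
// limits are exceeded (larger limit wins), while && stops at whichever
// limit is hit first (smaller limit wins, as the comment documents).
object BatchLimitSketch {
  def rowsPerBatch(
      rowSizeBytes: Long,
      maxBatchSizeBytes: Long,
      maxRecordsPerBatch: Long,
      useAnd: Boolean): Long = {
    var estimatedBatchSize = 0L
    var rowCount = 0L
    // A non-positive limit means "unlimited", mirroring the real code's intent.
    def sizeOk = maxBatchSizeBytes <= 0 || estimatedBatchSize < maxBatchSizeBytes
    def countOk = maxRecordsPerBatch <= 0 || rowCount < maxRecordsPerBatch
    while (if (useAnd) sizeOk && countOk else sizeOk || countOk) {
      estimatedBatchSize += rowSizeBytes // pretend one fixed-size row was written
      rowCount += 1
    }
    rowCount
  }

  def main(args: Array[String]): Unit = {
    // 100-byte rows, 1 KiB size cap (~10 rows) vs. a 1000-record cap.
    println(rowsPerBatch(100, 1024, 1000, useAnd = false)) // 1000 -> record cap wins
    println(rowsPerBatch(100, 1024, 1000, useAnd = true))  // 11   -> size cap wins
  }
}
{code}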
Further, when the {{CONNECT_GRPC_ARROW_MAX_BATCH_SIZE}} conf is read, the
value is not converted from MiB to bytes
([example|https://github.com/apache/spark/blob/3e5203c64c06cc8a8560dfa0fb6f52e74589b583/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala#L103]),
so a MiB-denominated value ends up compared against a byte count.
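
Assuming the conf is declared with {{bytesConf(ByteUnit.MiB)}} (so reading it yields a value in MiB), a hedged sketch of the conversion could look like the following. The wrapper object and method are illustrative, the sketch assumes it lives inside the Spark codebase where these internal APIs are visible, and the 0.7 safety margin mirrors the 70% factor at the linked call site.

{code:scala}
// Hedged sketch of the missing unit conversion; MaxBatchSizeSketch and
// maxBatchSizeInBytes are illustrative, not the actual Spark Connect code path.
import org.apache.spark.SparkEnv
import org.apache.spark.network.util.ByteUnit
import org.apache.spark.sql.connect.config.Connect.CONNECT_GRPC_ARROW_MAX_BATCH_SIZE

object MaxBatchSizeSketch {
  def maxBatchSizeInBytes(): Long = {
    // The conf is declared with bytesConf(ByteUnit.MiB), so get() yields MiB.
    val maxBatchSizeMib = SparkEnv.get.conf.get(CONNECT_GRPC_ARROW_MAX_BATCH_SIZE)
    // Buggy reading (treats the MiB value as if it were bytes):
    //   (maxBatchSizeMib * 0.7).toLong
    // Fixed reading (convert MiB -> bytes before applying the 70% margin):
    (ByteUnit.MiB.toBytes(maxBatchSizeMib) * 0.7).toLong
  }
}
{code}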