Venkata Sai Akhil Gudesa created SPARK-44657:
------------------------------------------------
Summary: Incorrect limit handling and config parsing in Arrow collect
Key: SPARK-44657
URL: https://issues.apache.org/jira/browse/SPARK-44657
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.5.0
Reporter: Venkata Sai Akhil Gudesa
In the arrow writer
[code|https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L154-L163],
the loop condition does not do what its comment documents ("For maxBatchSize
and maxRecordsPerBatch, respect whatever smaller"): because the size check and
the record-count check are chained with the {{||}} operator, a batch is only
cut once both limits are exceeded, so the writer actually respects whichever
conf is "larger" (i.e. less restrictive).
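
To make that concrete, here is a minimal, self-contained Scala sketch (illustrative names, not the actual ArrowConverters code) contrasting the current {{||}} chaining with {{&&}}: with {{||}}, rows keep being written until both limits are exhausted, so the less restrictive limit wins.

{code:scala}
// Illustrative sketch, not the actual ArrowConverters implementation.
// Chaining the two limit checks with || only stops the batch once BOTH
// limits are exceeded (larger limit wins), while && stops at whichever
// limit is hit first (smaller limit wins, as the comment documents).
object BatchLimitSketch {
  def rowsPerBatch(
      rowSizeBytes: Long,
      maxBatchSizeBytes: Long,
      maxRecordsPerBatch: Long,
      useAnd: Boolean): Long = {
    var estimatedBatchSize = 0L
    var rowCount = 0L
    // A non-positive limit means "unlimited", mirroring the real code's intent.
    def sizeOk = maxBatchSizeBytes <= 0 || estimatedBatchSize < maxBatchSizeBytes
    def countOk = maxRecordsPerBatch <= 0 || rowCount < maxRecordsPerBatch
    while (if (useAnd) sizeOk && countOk else sizeOk || countOk) {
      estimatedBatchSize += rowSizeBytes // pretend one fixed-size row was written
      rowCount += 1
    }
    rowCount
  }

  def main(args: Array[String]): Unit = {
    // 100-byte rows, 1 KiB size cap (~10 rows) vs. a 1000-record cap.
    println(rowsPerBatch(100, 1024, 1000, useAnd = false)) // 1000 -> record cap wins
    println(rowsPerBatch(100, 1024, 1000, useAnd = true))  // 11   -> size cap wins
  }
}
{code}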
Further, when the {{CONNECT_GRPC_ARROW_MAX_BATCH_SIZE}} conf is read, the
value is not converted from MiB to bytes
([example|https://github.com/apache/spark/blob/3e5203c64c06cc8a8560dfa0fb6f52e74589b583/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala#L103]),
so a MiB-denominated value ends up compared against a byte count.
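
Assuming the conf is declared with {{bytesConf(ByteUnit.MiB)}} (so reading it yields a value in MiB), a hedged sketch of the conversion could look like the following. The wrapper object and method are illustrative, the sketch assumes it lives inside the Spark codebase where these internal APIs are visible, and the 0.7 safety margin mirrors the 70% factor at the linked call site.

{code:scala}
// Hedged sketch of the missing unit conversion; MaxBatchSizeSketch and
// maxBatchSizeInBytes are illustrative, not the actual Spark Connect code path.
import org.apache.spark.SparkEnv
import org.apache.spark.network.util.ByteUnit
import org.apache.spark.sql.connect.config.Connect.CONNECT_GRPC_ARROW_MAX_BATCH_SIZE

object MaxBatchSizeSketch {
  def maxBatchSizeInBytes(): Long = {
    // The conf is declared with bytesConf(ByteUnit.MiB), so get() yields MiB.
    val maxBatchSizeMib = SparkEnv.get.conf.get(CONNECT_GRPC_ARROW_MAX_BATCH_SIZE)
    // Buggy reading (treats the MiB value as if it were bytes):
    //   (maxBatchSizeMib * 0.7).toLong
    // Fixed reading (convert MiB -> bytes before applying the 70% margin):
    (ByteUnit.MiB.toBytes(maxBatchSizeMib) * 0.7).toLong
  }
}
{code}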