[jira] [Assigned] (SPARK-43789) Uses 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with Arrow by default

Hyukjin Kwon (Jira) Thu, 25 May 2023 04:04:10 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-43789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon reassigned SPARK-43789:
------------------------------------

    Assignee: Hyukjin Kwon

> Uses 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with 
> Arrow by default
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43789
>                 URL: https://issues.apache.org/jira/browse/SPARK-43789
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR
>    Affects Versions: 3.5.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>
> Currently, createDataFrame in SparkR uses `1` for numPartitions by default, 
> which isn't realistic. The default should be a larger number of partitions.
> In PySpark, the input data is chunked into batches of at most 
> 'spark.sql.execution.arrow.maxRecordsPerBatch' records. SparkR should follow 
> the same approach.
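
The chunking idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the PySpark or SparkR implementation: `chunk_rows` is a hypothetical helper, and the default of 10000 together with the "zero or negative means no limit" rule are taken from the documented behavior of 'spark.sql.execution.arrow.maxRecordsPerBatch'.

```python
def chunk_rows(rows, max_records_per_batch=10000):
    """Split rows into consecutive chunks of at most max_records_per_batch.

    Hypothetical helper for illustration only. A zero or negative limit is
    treated as "no limit", so all rows end up in a single chunk.
    """
    rows = list(rows)
    if not rows:
        return []
    if max_records_per_batch <= 0:
        return [rows]
    return [
        rows[start:start + max_records_per_batch]
        for start in range(0, len(rows), max_records_per_batch)
    ]


# 25000 rows with the default limit produce three chunks, so the number of
# slices grows with the input size instead of staying fixed at 1.
chunks = chunk_rows(range(25000))
chunk_sizes = [len(chunk) for chunk in chunks]
```

With this approach the number of chunks follows from the data size and the batch limit, rather than from a fixed numPartitions default.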



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

