[ 
https://issues.apache.org/jira/browse/ARROW-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-15983:
---------------------------------------
    Component/s: Python

> Spark job fails due to arrow buf limitation
> -------------------------------------------
>
>                 Key: ARROW-15983
>                 URL: https://issues.apache.org/jira/browse/ARROW-15983
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Shubham Chhabra
>            Priority: Major
>
>  
> Hello,
> groupBy + applyInPandas results in the following error. We need a parameter 
> to tune the buffer size.
>  
> {code:java}
> Caused by: java.lang.IndexOutOfBoundsException: index: 0, length: 1073741824 (expected: range(0, 0))
>     at io.netty.buffer.ArrowBuf.checkIndex(ArrowBuf.java:716)
>     at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:954)
>     at org.apache.arrow.vector.BaseVariableWidthVector.reallocDataBuffer(BaseVariableWidthVector.java:508)
>     at org.apache.arrow.vector.BaseVariableWidthVector.handleSafe(BaseVariableWidthVector.java:1239)
>     at org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1066)
>     at org.apache.spark.sql.execution.arrow.StringWriter.setValue(ArrowWriter.scala:287)
>     at org.apache.spark.sql.execution.arrow.ArrowFieldWriter.write(ArrowWriter.scala:151)
>     at org.apache.spark.sql.execution.arrow.ArrowWriter.write(ArrowWriter.scala:105)
>     at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:100)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
>     at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:122)
>     at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:478)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2146)
>     at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:270)
> {code}



