XorSum opened a new issue, #1693:
URL: https://github.com/apache/auron/issues/1693
**Describe the bug**
<!--
A clear and concise description of what the bug is.
-->
Large batches after join operations exceed Arrow Vector 2GB memory limit,
causing `OversizedAllocationException`.
**To Reproduce**
<!--
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
-->
```shell
./bin/spark-sql \
--conf spark.sql.autoBroadcastJoinThreshold=-1 \
--conf spark.sql.adaptive.autoBroadcastJoinThreshold=-1
```
```sql
create table tmp_t1(a int, b int) stored as orc;
with g1 as (select id as a from range(1)),
g2 as (select id as b from range(10000))
insert overwrite tmp_t1 select g1.a, g2.b from from g1 join g2;
create table tmp_t2(a int, b int) stored as orc;
with g1 as (select id as a from range(1)),
g2 as (select id as b from range(10000))
insert overwrite tmp_t2 select g1.a, g2.b from from g1 join g2;
select s, count(1) as cnt
from (select concat(
cast(date_add('2010-01-01', t1.b) as string),
cast(date_add('2010-01-02', t2.b) as string)
) as s
from tmp_t1 t1 join tmp_t2 t2 on t1.a = t2.a)
group by s
order by cnt
limit 100;
```
```java
Exception in thread "auron native task 0.0 in stage 4.0 (TID 10)"
auron.org.apache.arrow.vector.util.OversizedAllocationException: Memory
required for vector is (2147483648), which is overflow or more than max allowed
(2147483647). You could consider using LargeVarCharVector/LargeVarBinaryVector
for large strings/large bytes types
at
auron.org.apache.arrow.vector.BaseVariableWidthVector.checkDataBufferSize(BaseVariableWidthVector.java:465)
at
auron.org.apache.arrow.vector.BaseVariableWidthVector.reallocDataBuffer(BaseVariableWidthVector.java:574)
at
auron.org.apache.arrow.vector.BaseVariableWidthVector.handleSafe(BaseVariableWidthVector.java:1344)
at
auron.org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1178)
at
org.apache.spark.sql.execution.auron.arrowio.util.StringWriter.setValue(ArrowWriter.scala:247)
at
org.apache.spark.sql.execution.auron.arrowio.util.ArrowFieldWriter.write(ArrowWriter.scala:126)
at
org.apache.spark.sql.execution.auron.arrowio.util.ArrowWriter.write(ArrowWriter.scala:97)
at
org.apache.auron.spark.sql.SparkAuronUDFWrapperContext.$anonfun$eval$5(SparkAuronUDFWrapperContext.scala:78)
at
org.apache.auron.spark.sql.SparkAuronUDFWrapperContext.$anonfun$eval$5$adapted(SparkAuronUDFWrapperContext.scala:76)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at
org.apache.auron.spark.sql.SparkAuronUDFWrapperContext.$anonfun$eval$4(SparkAuronUDFWrapperContext.scala:76)
at
org.apache.auron.spark.sql.SparkAuronUDFWrapperContext.$anonfun$eval$4$adapted(SparkAuronUDFWrapperContext.scala:69)
at
org.apache.spark.sql.auron.util.Using$.$anonfun$resources$9(Using.scala:395)
at org.apache.spark.sql.auron.util.Using$.resource(Using.scala:273)
at
org.apache.spark.sql.auron.util.Using$.$anonfun$resources$8(Using.scala:394)
at org.apache.spark.sql.auron.util.Using$.resource(Using.scala:273)
at
org.apache.spark.sql.auron.util.Using$.$anonfun$resources$7(Using.scala:393)
at org.apache.spark.sql.auron.util.Using$.resource(Using.scala:273)
at
org.apache.spark.sql.auron.util.Using$.$anonfun$resources$6(Using.scala:392)
at org.apache.spark.sql.auron.util.Using$.resource(Using.scala:273)
at org.apache.spark.sql.auron.util.Using$.resources(Using.scala:391)
at
org.apache.auron.spark.sql.SparkAuronUDFWrapperContext.eval(SparkAuronUDFWrapperContext.scala:69)
```
**Expected behavior**
<!--
A clear and concise description of what you expected to happen.
-->
Query should execute without memory allocation errors.
**Screenshots**
<!--
If applicable, add screenshots to help explain your problem.
-->
**Additional context**
<!--
Add any other context about the problem here.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]