zhztheplayer commented on issue #5348:
URL:
https://github.com/apache/incubator-gluten/issues/5348#issuecomment-2050983886
I have mirrored the test into Scala and could not reproduce this issue with the
newest Gluten code.
```scala
// with setting spark.memory.offHeap.size=1g
import org.apache.spark.sql.SaveMode

// Load the CSV input and rewrite it as Parquet, distributed by f1.
test("data gen") {
  val df1 = spark.sql(
    "CREATE TABLE test_spill_02(f1 string, f2 string, f3 string, f4 string) USING csv " +
      "OPTIONS (path '/root/Workloads/SF/bug1/data.txt', delimiter '\t')")
  df1.collect()
  val df2 = spark.sql("select * from test_spill_02 distribute by f1")
  df2.write.parquet("/root/Workloads/SF/bug1/parquet")
}

// Run the window query (lag over a partition by f1) against the Parquet data.
test("simple_select") {
  val df1 = spark.sql(
    "CREATE TABLE test_spill_03(f1 string, f2 string, f3 string, f4 string) USING parquet " +
      "OPTIONS (path '/root/Workloads/SF/bug1/parquet')")
  df1.collect()
  val df2 = spark.sql("select f1, f2, f3, f4\n," +
    "lag(f2) over(partition by f1 order by cast(f4 as bigint) asc) f5\n" +
    "from test_spill_03")
  df2.write.mode(SaveMode.Overwrite).parquet("/root/Workloads/SF/bug1/parquet2")
}
```
So the issue may already be fixed in the newest Gluten main code, since
Velox's spill code has changed a lot since the Gluten 1.1 release. @weiting-chen
I'll also try to obtain a 1.1 binary and test with this case again.
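For anyone re-running this, here is a rough sketch of how the off-heap setting from the comment at the top of the test could be applied. It assumes a plain `SparkConf`. The `OffHeapConfSketch` object and `offHeapConf` helper are hypothetical names, not part of the Gluten test suite, and the Gluten plugin settings are left out on purpose.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper for illustration only; not taken from the Gluten code base.
object OffHeapConfSketch {
  // Returns the given conf with off-heap memory enabled and sized to 1g,
  // matching the "spark.memory.offHeap.size=1g" note in the test above.
  def offHeapConf(base: SparkConf = new SparkConf()): SparkConf =
    base
      // Off-heap memory has to be enabled for the size setting to be used.
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "1g")
}
```

A smaller off-heap size on the same data may make spilling more likely, which could help when trying to reproduce the problem on the 1.1 binary.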
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]