zhli1142015 commented on PR #7003: URL: https://github.com/apache/incubator-gluten/pull/7003#issuecomment-2309578902
> > Do you mean the `SoftAffinityListener` is the cause of the slowness? I don't observe this. Can you help share how to reproduce it in your environment?
>
> This is how I reproduced the problem locally.
>
> 1. Start `spark-sql` with `--conf spark.hadoop.parquet.page.size=1024 --conf spark.hadoop.parquet.block.size=2048 --conf spark.sql.files.maxPartitionBytes=2048`
> 2. Then run the SQL statements below.
>
> ```
> create table test(a string) using parquet;
> create table test1(a string) using parquet;
> insert into test values(0); -- make sure there are 10000 values in it
> insert into test1 select /*+ REPARTITION(10000) */ * from test;
> ```
>
> 3. Finally, restart `spark-sql` with `SoftAffinity` enabled and run the SQL below:
>
> ```
> select count(*) from test1;
> ```

Following the steps above, I actually can't reproduce the issue. This may be due to hardware differences, but with more partitions we do observe the latency increasing. In the code, the event-handling logic only performs cache get/put operations, so I think memory pressure (GC) is the more likely cause.

[Screenshot: 10K values]

[Screenshot: 120K values]
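To separate the two hypotheses (slow event handling vs. GC pressure), a small toy model may help. The Java sketch below is hypothetical: `ToyListener`, `LruCache`, the `file-N:0-2048` keys, the four hosts and the 50,000-entry bound are all assumptions for illustration, not Gluten's actual `SoftAffinityListener` code. Each simulated task-end event performs one cache get and at most one put, and the program prints elapsed time next to accumulated JVM garbage-collection time (from `GarbageCollectorMXBean.getCollectionTime()`) for 10K and 120K partitions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model, NOT Gluten's SoftAffinityListener.
public class AffinityCacheSketch {

    // Bounded LRU cache: an access-ordered LinkedHashMap that evicts the
    // eldest entry once maxEntries is exceeded (the bound is an assumption).
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true gives LRU ordering
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    // Hypothetical handler: each "task end" event does one get and at most
    // one put, recording which host read a given file partition.
    static final class ToyListener {
        final LruCache<String, String> hostByPartition;

        ToyListener(int maxEntries) {
            hostByPartition = new LruCache<>(maxEntries);
        }

        void onTaskEnd(String partitionKey, String host) {
            String previous = hostByPartition.get(partitionKey); // O(1) lookup
            if (!host.equals(previous)) {
                hostByPartition.put(partitionKey, host); // O(1) insert/update
            }
        }
    }

    // Sum of accumulated collection time (ms) over all collectors;
    // getCollectionTime() returns -1 when undefined, so those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Replays one synthetic event per partition and returns the cache size.
    static int replay(int partitions, int maxEntries) {
        ToyListener listener = new ToyListener(maxEntries);
        for (int i = 0; i < partitions; i++) {
            // Synthetic keys; real keys would identify file splits.
            listener.onTaskEnd("file-" + i + ":0-2048", "host-" + (i % 4));
        }
        return listener.hostByPartition.size();
    }

    public static void main(String[] args) {
        for (int partitions : new int[] {10_000, 120_000}) {
            long gcBefore = totalGcMillis();
            long start = System.nanoTime();
            int size = replay(partitions, 50_000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long gcMs = totalGcMillis() - gcBefore;
            System.out.printf("partitions=%d cacheSize=%d elapsed=%dms gc=%dms%n",
                    partitions, size, elapsedMs, gcMs);
        }
    }
}
```

Running it with a small heap (for example `java -Xmx64m AffinityCacheSketch.java` on Java 11+) makes GC pressure easier to see. If GC time grows with the partition count while the per-event work stays a constant get/put, that is consistent with the memory-pressure explanation, although a toy model cannot confirm it for the real listener.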
