Re: [PR] [GLUTEN-6995][Core] Limit soft affinity duplicate reading detection max cache items [incubator-gluten]

via GitHub Mon, 26 Aug 2024 00:54:47 -0700


zhli1142015 commented on PR #7003:
URL: https://github.com/apache/incubator-gluten/pull/7003#issuecomment-2309578902


   > > Do you mean the `SoftAffinityListener` is the cause of the slowness? I don't observe this. Can you help to share how to reproduce it in your env?
   > 
   > This is how I reproduced the problem locally.
   > 
   > 1. Start `spark-sql` with `--conf spark.hadoop.parquet.page.size=1024 --conf spark.hadoop.parquet.block.size=2048 --conf spark.sql.files.maxPartitionBytes=2048`
   > 2. Then, run the SQL statements below.
   > 
   > ```
   > create table test(a string) using parquet;
   > create table test1(a string) using parquet;
   > insert into test values(0); -- make sure there are 10000 values in it
   > insert into test1 select /*+ REPARTITION(10000) */ * from test;
   > ```
   > 
   > 3. Finally, restart `spark-sql` with `SoftAffinity` enabled and run the SQL below.
   > 
   > ```
   > select count(*) from test1;
   > ```
   
   With the steps above I actually can't reproduce the issue, perhaps because of hardware differences. With more partitions, however, we do observe the latency increasing. In the code, the event handling logic only performs cache get/put operations, so I think memory pressure (GC) is the more likely cause.
   10K values:
   
![image](https://github.com/user-attachments/assets/4faa7cf4-eb1c-4609-954d-a3a3041098f5)
   120K values:
   
![image](https://github.com/user-attachments/assets/e0e85676-8c93-41b7-9178-90c1c20f4be3)
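   To illustrate the memory-pressure point: if the duplicate-reading detection cache is unbounded, every new partition adds entries and heap usage grows with the partition count, which would fit the latency rising from 10K to 120K values. Bounding the number of items, as the PR title proposes, caps that growth. Below is a minimal Java sketch assuming an LRU eviction policy. The class name `BoundedLruCache` and the file/executor keys are hypothetical, and this is not Gluten's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a size-bounded LRU cache (not Gluten's code).
 * Memory stays bounded by maxItems no matter how many partitions are seen.
 */
public class BoundedLruCache<K, V> {
    private final int maxItems;
    private final LinkedHashMap<K, V> map;

    public BoundedLruCache(int maxItems) {
        if (maxItems <= 0) {
            throw new IllegalArgumentException("maxItems must be positive");
        }
        this.maxItems = maxItems;
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // After each put, evict the least recently used entry once the bound is exceeded
                return size() > BoundedLruCache.this.maxItems;
            }
        };
    }

    // synchronized because listener callbacks may arrive from multiple threads (assumption)
    public synchronized V get(K key) { return map.get(key); }

    public synchronized void put(K key, V value) { map.put(key, value); }

    public synchronized int size() { return map.size(); }

    public static void main(String[] args) {
        BoundedLruCache<String, String> cache = new BoundedLruCache<>(2);
        cache.put("file-a", "executor-1");
        cache.put("file-b", "executor-2");
        cache.get("file-a");               // touch file-a so file-b becomes least recently used
        cache.put("file-c", "executor-3"); // exceeds the bound and evicts file-b
        System.out.println(cache.size());         // 2
        System.out.println(cache.get("file-b"));  // null
        System.out.println(cache.get("file-a"));  // executor-1
    }
}
```

   An access-ordered `LinkedHashMap` keeps get/put at O(1) while capping the entry count. In production code a library cache such as Guava's `CacheBuilder.maximumSize(...)` would be the more usual choice.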
   
   


