[GitHub] [incubator-uniffle] jerqi commented on pull request #294: [Improvement] Skip blocks when read from memory

GitBox Fri, 09 Dec 2022 03:01:24 -0800


jerqi commented on PR #294:
URL: https://github.com/apache/incubator-uniffle/pull/294#issuecomment-1344157680


   > # Performance Test
   > ## Table
   > Table1: 10 GB, dtypes: Array[(String, String)] = Array((v1,StringType), (k1,StringType)). Every value in column k1 is the same (k1 = 10).
   > 
   > Table2: 10 records, dtypes: Array[(String, String)] = Array((k2,StringType), (v2,StringType)). Exactly one of its records has k2 = 10.
   > 
   > ## Env
   > - Spark resource profile: 10 executors (1 core, 4g each)
   > - Shuffle-server environment: 6 shuffle servers, with 20g for the read buffer and 40g for the write buffer
   > - Spark shuffle client config: storage type MEMORY_LOCALFILE_HDFS with LOCAL_ORDER
   > - SQL: `spark.sql("select * from Table1,Table2 where k1 = k2").write.mode("overwrite").parquet("xxxxxx")`
   > 
   > ## Result
   > `BITMAP` and `MINMAX` perform similarly. I think the gap between them has little impact on overall performance; see the following chart.
   > ![sc-20221209163411](https://user-images.githubusercontent.com/11752250/206660778-e6aa3d6d-fdbb-4fc3-9a2a-471066dddde4.png)
   > 
   > cc @jerqi @zuston
   
   OK. 




