[GitHub] [incubator-uniffle] xianjingfeng commented on pull request #294: [Improvement] Skip blocks when read from memory

GitBox Fri, 09 Dec 2022 00:46:46 -0800


xianjingfeng commented on PR #294:
URL: https://github.com/apache/incubator-uniffle/pull/294#issuecomment-1344017352


   # Performance Test
   ## Table
   
   Table1: 10 GB, `dtypes: Array[(String, String)] = Array((v1,StringType), (k1,IntegerType))`.
   Every row has the same k1 value (k1 = 10).
   
   Table2: 10 records, `dtypes: Array[(String, String)] = Array((k2,IntegerType), (v2,StringType))`.
   Exactly one of these records has k2 = 10.
   
   ## Env
   - Spark resource profile: 10 executors (1 core and 4 GB memory each)
   - Shuffle server environment: 6 shuffle servers, with 20 GB for the read buffer and 40 GB for the write buffer
   - Spark shuffle client config: storage type `MEMORY_LOCALFILE_HDFS`, data distribution type `LOCAL_ORDER`
   - SQL: `spark.sql("select * from Table1,Table2 where k1 = k2").write.mode("overwrite").parquet("xxxxxx")`
   
   ## Result
   `BITMAP` and `MINMAX` perform similarly in this test. I think the difference between them has little impact on overall job performance; see the screenshot below.
   
![sc-20221209163411](https://user-images.githubusercontent.com/11752250/206660778-e6aa3d6d-fdbb-4fc3-9a2a-471066dddde4.png)
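   The comment does not define the two filters being compared. As a rough mental model only, assume that `BITMAP` sends the shuffle server the exact set of expected task attempt ids, while `MINMAX` sends only the minimum and maximum of that set. The Scala sketch shows the trade-off under that assumption: `MINMAX` is cheaper to transmit but can keep extra blocks from attempts that fall inside the range. `Block`, `BlockSkipping`, `bitmapKeep`, `minMaxKeep` and the sample data are hypothetical illustrations, not Uniffle APIs.

```scala
// Hypothetical model, not Uniffle code: one in-memory shuffle block tagged
// with the task attempt that wrote it.
final case class Block(taskAttemptId: Long, sizeBytes: Long)

object BlockSkipping {
  // Assumed BITMAP semantics: keep a block only if its writer is exactly one
  // of the expected task attempts. A Set[Long] stands in for a real bitmap.
  def bitmapKeep(blocks: Seq[Block], expected: Set[Long]): Seq[Block] =
    blocks.filter(b => expected.contains(b.taskAttemptId))

  // Assumed MINMAX semantics: keep a block if its writer id lies in
  // [min, max] of the expected ids. Only two numbers must be sent, but
  // unexpected attempts inside the range are kept too and would have to be
  // dropped later on the reader side.
  def minMaxKeep(blocks: Seq[Block], expected: Set[Long]): Seq[Block] =
    if (expected.isEmpty) Seq.empty
    else {
      val lo = expected.min
      val hi = expected.max
      blocks.filter(b => b.taskAttemptId >= lo && b.taskAttemptId <= hi)
    }

  // Total bytes that would be read back for a set of kept blocks.
  def bytes(blocks: Seq[Block]): Long = blocks.map(_.sizeBytes).sum

  def main(args: Array[String]): Unit = {
    // Attempts 1..6 each wrote one 100-byte block; the reader expects 2, 3 and 5.
    val blocks = (1L to 6L).map(id => Block(id, 100L))
    val expected = Set(2L, 3L, 5L)

    val viaBitmap = bitmapKeep(blocks, expected) // keeps attempts 2, 3, 5
    val viaMinMax = minMaxKeep(blocks, expected) // keeps attempts 2, 3, 4, 5
    println(s"BITMAP keeps ${viaBitmap.map(_.taskAttemptId).mkString(",")} (${bytes(viaBitmap)} bytes)")
    println(s"MINMAX keeps ${viaMinMax.map(_.taskAttemptId).mkString(",")} (${bytes(viaMinMax)} bytes)")
  }
}
```

   Under this model, when the expected attempt ids are mostly contiguous the two filters keep nearly the same blocks. That would be one possible reading of the similar timings, but the measurements themselves do not establish it.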
   
   cc @jerqi @zuston
   


