xianjingfeng commented on PR #294:
URL:
https://github.com/apache/incubator-uniffle/pull/294#issuecomment-1344017352
# Performance Test
## Table
Table1: 10g, dtypes: Array[(String, String)] = Array((v1,StringType),
(k1,IntegerType)).
Every row of Table1 has the same value in k1 (k1 = 10).
Table2: 10 records, dtypes: Array[(String, String)] =
Array((k2,IntegerType), (v2,StringType)).
Exactly one of its records has k2 = 10.
## Env
Spark Resource Profile: 10 executors (1 core, 4g each)
Shuffle-server Environment: 6 shuffle servers, 20g for buffer read and 40g
for buffer write.
Spark Shuffle Client Config: storage type: MEMORY_LOCALFILE_HDFS with
LOCAL_ORDER
SQL: spark.sql("select * from Table1,Table2 where k1 =
k2").write.mode("overwrite").parquet("xxxxxx")
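
To make the data shape concrete, here is a minimal plain-Scala sketch (no Spark) of the two tables and the join. It is a toy model under stated assumptions, not the benchmark itself: the row count, the other nine k2 values, the partition count, and the simple hash are all hypothetical, and Spark's own hash partitioner is different. It only illustrates that every Table1 row matches the single k2 = 10 record, and that key-based partitioning would place all of those rows in one partition.

```scala
// Toy model of the benchmark's data shape; sizes and helper names are assumed.
object SkewSketch {
  // Table1 rows are (v1, k1); every row has k1 = 10, as described above.
  def table1(rows: Int): Seq[(String, Int)] =
    Seq.tabulate(rows)(i => (s"v$i", 10))

  // Table2 rows are (k2, v2): 10 records, exactly one with k2 = 10.
  // The other nine key values (1 to 9) are an assumption.
  val table2: Seq[(Int, String)] = (1 to 10).map(k => (k, s"w$k"))

  // Equi-join on k1 = k2, mirroring the SQL above.
  def join(t1: Seq[(String, Int)]): Seq[(String, Int, Int, String)] =
    for {
      (v1, k1) <- t1
      (k2, v2) <- table2
      if k1 == k2
    } yield (v1, k1, k2, v2)

  // Row count per partition when the join key is hashed into numPartitions
  // buckets. This uses a plain hashCode, not Spark's partitioner.
  def rowsPerPartition(t1: Seq[(String, Int)], numPartitions: Int): Map[Int, Int] =
    t1.groupBy { case (_, k1) => Math.floorMod(k1.hashCode, numPartitions) }
      .map { case (partition, rows) => partition -> rows.size }

  def main(args: Array[String]): Unit = {
    val t1 = table1(1000)
    println(s"joined rows: ${join(t1).size}")
    println(s"rows per partition: ${rowsPerPartition(t1, 6)}")
  }
}
```

Because all Table1 rows share one key, the joined row count equals the Table1 row count and the per-partition map has a single entry, whatever partition count is chosen.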
## Result
`BITMAP` and `MINMAX` perform similarly. I think the gap between them has
little impact on overall performance. See the following picture.

cc @jerqi @zuston