[ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361972#comment-17361972
 ] 

Rajesh Mahindra commented on HUDI-818:
--------------------------------------

Benchmark results from EMR nodes with both SSD and HDD storage are below. tl;dr: I do
not see any significant regressions or unexpected latency spikes for the spillable
map that would require immediate attention.

 

Case 1: Benchmark results with EMR m5.xlarge
4 vCore, 16 GiB memory, EBS only storage
EBS Storage: 2000 GiB with ST1 HDD storage
---------------------------------------------------------------

THROUGHPUT using dd:
-------------------

[hadoop@ip-172-31-26-21 hudi]$ dd if=/dev/zero of=/mnt/test bs=512 count=10000 oflag=direct
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 35.8048 s, 143 kB/s


[hadoop@ip-172-31-26-21 ~]$ dd if=/dev/zero of=/mnt/test bs=1K count=10000 oflag=direct
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 33.2558 s, 308 kB/s


[hadoop@ip-172-31-26-21 ~]$ dd if=/dev/zero of=/mnt/test bs=1M count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 42.2197 s, 248 MB/s
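
As a sanity check on the dd figures above, the reported rate is simply bytes copied divided by elapsed time, in decimal units (1 kB = 1000 bytes). A minimal sketch that recomputes the three rates; the dictionary labels and the function name are illustrative only and not part of the benchmark:

```python
# (bytes copied, elapsed seconds) copied from the three dd runs above
runs = {
    "bs=512": (5_120_000, 35.8048),
    "bs=1K": (10_240_000, 33.2558),
    "bs=1M": (10_485_760_000, 42.2197),
}


def throughput_bytes_per_sec(nbytes, seconds):
    """Average write throughput over the whole run, in bytes per second."""
    return nbytes / seconds


for label, (nbytes, seconds) in runs.items():
    rate = throughput_bytes_per_sec(nbytes, seconds)
    # dd reports SI units, so divide by 1000 rather than 1024
    print(f"{label}: {rate / 1e3:.0f} kB/s")
```

With oflag=direct each block is written without going through the page cache, which is why the small-block runs are bound by per-request latency rather than by bandwidth.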


LATENCY using IOPING: 
----------------------

FOR 512 Bytes block size
[hadoop@ip-172-31-26-21 hudi]$ sudo ~/ioping-0.8/ioping -R /dev/nvme1n1p2 -s 512 -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
61.5 k requests completed in 2.0 min, 512 iops, 256.2 KiB/s
min/avg/max/mdev = 1 us / 2.0 ms / 34.6 ms / 2.2 ms

FOR 4K block size
[hadoop@ip-172-31-26-21 ~]$ sudo ./ioping-0.8/ioping -R /dev/nvme1n1p2 -s 4K -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
61.7 k requests completed in 2.0 min, 515 iops, 2.0 MiB/s
min/avg/max/mdev = 176 us / 1.9 ms / 31.9 ms / 2.1 ms
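
The ioping numbers above can be cross-checked against each other: bandwidth should be roughly IOPS times request size, and, assuming the requests are issued one at a time with no pause between them (which is my understanding of the -R rate test), the average latency should be close to 1/IOPS. A small sketch of that arithmetic; the function names are illustrative:

```python
def bandwidth_kib_per_sec(iops, request_bytes):
    """Bandwidth implied by a request rate and a request size, in KiB/s."""
    return iops * request_bytes / 1024


def mean_latency_ms(iops):
    """Average latency implied by the request rate.

    Assumes a single request in flight at a time (no queueing or parallelism).
    """
    return 1000.0 / iops


# 512-byte requests at 512 iops, as reported in the first run
print(bandwidth_kib_per_sec(512, 512))   # reported: 256.2 KiB/s
print(mean_latency_ms(512))              # reported avg: 2.0 ms

# 4 KiB requests at 515 iops, as reported in the second run
print(bandwidth_kib_per_sec(515, 4096) / 1024)  # MiB/s; reported: 2.0 MiB/s
print(mean_latency_ms(515))                     # reported avg: 1.9 ms
```

The small differences from the reported values come from ioping rounding the IOPS figure it prints.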

BENCHMARKING WITH LOAD OF GET AND PUT (Code written in 
org.apache.hudi.common.util.collection.TestExternalSpillableMap):
------------------------------------------------------------------------------------------------------------------------
2 RUNS with 5M records of 500B each:

GET MEM: {0=860225, 1=485}
GET DISK: {128=1, 0=4033664, 65=1, 129=1, 1=105603, 99=1, 5=1, 199=1, 44=1,
77=1, 16=1, 145=1, 117=3, 118=1, 123=1, 124=3, 221=1, 125=2, 126=1, 30=1, 31=1}

PUT MEM: {0=859029, 1=423}
PUT DISK: {0=4108753, 1=31712, 130=1, 131=2, 128=4, 129=3, 3588=1, 133=2,
136=1, 139=1, 142=1, 144=1, 145=1, 20=1, 21=1, 152=1, 153=1, 157=1, 3621=1, 
37=2, 44=1, 172=1, 49=1, 50=1, 54=1, 55=1, 60=1, 61=1, 68=1, 70=1, 71=1, 78=1, 
209=1, 82=1, 83=1, 85=1, 89=1, 93=1, 226=1, 101=1, 108=1, 109=3, 111=1, 112=1, 
113=2, 114=1, 116=2, 117=2, 118=3, 119=2, 120=1, 121=1, 122=3, 124=3, 125=3, 
126=2, 127=7}


GET MEM: {0=860207, 1=668, 3=1, 5=1}
GET DISK: {0=3988026, 1=150580, 2=185, 3=104, 4=61, 5=68, 6=27, 7=19, 8=10,
9=9, 10=7, 11=4, 12=1, 204=1, 13=2, 15=2, 146=1, 18=1, 19=1, 21=1, 150=1, 
155=1, 226=1, 165=1, 230=1, 169=1, 44=1, 239=1, 114=1, 179=1, 253=1, 190=1, 
255=1, 191=1}

PUT MEM: {0=860348, 1=614, 9=1}
PUT DISK: {0=4084431, 1=54357, 129=1, 130=1, 2=65, 3=31, 4=23, 261=1, 5=23,
6=9, 7=9, 8=2, 265=1, 9=4, 10=1, 139=1, 11=1, 12=3, 140=1, 14=2, 270=1, 144=1, 
17=1, 273=1, 145=1, 146=3, 147=3, 20=1, 21=2, 150=1, 280=1, 155=2, 156=1, 
285=1, 287=1, 163=1, 169=1, 170=3, 171=2, 172=2, 173=1, 176=1, 178=1, 180=1, 
181=1, 182=1, 183=2, 314=1, 187=1, 316=1, 191=1, 192=1, 4803=1, 197=1, 202=1, 
75=1, 208=1, 209=1, 84=1, 213=1, 214=1, 223=2, 224=1, 225=1, 227=1, 228=1, 
101=1, 232=1, 237=1, 238=1, 240=1, 242=1, 243=1, 372=1, 245=1, 247=1, 248=1, 
250=1, 254=1}
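
The histograms above are printed as bucket=count pairs, where the key is a latency bucket and the value is the number of operations that landed in it. The comment does not state the unit of the bucket keys, so the sketch below treats them as opaque integers. It is only one possible way to summarize such a line and is not code from TestExternalSpillableMap; the function names are hypothetical:

```python
import re


def parse_histogram(text):
    """Parse a line such as '{0=860225, 1=485}' into {bucket: count}."""
    pairs = re.findall(r"(\d+)=(\d+)", text)
    return {int(bucket): int(count) for bucket, count in pairs}


def summarize(hist):
    """Total operations, share of operations in bucket 0, and the largest bucket seen."""
    total = sum(hist.values())
    return {
        "total": total,
        "share_in_bucket_0": hist.get(0, 0) / total,
        "max_bucket": max(hist),
    }


# Two of the in-memory histograms from the HDD runs above
get_mem = parse_histogram("{0=860225, 1=485}")
put_mem = parse_histogram("{0=860348, 1=614, 9=1}")

print(summarize(get_mem))
print(summarize(put_mem))
```

The same two functions can be applied to the DISK lines after joining the wrapped lines into a single string, which makes it easier to compare the tail (the handful of operations in the high buckets) between runs.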


Case 2: Benchmark results with EMR m5.xlarge
4 vCore, 16 GiB memory, EBS only storage
EBS Storage: 2000 GiB with GP2 SSD storage
---------------------------------------------------------------

THROUGHPUT using dd:
-------------------
[hadoop@ip-172-31-30-32 hudi]$ dd if=/dev/zero of=/mnt/test bs=512 count=10000 oflag=direct
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 8.11925 s, 631 kB/s

[hadoop@ip-172-31-30-32 ~]$ dd if=/dev/zero of=/mnt/test bs=1K count=100000 oflag=direct
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 85.7164 s, 1.2 MB/s

[hadoop@ip-172-31-30-32 mnt]$ dd if=/dev/zero of=/mnt/test bs=1M count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 88.494 s, 118 MB/s


LATENCY using IOPING: 
---------------------
For 512 Bytes block size
[hadoop@ip-172-31-30-32 hudi]$ sudo ~/ioping-0.8/ioping -R /dev/nvme1n1p2 -s 512 -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
227.7 k requests completed in 2.0 min, 1.9 k iops, 950.8 KiB/s
min/avg/max/mdev = 2 us / 525 us / 19.1 ms / 506 us

For 4K block size
[hadoop@ip-172-31-30-32 ~]$ sudo ./ioping-0.8/ioping -R /dev/nvme1n1p2 -s 4K -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
223.4 k requests completed in 2.0 min, 2.0 k iops, 7.6 MiB/s
min/avg/max/mdev = 127 us / 511 us / 35.0 ms / 499 us

BENCHMARKING WITH LOAD OF GET AND PUT (Code written in 
org.apache.hudi.common.util.collection.TestExternalSpillableMap):
------------------------------------------------------------------------------------------------------------------------
2 RUNS with 5M records of 500B each:

GET MEM: {0=858774, 1=741, 2=2}
GET DISK: {0=2104664, 1=318447, 2=971559, 3=641488, 4=82966, 5=10705, 6=3125,
7=1721, 8=1696, 9=1480, 10=1091, 11=403, 12=304, 141=1, 13=347, 14=216, 15=77, 
16=37, 17=20, 18=18, 19=8, 20=14, 21=6, 22=9, 23=10, 24=5, 25=6, 26=4, 27=2, 
155=1, 28=5, 29=1, 30=3, 31=1, 159=1, 32=1, 33=1, 34=1, 36=2, 37=2, 38=2, 39=2, 
40=2, 41=2, 42=2, 43=1, 44=1, 45=1, 46=1, 47=1, 53=1, 54=2, 62=1, 67=1, 73=1, 
77=1, 86=1, 97=1, 99=1, 100=1, 101=1, 102=2, 103=2, 104=1, 105=1, 106=1, 107=1, 
112=1}

PUT MEM: {0=859173, 1=778, 2=1}
PUT DISK: {128=1, 0=4078310, 1=61508, 2=75, 3=24, 4=22, 5=9, 6=12, 7=8, 8=7,
9=3, 10=3, 138=1, 11=2, 12=1, 141=1, 13=3, 142=1, 14=1, 146=1, 149=2, 153=1, 
28=1, 158=1, 159=1, 38=1, 169=1, 177=1, 58=1, 314=1, 187=1, 193=1, 66=1, 328=1, 
205=1, 78=1, 207=1, 208=1, 213=1, 87=2, 89=1, 90=3, 91=2, 92=1, 93=1, 94=1, 
95=2, 97=1, 100=1, 228=1, 101=1, 102=1, 103=2, 104=2, 105=3, 106=2, 108=1, 
237=1, 4845=1, 111=1, 112=1, 4081=1, 114=1, 115=1, 119=1}


GET MEM: {0=860506, 1=670}
GET DISK: {0=3994892, 1=143509, 2=134, 3=74, 4=65, 261=1, 5=30, 6=25, 64=1,
135=1, 7=11, 8=21, 9=11, 10=4, 11=8, 12=10, 13=2, 14=1, 206=1, 207=1, 15=4, 
208=1, 17=1, 146=1, 19=1, 20=1, 21=1, 22=1, 152=2, 25=1, 155=2, 156=1, 33=1, 
230=1, 41=1, 169=1, 182=1, 125=1}

PUT MEM: {0=859540, 1=638, 2=1, 3=1, 8=1}
PUT DISK: {0=4088463, 1=51135, 2=60, 131=2, 3=30, 132=1, 4=15, 261=1, 5=14,
134=1, 262=1, 6=7, 7=7, 136=1, 8=13, 9=4, 10=2, 11=1, 140=1, 12=2, 142=1, 
143=2, 271=1, 145=2, 146=3, 149=2, 151=1, 23=1, 155=1, 158=2, 159=1, 160=1, 
4771=1, 166=1, 295=1, 167=2, 4649=1, 299=1, 172=2, 174=1, 48=1, 49=1, 181=1, 
182=1, 187=1, 189=1, 61=1, 190=1, 68=1, 208=1, 80=3, 81=1, 212=1, 87=1, 219=1, 
92=1, 221=1, 94=1, 225=1, 227=1, 233=2, 234=1, 238=1, 117=1, 121=1, 250=1, 
123=1, 125=1}

> Optimize the default value of hoodie.memory.merge.max.size option
> -----------------------------------------------------------------
>
>                 Key: HUDI-818
>                 URL: https://issues.apache.org/jira/browse/HUDI-818
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.9.0
>            Reporter: lamber-ken
>            Assignee: Rajesh Mahindra
>            Priority: Blocker
>              Labels: help-requested, sev:high, user-support-issues
>             Fix For: 0.9.0
>
>
> The default value of hoodie.memory.merge.max.size option is incapable of 
> meeting their performance requirements
> [https://github.com/apache/incubator-hudi/issues/1491]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
