[
https://issues.apache.org/jira/browse/PHOENIX-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294293#comment-15294293
]
James Taylor commented on PHOENIX-2929:
---------------------------------------
The odd thing, [~poornachandra], is that the client JVM is brought down after
inserting 1M rows, let subsequent clients begin getting slower over time.
Restarting the transaction manager makes it start going faster again. Any
theories?
> Transactional writes get slow over time due to possible memory leak
> -------------------------------------------------------------------
>
> Key: PHOENIX-2929
> URL: https://issues.apache.org/jira/browse/PHOENIX-2929
> Project: Phoenix
> Issue Type: Bug
> Reporter: Mujtaba Chohan
> Attachments: txn-manager-log-during-slow-writes.log
>
>
> Write to transactional table get up to ~3X slower compared to
> non-transactional table.
> Details:
> * Transaction Manager (TM) is *always* kept up and running between all data
> loads
> * JVM for data loader is stopped after each 1M rows upsert
> * Data loader uses 10 threads to write in parallel with 10K batch size
> * Data load time is comparable for first few runs and it gets ~3X slow after
> transaction manager is up for 5-10 hours
> Data loader client details for txn-table: ~100sec GC pause with transactional
> table. Total time to load 1M rows ~4mins. ~3.2G physical mem used. Java XMX
> param was set to 3G.
> {code}
> JSTAT for data loader - 10 sec interval
> S0C S1C S0U S1U EC EU OC OU MC
> MU CCSC CCSU YGC YGCT FGC FGCT GCT
> 40448.0 40960.0 0.0 21920.0 571904.0 0.0 349696.0 87959.7
> 42024.0 41469.1 4904.0 4739.9 13 0.174 2 0.072 0.246
> 333824.0 333824.0 0.0 121135.1 380928.0 84826.0 2097152.0 2009035.3
> 42280.0 41652.1 4904.0 4740.4 29 1.172 6 5.231 6.403
> 349184.0 349184.0 0.0 0.0 350208.0 350205.8 2097152.0 1952578.1
> 42536.0 41758.6 4904.0 4742.5 36 1.353 12 9.882 11.234
> 322560.0 336896.0 214566.6 104273.9 344576.0 344561.2 2097152.0 1859305.5
> 42536.0 41774.5 4904.0 4742.5 45 1.643 17 14.684 16.327
> 349184.0 349184.0 41351.1 0.0 350208.0 48147.1 2097152.0 1935605.0
> 42536.0 41811.3 4904.0 4742.5 54 2.081 24 19.862 21.943
> 349184.0 349184.0 156685.4 0.0 350208.0 0.0 2097152.0 2073856.9
> 42536.0 41817.5 4904.0 4742.5 68 2.632 30 22.649 25.281
> 344576.0 52224.0 0.0 51789.6 350208.0 0.0 2097152.0 1871862.2
> 42536.0 41823.4 4904.0 4742.5 79 2.991 37 27.764 30.756
> 349184.0 349184.0 302724.6 0.0 350208.0 0.0 2097152.0 2023258.1
> 42536.0 41830.3 4904.0 4742.5 92 3.559 43 30.363 33.922
> 305664.0 104448.0 0.0 104434.1 358400.0 114713.9 2097152.0 1857545.7
> 42536.0 41846.0 4904.0 4742.5 101 3.804 50 36.314 40.118
> 335872.0 66048.0 0.0 65680.0 350208.0 247000.5 2097152.0 1923473.1
> 42536.0 41852.3 4904.0 4742.5 115 4.364 56 39.622 43.986
> 349184.0 349184.0 0.0 349163.6 350208.0 0.0 2097152.0 1975419.5
> 42536.0 41861.6 4904.0 4742.5 125 4.863 60 44.782 49.645
> 349184.0 349184.0 0.0 17375.8 350208.0 0.0 2097152.0 1999279.6
> 42536.0 41863.4 4904.0 4742.5 133 5.091 66 50.602 55.693
> 340992.0 345088.0 0.0 0.0 350208.0 350208.0 2097152.0 1977601.3
> 42536.0 41894.1 4904.0 4742.5 146 5.566 73 54.388 59.953
> 349184.0 349184.0 0.0 0.0 350208.0 350208.0 2097152.0 1963150.5
> 42536.0 41906.5 4904.0 4742.5 158 6.036 80 58.024 64.060
> 349184.0 349184.0 0.0 63008.0 350208.0 56631.7 2097152.0 1942708.7
> 42536.0 41916.8 4904.0 4742.5 177 6.751 84 60.799 67.550
> 345600.0 349184.0 263005.5 0.0 350208.0 55075.8 2097152.0 1883022.2
> 42536.0 41925.8 4904.0 4742.5 192 7.295 88 65.144 72.439
> 349184.0 344064.0 139656.9 0.0 355328.0 0.0 2097152.0 2070558.8
> 42536.0 41935.4 4904.0 4742.5 204 7.803 95 68.618 76.421
> 349184.0 349184.0 63008.0 0.0 350208.0 0.0 2097152.0 1963035.3
> 42536.0 41971.9 4904.0 4742.5 218 8.339 102 72.534 80.873
> 349184.0 349184.0 0.0 17984.0 350208.0 253707.1 2097152.0 1848396.0
> 42536.0 41996.1 4904.0 4742.5 229 8.686 108 76.526 85.212
> 349184.0 324608.0 0.0 284036.3 351744.0 0.0 2097152.0 1776761.8
> 42536.0 42021.6 4904.0 4742.5 247 9.320 113 79.669 88.989
> 349184.0 349184.0 38510.2 0.0 350208.0 160454.7 2097152.0 1896405.6
> 42536.0 42036.0 4904.0 4742.5 256 9.555 115 80.723 90.278
> 342528.0 333824.0 147271.1 0.0 358912.0 312219.3 2097152.0 1882525.2
> 42792.0 42065.0 4904.0 4742.5 278 10.620 116 81.143 91.763
> 349184.0 283648.0 0.0 283645.6 370176.0 249240.7 2097152.0 1893727.0
> 42792.0 42066.5 4904.0 4742.5 303 11.738 117 81.684 93.422
> 349184.0 349184.0 49417.0 0.0 350208.0 172262.6 2097152.0 1127855.3
> 42792.0 42071.0 4904.0 4742.5 326 12.787 119 82.895 95.682
> 349184.0 349184.0 0.0 131245.0 350208.0 133019.3 2097152.0 1074752.6
> 42792.0 42081.4 4904.0 4742.5 349 13.943 120 83.448 97.391
> {code}
> Data loader client details for non-txn table: ~12sec GC pause with
> non-transactional table. Total time to load 1M rows ~1mins. ~2G physical mem
> used. Java XMX param was set to 3G.
> {code}
> JSTAT for data loader - 10 sec interval
> S0C S1C S0U S1U EC EU OC OU MC
> MU CCSC CCSU YGC YGCT FGC FGCT GCT
> 24576.0 28160.0 0.0 0.0 195584.0 28008.1 349696.0 25257.3 35496.0
> 35020.3 4264.0 4037.5 6 0.056 2 0.071 0.128
> 79872.0 79872.0 0.0 79860.8 888832.0 0.0 753664.0 753092.4
> 41000.0 40361.6 4648.0 4563.6 17 0.616 3 0.071 0.688
> 259584.0 260096.0 129075.4 161535.6 528384.0 528384.0 937472.0 767859.8
> 41000.0 40458.7 4648.0 4564.6 31 1.319 4 1.622 2.941
> 283136.0 278016.0 23716.8 0.0 487424.0 143734.6 1498624.0 662740.5
> 41000.0 40479.8 4648.0 4564.6 48 2.459 7 3.122 5.582
> 240640.0 251392.0 162230.5 31674.1 545792.0 545792.0 1530368.0 642312.2
> 41000.0 40531.4 4648.0 4567.6 69 3.733 9 4.069 7.802
> 289280.0 282624.0 19730.4 0.0 468480.0 403700.7 1786880.0 672352.5
> 41256.0 40645.2 4648.0 4567.6 86 4.974 11 5.160 10.134
> 282624.0 290816.0 163586.1 0.0 466944.0 268299.4 1950208.0 1491203.8
> 41256.0 40667.0 4648.0 4567.6 108 6.288 12 5.723 12.011
> {code}
> * TM uses ~4G physical memory (there's no GC pause for TM)
> * After restarting TM, TM memory usage drops to 0.5G and data loader GC
> pauses decrease to ~15sec for transactional table and 1M gets loaded in ~1min.
> This hints to possible leak in TM. Since this test was on a remote secure
> cluster, I haven't been able to profile it so I'll try to repro it locally to
> get more info.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)