[
https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216873#comment-14216873
]
Alan Boudreault edited comment on CASSANDRA-7386 at 11/19/14 1:54 AM:
----------------------------------------------------------------------
devs, I've tested this issue with and without the patch and analysed the disk
usage in 3 scenarios. The patch works well and fixes important issues related
to multiple data directories. I'm sharing the results with the graphs
(attached below):
For all my tests, I was able to reproduce the issues using multiple data
directories. There was no need to *hammer* the node with compaction and
repair; I simply lowered concurrent_compactors and
compaction_throughput_mb_per_sec to slow things down, which keeps the disks
busy during disk selection.
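For reference, that throttling is done in cassandra.yaml. The exact values used aren't recorded in this comment, so the snippet below is illustrative only:
{code}
# cassandra.yaml -- illustrative values only; the exact settings used for
# these tests are not recorded in this comment.
concurrent_compactors: 1
compaction_throughput_mb_per_sec: 1
{code}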
h4. Test 1
* 2 disks of the same size
* Goal: stress the server to fill all disks (see the sample cassandra-stress command below)
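The load itself is generated with cassandra-stress. The exact invocation isn't recorded in this comment; the command below is only a plausible 2.1-style example:
{code}
# Hypothetical invocation; the actual cassandra-stress options used for these
# tests are not given in this comment.
cassandra-stress write n=50000000 -rate threads=50 -node 127.0.0.1
{code}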
h5. Result - No Patch
Only one disk is filled; the other never fills. cassandra-stress crashed with
a WriteTimeoutException while the second disk remained at ~20% disk usage.
!test1_no_patch.jpg|thumbnail!
h5. Result - With Patch
Success. Both disks are filled at approximately the same speed.
h4. Test 2
* 5 disks in total, all the same size
* 2 disks initially filled to ~20%
* 3 disks added later
* Goal: stress the server to fill all disks
h5. Result - No Patch
* The first 2 disks aren't used at the beginning since they are already at 20%
disk usage. (That's expected.)
* Some new data is written.
* 2 of the newly added disks receive the initial data; once they reach 20% of
disk usage, all 4 disks are filled at approximately the same speed.
* The last disk, which is running a compaction, is almost never used and
remains at 15% disk usage when cassandra-stress crashes with write timeouts.
h5. Result - With Patch
Success. All disks were filled at approximately the same speed. I noticed
that Cassandra doesn't wait until all 3 newly added disks reach 20% before
reusing disks 1 and 2, but it keeps things balanced and reduces the
difference over the course of the run.
h4. Test 3
* 5 disks in total
* 4 disks of 2G each
* 1 disk of 10G (5x larger than the others)
* Goal: stress the server to fill all disks
h5. Result - No Patch
* Disk #5 (the 10G disk) is used initially, then an internal compaction starts
on it.
* The 4 other disks are completely filled while disk #5 is never used again.
cassandra-stress crashes with write timeouts and disk #5 remains at 15% disk
usage with more than 8G of free space.
h5. Result - With Patch
Success. All 5 disks are filled at approximately the same speed.
See the result images attached below.
> JBOD threshold to prevent unbalanced disk utilization
> -----------------------------------------------------
>
> Key: CASSANDRA-7386
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Chris Lohfink
> Assignee: Robert Stupp
> Priority: Minor
> Fix For: 2.1.3
>
> Attachments: 7386-2.0-v3.txt, 7386-2.0-v4.txt, 7386-2.1-v3.txt,
> 7386-2.1-v4.txt, 7386-v1.patch, 7386v2.diff, Mappe1.ods,
> mean-writevalue-7disks.png, patch_2_1_branch_proto.diff,
> sstable-count-second-run.png, test1_no_patch.jpg, test1_with_patch.jpg,
> test2_no_patch.jpg, test2_with_patch.jpg, test3_no_patch.jpg,
> test3_with_patch.jpg
>
>
> Currently the disks are picked first by number of current tasks, then by
> free space. This helps with performance but can lead to large differences in
> utilization in some (unlikely but possible) scenarios. I've seen 55% vs 10%
> and heard reports of 90% vs 10% on IRC. This happens with both LCS and STCS
> (although my suspicion is that STCS makes it worse, since it is harder to
> keep balanced).
> I propose the algorithm change a little to have some maximum range of
> utilization where it will pick by free space over load (acknowledging this
> can be slower). So if disk A is 30% full and disk B is 5% full, it will
> never pick A over B until they balance out.
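To make the proposed heuristic concrete, here is a minimal Java sketch. Everything in it (DataDirectory, DiskPicker, the 10% MAX_UTILIZATION_SPREAD threshold) is a hypothetical illustration of the idea above, not Cassandra's actual Directories code or the attached patch:
{code:java}
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a data directory; not Cassandra's actual Directories API.
class DataDirectory
{
    final String path;
    final long totalSpace;
    final long freeSpace;
    final int pendingTasks;

    DataDirectory(String path, long totalSpace, long freeSpace, int pendingTasks)
    {
        this.path = path;
        this.totalSpace = totalSpace;
        this.freeSpace = freeSpace;
        this.pendingTasks = pendingTasks;
    }

    double utilization()
    {
        return 1.0 - (double) freeSpace / totalSpace;
    }
}

class DiskPicker
{
    // Hypothetical threshold: the maximum utilization spread tolerated before
    // free space overrides task count.
    static final double MAX_UTILIZATION_SPREAD = 0.10;

    // Assumes dirs is non-empty.
    static DataDirectory pick(List<DataDirectory> dirs)
    {
        double minUtil = dirs.stream().mapToDouble(DataDirectory::utilization).min().getAsDouble();
        double maxUtil = dirs.stream().mapToDouble(DataDirectory::utilization).max().getAsDouble();

        if (maxUtil - minUtil > MAX_UTILIZATION_SPREAD)
        {
            // Disks are too unbalanced: pick the emptiest disk even if it is
            // busy, e.g. a 5%-full disk always beats a 30%-full one.
            return dirs.stream().max(Comparator.comparingLong(d -> d.freeSpace)).get();
        }

        // Balanced enough: keep the existing behaviour -- fewest pending tasks
        // first, then most free space.
        return dirs.stream()
                   .min(Comparator.<DataDirectory>comparingInt(d -> d.pendingTasks)
                                  .thenComparingLong(d -> -d.freeSpace))
                   .get();
    }
}
{code}
The point of the threshold is that the performance benefit of load-based selection is kept in the common, balanced case; free space only takes over once the utilization spread grows too large.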