Hi Krutika,

 

Thank you so much for your reply. Let me answer each point:

 

1.      I have no idea why it did not get distributed over all bricks.
2.      Hm.. This is really weird.

 

And for the other questions:

 

No, I use only one volume. When I tested the sharded and striped setups, I 
manually stopped the volume, deleted it, purged the data (the data inside the 
bricks/disks) and re-created it using this command:

 

sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1 
sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2 
sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3 
sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4 
sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5 
sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6 
sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7 
sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8 
sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9 
sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10 
sr-10-loc-50-14-18:/bricks/brick10 force

 

and of course after that I executed "volume start". If sharding is used, I enable that 
feature BEFORE I start the sharded volume, and only then mount it.
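
For clarity, the full sequence I follow looks roughly like this (just a sketch; the brick list is the same one as in the create command above, and /mnt is my mount point):

sudo gluster volume create testvol replica 2 <brick list as above> force
sudo gluster volume set testvol features.shard on
sudo gluster volume set testvol features.shard-block-size 32MB
sudo gluster volume start testvol
sudo mount -t glusterfs sr-09-loc-50-14-18:/testvol /mnt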

 

I tried converting from one to the other, but then I saw the documentation says a clean 
volume should be better. So I tried the clean method. Still the same performance.

 

The test file grows from 1GB to 5GB, and the tests are dd runs. See this example:

 

dd if=/dev/zero of=/mnt/testfile bs=1G count=5

5+0 records in

5+0 records out

5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s

 

 

>> dd if=/dev/zero of=/mnt/testfile bs=5G count=1

This also gives the same result (bs and count reversed).

 

 

And this example generated a profile, which I have also attached to this e-mail.

 

Is there anything that I can try? I am open to all kinds of suggestions.

 

Thanks,

Gencer.

 

From: Krutika Dhananjay [mailto:[email protected]] 
Sent: Tuesday, July 4, 2017 9:39 AM
To: [email protected]
Cc: gluster-user <[email protected]>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS

 

Hi Gencer,

I just checked the volume-profile attachments.

Things that seem really odd to me as far as the sharded volume is concerned:

1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10 seems to 
have witnessed all the IO. No other bricks witnessed any write operations. This 
is unacceptable for a volume that has 8 other replica sets. Why didn't the 
shards get distributed across all of these sets?

 

2. For the replica set consisting of bricks 5 and 6 of node 09, I see that 
brick 5 is spending 99% of its time in the FINODELK fop, when the fop that 
dominates its profile should in fact have been WRITE.
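
By the way, a quick way to confirm where the shards landed (just a sketch; adjust the brick paths to yours) is to count the entries under each brick's .shard directory, since all shards beyond the first block are created there:

ls /bricks/brick1/.shard | wc -l
ls /bricks/brick2/.shard | wc -l
(and so on for the remaining bricks on both nodes)

If only bricks 5 and 6 have non-empty .shard directories, that would confirm what the profile is showing.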

Could you throw some more light on your setup from a gluster standpoint?
* For instance, are you using two different gluster volumes to gather these 
numbers - one distributed-replicated-striped and another 
distributed-replicated-sharded? Or are you merely converting a single volume 
from one type to another?

 

* And if there are indeed two volumes, could you share both their `volume info` 
outputs to eliminate any confusion?

* If there's just one volume, are you taking care to remove all data from the 
mount point of this volume before converting it?

* What is the size the test file grew to?

* These attached profiles are against dd runs? Or the file download test?

 

-Krutika

 

 

On Mon, Jul 3, 2017 at 8:42 PM, <[email protected]> wrote:

Hi Krutika,

 

Have you been able to look at my profiles? Do you have any clue, idea, or 
suggestion?

 

Thanks,

-Gencer

 

From: Krutika Dhananjay [mailto:[email protected]] 
Sent: Friday, June 30, 2017 3:50 PM


To: [email protected] <mailto:[email protected]> 
Cc: gluster-user <[email protected] <mailto:[email protected]> >
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS

 

Just noticed that the way you have configured your brick order during 
volume-create makes both replicas of every set reside on the same machine.

That apart, do you see any difference if you change shard-block-size to 512MB? 
Could you try that?
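
That would be (assuming the volume is still named testvol):

# gluster volume set testvol features.shard-block-size 512MB

Keep in mind the new block size applies only to files created after the change, so please test with a freshly created file.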

If it doesn't help, could you share the volume-profile output for both the 
tests (separate)?

Here's what you do:

1. Start profile before starting your test - it could be dd or it could be file 
download.

# gluster volume profile <VOL> start

2. Run your test - again either dd or file-download.

3. Once the test has completed, run `gluster volume profile <VOL> info` and 
redirect its output to a tmp file.

4. Stop profile

# gluster volume profile <VOL> stop

And attach the volume-profile output file that you saved at a temporary 
location in step 3.
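
Putting steps 1-4 together, the whole capture would look something like this (<VOL> and the output path are placeholders):

# gluster volume profile <VOL> start
# dd if=/dev/zero of=/mnt/testfile bs=1G count=5
# gluster volume profile <VOL> info > /tmp/profile-sharded.txt
# gluster volume profile <VOL> stop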

-Krutika

 

On Fri, Jun 30, 2017 at 5:33 PM, <[email protected]> wrote:

Hi Krutika,

 

Sure, here is the volume info:

 

root@sr-09-loc-50-14-18:/# gluster volume info testvol

 

Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 30426017-59d5-4091-b6bc-279a905b704a

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-09-loc-50-14-18:/bricks/brick2

Brick3: sr-09-loc-50-14-18:/bricks/brick3

Brick4: sr-09-loc-50-14-18:/bricks/brick4

Brick5: sr-09-loc-50-14-18:/bricks/brick5

Brick6: sr-09-loc-50-14-18:/bricks/brick6

Brick7: sr-09-loc-50-14-18:/bricks/brick7

Brick8: sr-09-loc-50-14-18:/bricks/brick8

Brick9: sr-09-loc-50-14-18:/bricks/brick9

Brick10: sr-09-loc-50-14-18:/bricks/brick10

Brick11: sr-10-loc-50-14-18:/bricks/brick1

Brick12: sr-10-loc-50-14-18:/bricks/brick2

Brick13: sr-10-loc-50-14-18:/bricks/brick3

Brick14: sr-10-loc-50-14-18:/bricks/brick4

Brick15: sr-10-loc-50-14-18:/bricks/brick5

Brick16: sr-10-loc-50-14-18:/bricks/brick6

Brick17: sr-10-loc-50-14-18:/bricks/brick7

Brick18: sr-10-loc-50-14-18:/bricks/brick8

Brick19: sr-10-loc-50-14-18:/bricks/brick9

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on

 

-Gencer.

 

From: Krutika Dhananjay [mailto:[email protected]] 
Sent: Friday, June 30, 2017 2:50 PM
To: [email protected] <mailto:[email protected]> 
Cc: gluster-user <[email protected] <mailto:[email protected]> >
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS

 

Could you please provide the volume-info output?

-Krutika

 

On Fri, Jun 30, 2017 at 4:23 PM, <[email protected]> wrote:

Hi,

 

I have 2 nodes with 20 bricks in total (10+10).

 

First test: 

 

2 Nodes with Distributed – Striped – Replicated (2 x 2)

10GbE Speed between nodes

 

“dd” performance: 400 MB/s and higher

Downloading a large file from the internet directly to the gluster volume: 250-300 MB/s

 

Now the same test without Stripe but with sharding. These results are the same whether I 
set the shard size to 4MB or 32MB. (Again 2x replica here.)

 

dd performance: 70 MB/s

Download directly to the gluster volume: 60 MB/s

 

Now, if we do this test twice at the same time (two dd runs or two downloads at the 
same time), it drops below 25 MB/s each, or slower.

 

I thought sharding would be at least equal, or maybe a little slower, but these 
results are terribly slow.

 

I tried tuning (cache, window-size, etc.). Nothing helps.

 

GlusterFS 3.11 and Debian 9 are used. The kernel is also tuned. Disks are “xfs” and 4TB 
each.

 

Is there any tweak/tuning out there to make it fast?

 

Or is this expected behavior? If it is, it is unacceptable; it is so slow that I cannot 
use this in production.

 

The reason I use shard instead of stripe is that I would like to eliminate the problem of 
files that are bigger than the brick size.

 

Thanks,

Gencer.


_______________________________________________
Gluster-users mailing list
[email protected] <mailto:[email protected]> 
http://lists.gluster.org/mailman/listinfo/gluster-users

 

 

 

Attachment: dd-5gb-shard_32mb.log
Description: Binary data

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
