Well, 3 TB in 13 hours works out to roughly 87 hours to sync 20 TB, i.e. 3-4 
days, and it could be a lot longer with a large number of small files (a good 
chunk, but not all, of our data is composed of hundreds of thousands of small 
.jpg image files of 100 KB or so). Overall, there are millions of files that 
need to be transferred.
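
As a rough check of that estimate, the sketch below scales the observed rate (3 TB in 13 hours, from the message quoted further down) linearly to 20 TB. The function name and the constant-throughput assumption are illustrative only; a workload dominated by small files would likely do worse than this.

```python
def estimate_hours(total_tb, sample_tb=3.0, sample_hours=13.0):
    """Linearly extrapolate transfer time from one observed sample.

    Assumes throughput stays constant, which is optimistic for small files.
    """
    rate_tb_per_hour = sample_tb / sample_hours
    return total_tb / rate_tb_per_hour


hours = estimate_hours(20.0)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

This lands a little under 87 hours, which is consistent with the "3-4 days" figure.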

 

A good thing about rsyncing directly to the single server is that if we do have 
to stop the rsync for any reason, it will be very fast to restart later. 
Restarting an rsync part-way through a large transfer to a Gluster volume can 
be incredibly slow, as it has to stat all the files that have already made it 
onto the volume in order to work out where to restart. Just working out where 
to restart could take hours on GlusterFS, whereas rsync direct to an XFS 
filesystem will tear through millions of stat operations and work out where to 
restart in a matter of minutes.
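
One way to put numbers on that difference is to time a stat pass over a sample directory on each filesystem and scale it up. The sketch below is only an illustration: the function names are made up for this example, the linear scaling is a crude assumption, and a second run over the same tree may be much faster because of caching.

```python
import os
import time


def time_stat_pass(root):
    """Walk `root`, lstat every file, and return (file_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return count, time.perf_counter() - start


def extrapolate_seconds(sample_count, sample_seconds, total_files):
    """Scale a sample timing linearly to `total_files` (a crude assumption)."""
    if sample_count == 0:
        raise ValueError("sample contained no files")
    return total_files * sample_seconds / sample_count
```

Running `time_stat_pass` once against a subdirectory on the XFS brick and once against the same subdirectory through the Gluster mount point, then feeding each result to `extrapolate_seconds` with the total file count, would give a ballpark for how long the restart scan takes in each case.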

 

So for these reasons, it seems like we should be able to save an enormous 
amount of time rsyncing directly to the xfs bricks and adding the bricks to 
gluster later…

 

Basically, our setup has 2 (soon to be 3) reasonably powerful nodes, each set 
up like this:

1)      Each node is a Supermicro chassis with 12 x 4TB Hitachi disks on an 
LSI 9280-4i4e RAID controller, with a large RAID6 array divided into 4 XFS 
bricks of roughly 9TB each, for a total of 36.5TB per node.

2)      10GbE connecting the nodes.

3)      Xeon E3-1245 quad-core (8 HT) CPU @ 3.4 GHz, 16GB RAM

These nodes definitely do not have the most powerful CPUs ever, nor do they 
have huge quantities of RAM, but the disk arrays should be capable of some 
good speed, and we hope they will be adequate for a Gluster volume that is just 
a huge archive. We just want to move data onto it, and then access it when 
needed, or back up data from it (to tape).
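
For anyone checking the per-node capacity figure, here is a small sketch of the arithmetic. It assumes all 12 disks sit in a single RAID6 array with no hot spare, which the description above does not state explicitly; the function names are illustrative.

```python
def raid6_usable_tb(disks, disk_tb):
    """RAID6 stores two disks' worth of parity, leaving (disks - 2) for data."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks - 2) * disk_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable = raid6_usable_tb(12, 4)
print(usable, "TB decimal,", round(tb_to_tib(usable), 1), "TiB binary")
```

Under those assumptions the array holds 40 TB in decimal units, or about 36.4 in binary units, which is close to the 36.5TB quoted per node.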

 

 

From: Ryan Nix [mailto:[email protected]] 
Sent: Thursday, 16 October 2014 12:12 PM
To: SINCOCK John
Cc: Franco Broi; gluster-users
Subject: Re: [Gluster-users] Is it ok to add a new brick with files already on 
it?

 

Interesting.  Still, I think it's better to let the Gluster client handle the 
syncing.  What happens if, for some strange reason, the rsync process dies in 
the middle of the night?  Gluster, on the other hand, will keep working to get 
the data onto the other bricks without human intervention.  I recently used 
Gluster to sync 3 TB of data to another brick over a 1Gbps link in about 13 
hours on decent hardware.

 

On Wed, Oct 15, 2014 at 9:04 PM, SINCOCK John <[email protected]> wrote:

 

We have 20 terabytes to rsync onto a new server (which will have 32 TB 
capacity), and we then want to add that server to an existing 2-node gluster of 
73TB (53 TB used, 20 TB free), to give a 3-node gluster with 105TB capacity, 
73TB used.

 

The reason I want to do it this way, if possible, is that Gluster is slow on 
writes, especially for small files, and we have a LOT of small files, so I’m 
pretty sure it will be a LOT faster to rsync directly to the new server (which 
is the one that has free space anyway), and then add that server to the gluster 
– if it is possible to have gluster recognise those files.

 

 

From: Ryan Nix [mailto:[email protected]] 
Sent: Thursday, 16 October 2014 11:58 AM
To: SINCOCK John
Cc: Franco Broi; gluster-users
Subject: Re: [Gluster-users] Is it ok to add a new brick with files already on 
it?

 

So Gluster, at its core, uses rsync to copy the data to the other bricks.  Why 
not let Gluster do the heavy lifting?

 

On Wed, Oct 15, 2014 at 7:35 PM, SINCOCK John <[email protected]> wrote:


In a related question... it seems, if it is possible to add filesystems already 
containing data, as new bricks, then it should also be possible to:

1) create empty bricks
2) add them to the gluster volume while they are empty
3) rsync data directly onto the underlying empty bricks, circumventing gluster, 
ie not through the gluster mountpoint
4) somehow get gluster to recognise the data that has been copied into the 
bricks?

How would you go about getting gluster to see the data you've rsynced directly 
in?
My concern would be that all the data rsynced directly onto the bricks will 
just sit there, invisible to glusterfs.

Thanks again for any info!


-----Original Message-----
From: Franco Broi [mailto:[email protected]]
Sent: Thursday, 16 October 2014 10:06 AM
To: SINCOCK John
Cc: [email protected]
Subject: Re: [Gluster-users] Is it ok to add a new brick with files already on 
it?

I've never added a brick with existing files but I did start a new Gluster 
volume on disks that already contained data and I was able to access the files 
without problem. Of course the files will be out of place but the first time 
you access them, Gluster will add links to speed up future lookups.

On Thu, 2014-10-16 at 09:57 +1030, SINCOCK John wrote:
> Hi Everyone,
>
>
>
> All the instructions I’ve been able to find on adding a brick to a
> gluster, seem to assume the brick is empty when it’s added.
>
>
>
> So my question is, is it possible for a new brick, loaded up with
> files, to be added to a gluster (and for all the files already on that
> brick, to be indexed and added into the gluster). Apologies if the
> question is answered elsewhere, but I couldn’t find anyone addressing
> this specific question, and certainty helps when you’re dealing with
> 10’s of terabytes of data... ;-)
>
>
>
> Thanks in advance for any info or tips!
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

