I probably didn't make myself very clear.

You don't need to do anything beyond having the files sitting on a brick;
Gluster will then make them visible to clients. You do not have to worry
about xattrs either: Gluster creates them the first time you access a file.


On Thu, 2014-10-16 at 11:56 +0800, Franco Broi wrote: 
> On Thu, 2014-10-16 at 13:33 +1030, SINCOCK John wrote: 
> > Ah, apologies, sorry yes gluster can be good & fast on large file writes, 
> > it is the large number of small files that slows gluster down, and I know 
> > this is pretty much unavoidable. 
> 
> > I'm still not clear, though, on exactly how to ensure these files can be
> > seen by gluster.
> > 
> > Ie, Franco, I'm not sure exactly what you mean when you say the files will 
> > be "out of place", and that a rebalance will take time. Directly filling 
> > the bricks up on our new server should actually bring the used/available 
> > space ratio on this server close to what it is on our other 2 nodes, so, 
> > when these files are found, somehow, by gluster, or during a rebalance, I 
> > don’t think gluster would need to shift much data between the nodes just to 
> > even out the free space.
> > 
> 
> This is the way I understand it; Gluster devs, feel free to jump in at
> any time...
> 
> Gluster uses a hash it calculates from the file name and the number of
> bricks to distribute files evenly across the bricks. If you add a brick
> and create new directories, any new file might get written to the new
> brick; it just depends on the hash.
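
A toy sketch of the idea in Python, not Gluster's actual code. It assumes the
32-bit hash space is split into equal ranges, one per brick, and it uses
zlib.crc32 purely as a stand-in for Gluster's own hash function. The names
brick_for and moved_after_add are made up for illustration.

```python
import zlib


def brick_for(name, bricks):
    """Pick a brick by splitting the 32-bit hash space into equal ranges."""
    # crc32 is only a stand-in here; it is NOT the hash Gluster really uses.
    h = zlib.crc32(name.encode("utf-8"))
    # Width of each brick's range; the +1 keeps the index below len(bricks).
    span = 2**32 // len(bricks) + 1
    return bricks[h // span]


def moved_after_add(names, old_bricks, new_bricks):
    """Return the names whose hashed brick differs between two brick lists."""
    return [
        n for n in names
        if brick_for(n, old_bricks) != brick_for(n, new_bricks)
    ]
```

Calling moved_after_add with the old and the new brick list gives the names
whose hashed brick changed when the brick was added.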
> 
> The brick layout is stored in an xattr attached to directories. For
> existing directories this only gets updated if you run a fix-layout;
> otherwise old directories retain the old brick layout, i.e. the one from
> before you added the new brick. Files added to those old directories will
> not go to the new brick unless you run fix-layout.
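
A small Python sketch of that behaviour, under the assumption that a
directory simply remembers the brick list it was created with (the real
layout xattr holds hash ranges, which I'm not modelling here). ToyDir and its
methods are hypothetical names, and zlib.crc32 is again only a stand-in hash.
On a real volume I believe the corresponding command is
`gluster volume rebalance <VOLNAME> fix-layout start`.

```python
import zlib


class ToyDir:
    """A directory that snapshots the brick list when it is created."""

    def __init__(self, bricks):
        # Stand-in for the layout xattr stored on the directory.
        self.layout = list(bricks)

    def brick_for(self, name):
        # New files are placed using the directory's own (possibly old) layout.
        index = zlib.crc32(name.encode("utf-8")) % len(self.layout)
        return self.layout[index]

    def fix_layout(self, bricks):
        # Rewrite the stored layout so new files can land on newer bricks.
        self.layout = list(bricks)
```

Until fix_layout is called, a ToyDir created with two bricks never places a
file on a third brick added later.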
> 
> If you run a full rebalance after adding a brick, Gluster will move any
> files that are not currently on the brick pointed to by their new hash.
> It doesn't balance based on capacity, but if a brick is full, Gluster
> will put the file on another brick and create a link (I guess this
> assumes there's space on the full brick to make a link??).
> 
> So by "out of place" I mean the file might not be where the hash for the
> file says it is, but that doesn't mean that Gluster won't find it; it will
> just take a bit longer.
> 
> 
> > As I understand it, gluster at the very least requires xattrs to be set on 
> > every file, and, obviously these will be set if data is copied in via 
> > gluster, and gluster places files on the bricks. But, I'm not clear 
> > how/if/when files will become part of the gluster, if they are not 
> > explicitly copied onto a brick that is already part of a gluster volume, 
> > via a proper gluster-aware mount:
> > 
> > I guess I’m hoping for one of two things:
> > 1)  that if you add a brick with data already on it, to a gluster - that 
> > gluster will go through and set the xattrs on all the files, and make them 
> > available, as part of the process of adding the brick. Or, 
> > 2)  that there is some way to trigger gluster to re-scan a brick, to make 
> > itself aware of files that have been copied in “behind” the gluster.
> > 
> 
> No, Gluster doesn't scan unless you try to access a file. It's only then
> that it goes through the process of first using the hash, then searching
> directories. Once it has found the file it will add a link so that the
> next lookup will be faster.
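
Roughly, as I understand it, the lookup goes like the Python sketch below.
This is a toy model, not Gluster code: bricks are plain sets of file names,
zlib.crc32 stands in for the real hash, and ToyVolume, hashed_brick, links
and scans are hypothetical names I've introduced for the example.

```python
import zlib


class ToyVolume:
    """Toy model: try the hashed brick first, then search, then add a link."""

    def __init__(self, bricks):
        # bricks: dict mapping brick name -> set of file names stored there
        self.bricks = bricks
        self.order = sorted(bricks)
        # (hashed brick, file name) -> brick that really holds the data
        self.links = {}
        # Number of times we had to search every brick
        self.scans = 0

    def hashed_brick(self, name):
        index = zlib.crc32(name.encode("utf-8")) % len(self.order)
        return self.order[index]

    def lookup(self, name):
        home = self.hashed_brick(name)
        if name in self.bricks[home]:
            return home                      # file is where the hash says
        if (home, name) in self.links:
            return self.links[(home, name)]  # follow a link made earlier
        self.scans += 1                      # slow path: search all bricks
        for brick in self.order:
            if name in self.bricks[brick]:
                self.links[(home, name)] = brick  # remember for next time
                return brick
        return None
```

The first lookup of an out-of-place file takes the slow path once; after
that the recorded link is followed directly.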
> 
> > I do apologise if I'm missing the point or making this seem a lot harder 
> > than it is - it's just, when dealing with large amounts of data, we have to 
> > be certain - I can't afford to waste 2 days copying data onto the server, 
> > and then find I can't add the files to the gluster without deleting it all 
> > and then spending 5 more days transferring all the files again via gluster.
> > 
> 
> No need to apologise; getting your head around this stuff is difficult.
> Even now (after nearly 2 years) I'm still not sure I'm giving you
> accurate information.
> 
> > Thanks again, I really do appreciate any advice that can really nail this 
> > down and clarify the situation.
> > 
> 
> No problem.
> 
> > -----Original Message-----
> > From: Franco Broi [mailto:[email protected]] 
> > Sent: Thursday, 16 October 2014 12:21 PM
> > To: SINCOCK John; gluster-users
> > Subject: Re: [Gluster-users] Is it ok to add a new brick with files already 
> > on it?
> > 
> > 
> > Gluster may be slow when creating lots of small files but it is not slow 
> > writing.
> > 
> > I don't see a problem with what you want to do as long as you realise that 
> > many of the files will be out of place and a future rebalance would take a 
> > very long time - if you decide to run one.
> > 
> > On Wed, 2014-10-15 at 21:12 -0500, Ryan Nix wrote: 
> > > Interesting.  Still, I think it's better to let the Gluster client
> > > handle the syncing.  What happens if, for some strange reason, the
> > > rsync process dies in the middle of the night?  Gluster, on the other
> > > hand, will keep working to get the data on the other bricks without
> > > human intervention.  I recently used Gluster to sync 3 TBs of data to
> > > another brick over a 1Gbps link in about 13 hours on decent hardware.
> > > 
> > > On Wed, Oct 15, 2014 at 9:04 PM, SINCOCK John <[email protected]>
> > > wrote:
> > >          
> > >         
> > >         We have 20 Terabytes to rsync onto a new server (which will
> > >         have 32 TB capacity),
> > >         
> > >         And we then want to add that server to an existing 2-node
> > >         gluster of 73TB (53 TB used, 20 TB free), to give a 3-node
> > >         gluster with 105TB capacity, 73TB used.
> > >         
> > >          
> > >         
> > >         The reason I want to do it this way, if possible, is that
> > >         Gluster is slow on writes, especially for small files, and we
> > >         have a LOT of small files, so I’m pretty sure it will be a LOT
> > >         faster to rsync directly to the new server (which is the one
> > >         that has free space anyway), and then add that server to the
> > >         gluster – if it is possible to have gluster recognise those
> > >         files.
> > >         
> > >          
> > >         
> > >          
> > >         
> > >         From: Ryan Nix [mailto:[email protected]] 
> > >         Sent: Thursday, 16 October 2014 11:58 AM
> > >         To: SINCOCK John
> > >         Cc: Franco Broi; gluster-users
> > >         
> > >         
> > >         Subject: Re: [Gluster-users] Is it ok to add a new brick with
> > >         files already on it? 
> > >          
> > >         
> > >         So Gluster, at its core, uses rsync to copy the data to the
> > >         other bricks.  Why not let Gluster do the heavy lifting?
> > >         
> > >         
> > >          
> > >         
> > >         On Wed, Oct 15, 2014 at 7:35 PM, SINCOCK John
> > >         <[email protected]> wrote:
> > >         
> > >         
> > >         In a related question... it seems, if it is possible to add
> > >         filesystems already containing data, as new bricks, then it
> > >         should also be possible to:
> > >         
> > >         1) create empty bricks
> > >         2) add them to the gluster volume while they are empty
> > >         3) rsync data directly onto the underlying empty bricks,
> > >         circumventing gluster, ie not through the gluster mountpoint
> > >         4) somehow get gluster to recognise the data that has been
> > >         copied into the bricks?
> > >         
> > >         How would you go about getting gluster to see the data you've
> > >         rsynced directly in?
> > >         My concern would be that all the data rsynced directly onto
> > >         the bricks will just sit there, invisible to glusterfs.
> > >         
> > >         Thanks again for any info!
> > >         
> > >         
> > >         -----Original Message-----
> > >         From: Franco Broi [mailto:[email protected]]
> > >         Sent: Thursday, 16 October 2014 10:06 AM
> > >         To: SINCOCK John
> > >         Cc: [email protected]
> > >         Subject: Re: [Gluster-users] Is it ok to add a new brick with
> > >         files already on it?
> > >         
> > >         
> > >         
> > >         I've never added a brick with existing files but I did start a
> > >         new Gluster volume on disks that already contained data and I
> > >         was able to access the files without problem. Of course the
> > >         files will be out of place but the first time you access them,
> > >         Gluster will add links to speed up future lookups.
> > >         
> > >         On Thu, 2014-10-16 at 09:57 +1030, SINCOCK John wrote:
> > >         > Hi Everyone,
> > >         >
> > >         >
> > >         >
> > >         > All the instructions I’ve been able to find on adding a brick
> > >         > to a gluster, seem to assume the brick is empty when it’s
> > >         > added.
> > >         >
> > >         > So my question is, is it possible for a new brick, loaded up
> > >         > with files, to be added to a gluster (and for all the files
> > >         > already on that brick, to be indexed and added into the
> > >         > gluster). Apologies if the question is answered elsewhere,
> > >         > but I couldn’t find anyone addressing this specific question,
> > >         > and certainty helps when you’re dealing with 10’s of
> > >         > terabytes of data... ;-)
> > >         >
> > >         >
> > >         >
> > >         > Thanks in advance for any info or tips!
> > >         >
> > >         >
> > >         >
> > >         >
> > >         > _______________________________________________
> > >         > Gluster-users mailing list
> > >         > [email protected]
> > >         >
> > >         http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > >         
> > >         
> > >         
> > >         
> > >          
> > >         
> > >         
> > > 
> > > 
> > 
> > 
> 
> 

