Probably didn't make myself very clear.
There is nothing you need to do except have the files sitting on a brick for Gluster to make them visible to clients. You do not have to worry about xattrs; Gluster creates them when you first access a file.

On Thu, 2014-10-16 at 11:56 +0800, Franco Broi wrote:
> On Thu, 2014-10-16 at 13:33 +1030, SINCOCK John wrote:
> > Ah, apologies, sorry, yes, gluster can be good & fast on large file writes; it is the large number of small files that slows gluster down, and I know this is pretty much unavoidable.
> >
> > I'm still not clear, though, exactly how to ensure these files can be seen by gluster.
> >
> > I.e., Franco, I'm not sure exactly what you mean when you say the files will be "out of place", and that a rebalance will take time. Directly filling the bricks up on our new server should actually bring the used/available space ratio on this server close to what it is on our other 2 nodes, so, when these files are found, somehow, by gluster, or during a rebalance, I don't think gluster would need to shift much data between the nodes just to even out the free space.
>
> This is the way I understand it; Gluster devs, feel free to jump in at any time...
>
> Gluster uses a hash it calculates from the file name and the number of bricks to distribute files evenly across the bricks. If you add a brick and create new directories, any new file might get written to the new brick; it just depends on the hash.
>
> The brick layout is stored in an xattr attached to directories. For existing directories, this will only get updated if you run a fix-layout; otherwise old directories retain the old brick layout, i.e. the one from before you added the new brick. Files added to those old directories will not go to the new brick unless you run fix-layout.
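To make the placement rule Franco describes concrete, here is a minimal Python sketch under simplifying assumptions: zlib.crc32 stands in for Gluster's own hash function, each layout splits the 32-bit hash space into equal contiguous ranges (one per brick), and the brick names are hypothetical. On a real brick the per-directory layout is kept in the trusted.glusterfs.dht xattr. The sketch shows why files created in a directory whose layout predates the new brick never hash to that brick until a fix-layout rewrites the layout.

```python
import zlib

HASH_SPACE = 2 ** 32  # 32-bit hash space, split into one range per brick


def make_layout(bricks):
    """Give each brick an equal, contiguous slice of the hash space.

    Simplification: a real layout is stored per directory and need not be equal.
    """
    step = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def place(filename, layout):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode())  # stand-in for Gluster's own hash
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise ValueError("layout does not cover the hash space")


# Layout of a directory created while the volume had two bricks (hypothetical names).
old_layout = make_layout(["server1", "server2"])
# Layout the same directory would get after a fix-layout with a third brick added.
new_layout = make_layout(["server1", "server2", "server3"])

names = [f"file{i}.dat" for i in range(1000)]
old_targets = {place(n, old_layout) for n in names}
new_targets = {place(n, new_layout) for n in names}
print("old layout uses:", sorted(old_targets))
print("after fix-layout:", sorted(new_targets))
```

Note that the sketch decides placement purely by hash, with no notion of free space, which matches Franco's point that a rebalance does not balance by capacity.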
>
> If you run a full rebalance after adding a brick, Gluster will move any files that are not currently on the brick pointed to by their new hash. It doesn't balance based on capacity, but if a brick is full, Gluster will put the file on another brick and create a link (I guess this assumes there's space on the full brick to make a link??).
>
> So by "out of place" I mean the file might not be where the hash for the file says it is, but that doesn't mean that Gluster won't find it; it will just take a bit longer.
>
> > As I understand it, gluster at the very least requires xattrs to be set on every file, and obviously these will be set if data is copied in via gluster and gluster places files on the bricks. But I'm not clear how/if/when files will become part of the gluster if they are not explicitly copied onto a brick that is already part of a gluster volume, via a proper gluster-aware mount.
> >
> > I guess I'm hoping for one of two things:
> > 1) that if you add a brick with data already on it to a gluster, gluster will go through and set the xattrs on all the files and make them available as part of the process of adding the brick; or
> > 2) that there is some way to trigger gluster to re-scan a brick, to make itself aware of files that have been copied in "behind" the gluster.
>
> No, Gluster doesn't scan unless you try to access a file; it's only then that it goes through the process of first using the hash, then searching directories. Once it has found the file, it will add a link so that the next lookup will be faster.
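The lookup order Franco describes (hashed brick first, then a search, then a link so the next lookup is quick) can be modelled in a few lines. This is a simplified, in-memory Python sketch, not Gluster's code: bricks are plain dictionaries, brick selection uses crc32 modulo the brick count instead of a real directory layout, and a "link" is just a record of which brick holds the data (on a real brick it is a zero-byte link file carrying a trusted.glusterfs.dht.linkto xattr). The names Brick, hashed_brick and lookup are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Brick:
    name: str
    files: dict = field(default_factory=dict)  # file name -> file data
    links: dict = field(default_factory=dict)  # file name -> brick that holds the data


def hashed_brick(filename, bricks):
    # Stand-in for hash + directory layout: crc32 of the name modulo the brick count.
    return bricks[zlib.crc32(filename.encode()) % len(bricks)]


def lookup(filename, bricks):
    """Return (brick name, data), following the lookup order described in the thread."""
    target = hashed_brick(filename, bricks)
    # 1. Fast path: the file lives on the brick its hash points to.
    if filename in target.files:
        return target.name, target.files[filename]
    # 2. A link left by an earlier lookup points straight at the real location.
    if filename in target.links:
        holder = next(b for b in bricks if b.name == target.links[filename])
        return holder.name, holder.files[filename]
    # 3. Slow path: search every brick, then leave a link for next time.
    for brick in bricks:
        if filename in brick.files:
            target.links[filename] = brick.name
            return brick.name, brick.files[filename]
    raise FileNotFoundError(filename)


bricks = [Brick("server1"), Brick("server2"), Brick("server3")]
name = "report.csv"
home = hashed_brick(name, bricks)
# Put the file "behind Gluster's back" on a brick other than its hashed one.
elsewhere = next(b for b in bricks if b is not home)
elsewhere.files[name] = b"data"

print(lookup(name, bricks))  # found by the slow path, which also records a link
print(home.links)            # the hashed brick now points at the real location
print(lookup(name, bricks))  # found via the link this time
```

The file is never moved; only the extra link is added, which is why an "out of place" file stays readable but costs a little more on its first lookup.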
>
> > I do apologise if I'm missing the point or making this seem a lot harder than it is. It's just that, when dealing with large amounts of data, we have to be certain: I can't afford to waste 2 days copying data onto the server and then find I can't add the files to the gluster without deleting it all and spending 5 more days transferring all the files again via gluster.
>
> No need to apologise; getting your head around this stuff is difficult. Even now (after nearly 2 years) I'm still not sure I'm giving you accurate information.
>
> > Thanks again, I really do appreciate any advice that can really nail this down and clarify the situation.
>
> No problem.
>
> > -----Original Message-----
> > From: Franco Broi [mailto:[email protected]]
> > Sent: Thursday, 16 October 2014 12:21 PM
> > To: SINCOCK John; gluster-users
> > Subject: Re: [Gluster-users] Is it ok to add a new brick with files already on it?
> >
> > Gluster may be slow when creating lots of small files, but it is not slow writing.
> >
> > I don't see a problem with what you want to do as long as you realise that many of the files will be out of place and a future rebalance would take a very long time, if you decide to run one.
> >
> > On Wed, 2014-10-15 at 21:12 -0500, Ryan Nix wrote:
> > > Interesting. Still, I think it's better to let the Gluster client handle the syncing. What happens if, for some strange reason, the rsync process dies in the middle of the night? Gluster, on the other hand, will keep working to get the data on the other bricks without human intervention. I recently used Gluster to sync 3 TB of data to another brick over a 1 Gbps link in about 13 hours on decent hardware.
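Ryan's data point can be turned into a rough throughput figure and applied to the 20 TB in this thread. This is a back-of-the-envelope Python calculation that assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s) and that the same rate would hold. Since John's data is mostly small files, which the thread says is where Gluster is slowest, the real figure is likely to be worse.

```python
TB = 10 ** 12          # decimal terabyte (assumption; the thread doesn't say TB or TiB)
LINK_BPS = 10 ** 9     # 1 Gbps link, in bits per second
SECONDS_PER_HOUR = 3600

# Ryan's figure: 3 TB synced in about 13 hours.
rate = 3 * TB / (13 * SECONDS_PER_HOUR)  # bytes per second
utilisation = rate * 8 / LINK_BPS        # fraction of the link's bit rate

# The same rate applied to the 20 TB John wants to load.
hours_for_20tb = 20 * TB / rate / SECONDS_PER_HOUR

print(f"throughput: {rate / 10**6:.1f} MB/s ({utilisation:.0%} of the link)")
print(f"20 TB at that rate: {hours_for_20tb:.0f} h (about {hours_for_20tb / 24:.1f} days)")
```

Under these assumptions it works out to about 64.1 MB/s (51% of the link) and roughly 87 hours for 20 TB, the same order of magnitude as the "5 more days" John estimates for a re-copy via Gluster.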
> > >
> > > On Wed, Oct 15, 2014 at 9:04 PM, SINCOCK John <[email protected]> wrote:
> > >
> > > We have 20 TB to rsync onto a new server (which will have 32 TB capacity).
> > >
> > > We then want to add that server to an existing 2-node gluster of 73 TB (53 TB used, 20 TB free), to give a 3-node gluster with 105 TB capacity, 73 TB used.
> > >
> > > The reason I want to do it this way, if possible, is that Gluster is slow on writes, especially for small files, and we have a LOT of small files, so I'm pretty sure it will be a LOT faster to rsync directly to the new server (which is the one that has free space anyway) and then add that server to the gluster, if it is possible to have gluster recognise those files.
> > >
> > > From: Ryan Nix [mailto:[email protected]]
> > > Sent: Thursday, 16 October 2014 11:58 AM
> > > To: SINCOCK John
> > > Cc: Franco Broi; gluster-users
> > > Subject: Re: [Gluster-users] Is it ok to add a new brick with files already on it?
> > >
> > > So Gluster, at its core, uses rsync to copy the data to the other bricks. Why not let Gluster do the heavy lifting?
> > >
> > > On Wed, Oct 15, 2014 at 7:35 PM, SINCOCK John <[email protected]> wrote:
> > >
> > > In a related question... it seems that, if it is possible to add filesystems already containing data as new bricks, then it should also be possible to:
> > >
> > > 1) create empty bricks
> > > 2) add them to the gluster volume while they are empty
> > > 3) rsync data directly onto the underlying empty bricks, circumventing gluster, i.e. not through the gluster mountpoint
> > > 4) somehow get gluster to recognise the data that has been copied into the bricks?
> > >
> > > How would you go about getting gluster to see the data you've rsynced directly in?
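John's capacity figures can be checked with a few lines of Python. The sketch treats the existing two nodes as a single pool, because the thread doesn't give a per-node split, and it only compares fill levels; as Franco points out, a rebalance places files by hash, not by free space.

```python
# Figures from John's message, in TB.
existing_capacity, existing_used = 73, 53  # current 2-node volume
new_capacity, new_used = 32, 20            # new server after the direct rsync

total_capacity = existing_capacity + new_capacity
total_used = existing_used + new_used

print(f"combined volume: {total_used} of {total_capacity} TB used ({total_used / total_capacity:.1%})")
print(f"existing nodes: {existing_used / existing_capacity:.1%} full")
print(f"new server: {new_used / new_capacity:.1%} full")
```

This confirms the 105 TB / 73 TB totals and shows the new server would start out about ten percentage points less full than the existing nodes (62.5% versus 72.6%).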
> > > My concern would be that all the data rsynced directly onto the bricks will just sit there, invisible to glusterfs.
> > >
> > > Thanks again for any info!
> > >
> > > -----Original Message-----
> > > From: Franco Broi [mailto:[email protected]]
> > > Sent: Thursday, 16 October 2014 10:06 AM
> > > To: SINCOCK John
> > > Cc: [email protected]
> > > Subject: Re: [Gluster-users] Is it ok to add a new brick with files already on it?
> > >
> > > I've never added a brick with existing files, but I did start a new Gluster volume on disks that already contained data, and I was able to access the files without problem. Of course the files will be out of place, but the first time you access them, Gluster will add links to speed up future lookups.
> > >
> > > On Thu, 2014-10-16 at 09:57 +1030, SINCOCK John wrote:
> > > > Hi Everyone,
> > > >
> > > > All the instructions I've been able to find on adding a brick to a gluster seem to assume the brick is empty when it's added.
> > > >
> > > > So my question is: is it possible for a new brick, loaded up with files, to be added to a gluster (and for all the files already on that brick to be indexed and added into the gluster)? Apologies if the question is answered elsewhere, but I couldn't find anyone addressing this specific question, and certainty helps when you're dealing with tens of terabytes of data... ;-)
> > > >
> > > > Thanks in advance for any info or tips!
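Franco's point is that Gluster only notices a file when something looks it up through a client. Here is a minimal Python sketch of forcing that lookup for every file, assuming you have both the brick path and a client (FUSE) mount of the same volume; both paths and the function name lookup_all_via_mount are hypothetical. It walks the brick directly, skips the brick's internal .glusterfs directory, and stat()s each corresponding path through the mount. That a plain stat is enough to trigger the link creation relies on Franco's description of first access, and with millions of small files the walk itself will take a while.

```python
import os


def lookup_all_via_mount(brick_root, mount_root):
    """Stat every file found on the brick through the client mount.

    Returns (number of files looked up, list of client paths that were not found).
    brick_root and mount_root are hypothetical paths supplied by the caller.
    """
    looked_up, missing = 0, []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root:
            # Skip Gluster's internal metadata directory at the top of the brick.
            dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        rel_dir = os.path.relpath(dirpath, brick_root)
        for fname in filenames:
            client_path = os.path.normpath(os.path.join(mount_root, rel_dir, fname))
            try:
                os.stat(client_path)  # a lookup through the mount, not on the brick
                looked_up += 1
            except FileNotFoundError:
                missing.append(client_path)
    return looked_up, missing


# Example (hypothetical paths):
# count, missing = lookup_all_via_mount("/export/brick1", "/mnt/gluster")
```

Any path that comes back in the missing list is one the client could not see, which is exactly the "invisible to glusterfs" case John is worried about.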
> > > >
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > [email protected]
> > > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
