Hi,

many thanks for the explanation. In that case, there seems to be a problem with
how the layouts are updated during rebalancing. On my volume, all directories
*do have* the same mappings for the hash intervals, which explains the problem we see.

I checked another volume that has had several bricks from the beginning, and there I see different mappings. That volume was running version 3.2.5, but I just confirmed with 3.3.0 that creating directories on a 3-brick distributed volume also produces different mappings.

So, at the moment a rebalance operation sets the same mapping intervals
for all directories instead of shuffling them. Thus, it cannot achieve a
proper distribution of the files, which I would consider a bug (at least I hope it's not a feature).

Should I then file a bug report? Or what's the best way to proceed?

Cheers,
   Jochen

PS: I would also favour identical mappings combined with a hash that takes
the path into account. That would also make it easier to adjust the
intervals, e.g. to accommodate different brick sizes (which seemed too complicated, but would be interesting for us).


On Tue, 14 Aug 2012, Jeff Darcy wrote:

On 08/14/2012 03:44 AM, Jochen Klein wrote:
Looking at the implementation in the dht translator and checking
calculated hashes it seems that only the basename is used for the hash
calculation of a given file. With all directories having the same
mappings for the hash intervals to bricks, this would explain our
observation if only this file hash is used. However, I also see hashes
calculated for directories, but it's not clear to me what they are
used for.

Am I missing something here? Is this behaviour intended? Is there a
(supported) way to still distribute the files homogeneously to all
bricks? E.g. by using the full path for the hashing (which is actually
what I understood from the manual), or by shuffling the hash intervals
per directory?

I tripped over the same issue a while ago.  Yes, the file hashes use only the
basename.  However, it's not (or at least shouldn't be) true that all
directories have the same mappings for the hash intervals.  The same ranges are
used, but rotated into different orders.  So, using letters for hash values,
different directories might have:

        A-I on brick1, J-R on brick2, S-Z on brick3
        A-I on brick2, J-R on brick3, S-Z on brick1
        A-I on brick3, J-R on brick1, S-Z on brick2
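
As an illustration of the rotation described above, here is a minimal Python sketch. It assumes the 32-bit hash space is cut into equal ranges and that each directory gets some rotation offset; `rotated_layout`, `brick_for` and the `offset` parameter are hypothetical names for illustration, not the dht translator's actual code.

```python
HASH_SPACE = 2**32  # assumed 32-bit hash space

def rotated_layout(bricks, offset):
    """Split the hash space into equal ranges and assign them to bricks,
    starting `offset` positions into the brick list (hypothetical helper)."""
    n = len(bricks)
    chunk = HASH_SPACE // n
    layout = []
    for i in range(n):
        start = i * chunk
        # The last range absorbs any remainder so the whole space is covered.
        stop = HASH_SPACE - 1 if i == n - 1 else (i + 1) * chunk - 1
        layout.append((start, stop, bricks[(i + offset) % n]))
    return layout

def brick_for(layout, hash_value):
    """Return the brick whose range contains hash_value."""
    for start, stop, brick in layout:
        if start <= hash_value <= stop:
            return brick
    raise ValueError("hash value outside the layout")

bricks = ["brick1", "brick2", "brick3"]
for offset in range(len(bricks)):
    # Same three ranges every time; only the brick order changes.
    print([brick for _, _, brick in rotated_layout(bricks, offset)])
```

With offsets 0, 1 and 2 the brick order matches the three example lines above. If every directory ends up with the same offset, a given basename always lands on the same brick, whatever directory it is in.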

I just ran a quick test creating a bunch of directories on a simple two-brick
distributed volume.  Sure enough, about half of the directories got one order,
and the other half got another.  If this isn't working the same way for you
(check using "getfattr -e hex -n trusted.glusterfs.dht" on each per-brick copy of
each directory) then it's probably a bug and we'll have to figure out why.
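
For comparing the values across bricks, a small decoder can help. The sketch below assumes the xattr value is 16 bytes holding four big-endian 32-bit integers (count, type, range start, range stop); that layout is an assumption and should be checked against the dht source for your version. `decode_dht_layout` is a hypothetical helper and the sample value is made up.

```python
import struct

def decode_dht_layout(hex_value):
    """Decode a hex-encoded trusted.glusterfs.dht value.

    Assumed format: four big-endian uint32 fields (count, type, start, stop).
    """
    if hex_value.lower().startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    count, layout_type, start, stop = struct.unpack(">IIII", raw)
    return {"count": count, "type": layout_type, "start": start, "stop": stop}

# Made-up sample value: count=1, type=0, start=0x00000000, stop=0x55555554
sample = "0x" + "00000001" + "00000000" + "00000000" + "55555554"
layout = decode_dht_layout(sample)
print("range: 0x%08x - 0x%08x" % (layout["start"], layout["stop"]))
```

If the decoded start and stop values for one brick are identical in every directory, the layouts are not being rotated.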

Personally, I'd prefer if all directories *did* have the same hash layout, so
that those layouts could be inherited instead of having to be set separately on
each and every directory of a potentially-petabyte volume.  That would require
that the hash include some directory-specific value (such as the parent GFID)
as well as the basename, but that seems a small price to pay.  In other words,
though right now it's a bit non-obvious how the layouts and hashing work, some
day they might work as you (and I) had expected.
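
A rough sketch of that idea, under loud assumptions: `zlib.crc32` stands in for the real hash function (it is not what the dht translator uses), the GFID strings are made up, and `placement` is a hypothetical helper. It only illustrates that a single shared layout can still spread same-named files across bricks once a directory-specific value enters the hash.

```python
import zlib

BRICKS = ["brick1", "brick2", "brick3"]  # one fixed layout for all directories

def placement(basename, parent_gfid=None):
    """Pick a brick from a single shared layout (hypothetical helper).

    With parent_gfid=None only the basename is hashed; otherwise the
    parent GFID is mixed into the hash input.
    """
    key = basename if parent_gfid is None else parent_gfid + "/" + basename
    h = zlib.crc32(key.encode("utf-8"))  # stand-in hash, unsigned 32-bit
    chunk = 2**32 // len(BRICKS)
    # Clamp so the top of the hash space maps to the last brick.
    return BRICKS[min(h // chunk, len(BRICKS) - 1)]

# Made-up GFIDs for three directories that each contain "data.root".
gfids = [
    "11111111-aaaa-4000-8000-000000000001",
    "22222222-bbbb-4000-8000-000000000002",
    "33333333-cccc-4000-8000-000000000003",
]
print("basename only:", placement("data.root"))
for gfid in gfids:
    print(gfid, "->", placement("data.root", gfid))
```

Hashing the basename alone gives the same brick for every directory, while the GFID-salted variant may differ per directory even though the layout itself never changes.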

_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
