Hi all,

On 2016-08-15 at 22:39 -0400, Vijay Bellur wrote:
> Hi Poornima, Dan -
> 
> Let us have a hangout/bluejeans session this week to discuss the planned
> md-cache improvements, proposed timelines and sort out open questions if
> any.

Because the initial mail creates the impression that this is
a topic that people are merely discussing, let me point out
that it has actually moved way beyond that stage already:

Poornima has been working hard on these cache improvements
since late 2015 at least. (And she has been desperately looking
for review and support since at least springtime...) See all her
patches that have recently been merged into master
(e.g. http://review.gluster.org/#/c/12951/ for an old one
that has just been merged)
and all the patches that she still has up for review
(e.g. http://review.gluster.org/#/c/15002/ for a big one).

These changes were mainly motivated by Samba workloads,
since the chatty, metadata-heavy SMB protocol suffers most
notably from the lack of proper caching of this metadata.
The good news is that the work recently started getting more
attention and we are seeing very, very promising performance
test results!
Full functional and regression testing is also underway.

Discussing the state of affairs in a real call
could indeed be very useful. Sometimes that can be
less awkward than using the list...

> Would 11:00 UTC on Wednesday work for everyone in the To: list?

I am not on the To: list myself, but that time would work for me... :-)
Although I have to admit it may be very short notice for
some...

And since Poornima has driven the project thus far, mainly
supported by Rajesh J and R.Talur on the gluster side for long
stretches of time, as far as I can tell, I think these three
should be present at a bare minimum.

Thanks - Michael


> On 08/11/2016 01:04 AM, Poornima Gurusiddaiah wrote:
> > 
> > My comments inline.
> > 
> > Regards,
> > Poornima
> > 
> > ----- Original Message -----
> > > From: "Dan Lambright" <dlamb...@redhat.com>
> > > To: "Gluster Devel" <gluster-devel@gluster.org>
> > > Sent: Wednesday, August 10, 2016 10:35:58 PM
> > > Subject: [Gluster-devel] md-cache improvements
> > > 
> > > 
> > > There have been recurring discussions within the gluster community about
> > > building on the existing support for md-cache and upcalls to help
> > > performance for small-file workloads. In certain cases, "lookup
> > > amplification" dominates data transfers, i.e. the cumulative round-trip
> > > times of multiple LOOKUPs from the client erode the benefits of faster
> > > backend storage.
> > > 
> > > To tackle this problem, one suggestion is to use md-cache to cache
> > > inodes on the client more aggressively than is currently done. The
> > > inodes would be cached until they are invalidated by the server.
> > > 
> > > Several gluster development engineers within the DHT, NFS, and Samba
> > > teams have been involved with related efforts, which have been underway
> > > for some time now. At this juncture, comments are requested from gluster
> > > developers.
> > > 
> > > (1) .. help call out where additional upcalls would be needed to
> > > invalidate stale client cache entries (in particular, need feedback
> > > from the DHT/AFR areas),
> > > 
> > > (2) .. identify failure cases, when we cannot trust the contents of
> > > md-cache, e.g. when an upcall may have been dropped by the network
> > 
> > Yes, this needs to be handled.
> > It can happen only when there is a one-way disconnect, where the server
> > cannot reach the client and the notification fails. We can retry the
> > notification until the cache expiry time is reached.
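The retry idea above can be sketched as a small loop. This is a minimal, hypothetical illustration (not Gluster's actual upcall code): the `notify_fn` type, the per-call timings, and the helper names are all assumptions; the point is only that retrying past the client's cache expiry time is useless, since the stale entry drops out on its own by then.

```c
#include <stdbool.h>

/* Hypothetical transport callback: returns true once the client
 * acknowledges the invalidation upcall. */
typedef bool (*notify_fn)(void *client, void *inode);

/* Retry the notification until it is delivered or the client-side
 * cache entry would have expired on its own anyway.  Returns true if
 * delivered, false if we gave up at the expiry deadline. */
static bool
retry_invalidate(notify_fn notify, void *client, void *inode,
                 int cache_timeout_sec, int retry_interval_sec)
{
    int elapsed = 0;
    while (elapsed < cache_timeout_sec) {
        if (notify(client, inode))
            return true;               /* upcall acknowledged */
        elapsed += retry_interval_sec; /* pretend we slept and retried */
    }
    /* Past the expiry time the stale entry is dropped client-side,
     * so further retries are pointless. */
    return false;
}

/* Example flaky transport for demonstration: fails while the counter
 * passed via the client pointer is positive, then succeeds. */
static bool
flaky_notify(void *client, void *inode)
{
    (void)inode;
    int *fail_count = client;
    if (*fail_count > 0) {
        (*fail_count)--;
        return false;
    }
    return true;
}
```

A real implementation would of course sleep between attempts and use the brick's actual RPC layer; the bounded-by-expiry structure is the only point being made.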
> > 
> > > 
> > > (3) .. point out additional improvements which md-cache needs. For
> > > example, it cannot be allowed to grow unbounded.
> > 
> > This is being worked on, and will be targeted for 3.9.
> > 
> > > 
> > > Dan
> > > 
> > > ----- Original Message -----
> > > > From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > > > 
> > > > List of areas where we need invalidation notification:
> > > > 1. Any changes to xattrs used by xlators to store metadata (like the
> > > > dht layout xattr, afr xattrs, etc.).
> > 
> > Currently, md-cache will negotiate (using ipc) with the brick a list of
> > xattrs that it needs invalidation for. Other xlators can add the xattrs
> > they are interested in to the ipc. But then these xlators need to manage
> > their own caching and process the invalidation requests themselves, since
> > md-cache sits above all cluster xlators.
> > Reference: http://review.gluster.org/#/c/15002/
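The negotiation described above amounts to a per-brick registry of "interesting" xattr keys. Here is a minimal sketch of that idea; the structure and function names are hypothetical and this is not the implementation in http://review.gluster.org/#/c/15002/ (the example xattr keys, however, are real Gluster key names used for illustration).

```c
#include <string.h>
#include <stdbool.h>

#define MAX_KEYS 16

/* Registry of xattr keys that clients want invalidation upcalls for. */
struct xattr_registry {
    const char *keys[MAX_KEYS];
    int         nkeys;
};

/* Called when an xlator adds a key via ipc.  Returns false when full. */
static bool
registry_add(struct xattr_registry *r, const char *key)
{
    if (r->nkeys >= MAX_KEYS)
        return false;
    r->keys[r->nkeys++] = key;   /* e.g. a dht layout or afr key */
    return true;
}

/* Brick side: should a change to this xattr be sent as an upcall? */
static bool
registry_wants(const struct xattr_registry *r, const char *changed_key)
{
    for (int i = 0; i < r->nkeys; i++)
        if (strcmp(r->keys[i], changed_key) == 0)
            return true;
    return false;
}
```

The design point is that the brick filters upcalls down to the negotiated set, so unrelated xattr churn never crosses the wire.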
> > 
> > > > 2. Scenarios where an individual xlator feels it needs a lookup. For
> > > > example, a failed directory creation on the non-hashed subvol in dht
> > > > during mkdir. Though dht succeeds in the mkdir, it would be better not
> > > > to cache this inode, as a subsequent lookup will heal the directory
> > > > and make things better.
> > 
> > For this, these xlators can set an indicator in the dict of the fop cbk
> > telling md-cache not to cache. This should be fairly simple to implement.
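The "do not cache" indicator could look like the sketch below. Everything here is hypothetical: the key name, the toy dict, and the helper names are stand-ins (Gluster's real `dict_t` API differs); it only shows the shape of the handshake between a lower xlator and md-cache.

```c
#include <string.h>
#include <stdbool.h>

/* Hypothetical key a lower xlator (e.g. dht) sets in the cbk xdata. */
#define NO_CACHE_KEY "glusterfs.md-cache.no-cache"

/* Toy stand-in for Gluster's dict_t, just enough for the example. */
struct kv        { const char *key; int value; };
struct mini_dict { struct kv entries[8]; int n; };

static void
dict_set_flag(struct mini_dict *d, const char *key)
{
    d->entries[d->n].key   = key;
    d->entries[d->n].value = 1;
    d->n++;
}

static bool
dict_has_flag(const struct mini_dict *d, const char *key)
{
    for (int i = 0; i < d->n; i++)
        if (strcmp(d->entries[i].key, key) == 0 && d->entries[i].value)
            return true;
    return false;
}

/* md-cache side of the cbk path: cache only when the hint is absent. */
static bool
should_cache(const struct mini_dict *cbk_xdata)
{
    return !dict_has_flag(cbk_xdata, NO_CACHE_KEY);
}
```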
> > 
> > > > 3. removing of files
> > 
> > When an unlink is issued from the mount point, the cache is invalidated.
> > 
> > > > 4. writev on brick (to invalidate read cache on client)
> > 
> > A writev on the brick from any other client will invalidate the metadata
> > cache on all the other clients.
> > 
> > > > 
> > > > Other questions:
> > > > 5. Does md-cache have cache management, like lru or an upper limit
> > > > for the cache?
> > 
> > Currently md-cache doesn't have any cache management; we will be
> > targeting this for 3.9.
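One common shape for the planned cache management is a fixed upper bound with least-recently-used eviction. The sketch below is purely illustrative (not the 3.9 code, and the capacity, key format, and tick-based clock are assumptions); keys stand in for gfids, and cached values are omitted for brevity.

```c
#include <string.h>

#define CACHE_CAP 3   /* tiny bound so the eviction is easy to see */

struct lru_cache {
    char          entries[CACHE_CAP][64]; /* keys, "" means empty slot */
    unsigned long stamp[CACHE_CAP];       /* last-access tick per slot */
    unsigned long tick;                   /* monotonically increasing */
};

/* Returns 1 on a hit (and refreshes recency), 0 on a miss. */
static int
lru_lookup(struct lru_cache *c, const char *key)
{
    for (int i = 0; i < CACHE_CAP; i++)
        if (strcmp(c->entries[i], key) == 0) {
            c->stamp[i] = ++c->tick;
            return 1;
        }
    return 0;
}

/* Inserts a key, evicting the least recently used entry when full. */
static void
lru_insert(struct lru_cache *c, const char *key)
{
    int victim = 0;
    for (int i = 0; i < CACHE_CAP; i++) {
        if (c->entries[i][0] == '\0') { victim = i; break; }
        if (c->stamp[i] < c->stamp[victim])
            victim = i;
    }
    strncpy(c->entries[victim], key, sizeof c->entries[victim] - 1);
    c->entries[victim][sizeof c->entries[victim] - 1] = '\0';
    c->stamp[victim] = ++c->tick;
}
```

A production version would also need locking and byte-size (not just entry-count) accounting, but the bound-plus-recency core is the same.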
> > 
> > > > 6. Network disconnects and invalidating the cache. When a network
> > > > disconnect happens we need to invalidate the cache for inodes present
> > > > on that brick, as we might be missing some notifications. The current
> > > > approach of purging the cache of all inodes might not be optimal, as
> > > > it might roll back the benefits of caching.
> > > > Also, please note that network disconnects are not rare events.
> > 
> > Network disconnects are handled to a minimal extent, where any brick
> > going down will cause the whole of the cache to be invalidated.
> > Invalidating only the list of inodes that belong to that particular
> > brick will need support from the underlying cluster xlators.
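If the cluster xlators could tag each cached inode with the brick it was resolved on, the per-brick invalidation would reduce to a filtered sweep. A minimal sketch of that bookkeeping, with hypothetical structure names (the hard part in reality is getting the brick mapping from the cluster xlators, which this example simply assumes):

```c
#define MAX_INODES 32

struct cached_inode {
    int brick_id;   /* subvolume the inode's metadata came from */
    int valid;      /* 1 while the cached metadata may be trusted */
};

struct md_cache {
    struct cached_inode inodes[MAX_INODES];
    int n;
};

/* On disconnect of one brick, drop only that brick's entries (upcalls
 * may have been missed for them).  Returns how many were invalidated. */
static int
invalidate_brick(struct md_cache *c, int brick_id)
{
    int dropped = 0;
    for (int i = 0; i < c->n; i++)
        if (c->inodes[i].valid && c->inodes[i].brick_id == brick_id) {
            c->inodes[i].valid = 0;
            dropped++;
        }
    return dropped;
}
```

Entries on still-connected bricks stay valid, which is exactly the benefit over purging the whole cache.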
> > 
> > > > 
> > > > regards,
> > > > Raghavendra
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > 
> 

