Adding csaba

On Tue, Mar 6, 2018 at 9:09 AM, Raghavendra Gowdappa <[email protected]> wrote:
> +Csaba.
>
> On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson <[email protected]> wrote:
>
>> Raghavendra,
>>
>> Thanks very much for your reply.
>>
>> I fixed our data corruption problem by disabling the volume
>> performance.write-behind flag as you suggested, and simultaneously
>> disabling caching in my client side mount command. [see the
>> volume-tuning sketch after the thread]
>>
>
> Good to know it worked. Can you give us the output of
> # gluster volume info
>
> We would like to debug the problem in write-behind. Some questions:
>
> 1. What version of Glusterfs are you using?
> 2. Were you able to figure out whether it's stale data or metadata that
> is causing the issue?
>
> There have been patches merged in write-behind in the recent past, and
> one in the works, which address metadata consistency. We would like to
> understand whether you've run into any of the already identified issues.
>
> regards,
> Raghavendra
>
>> In very modest testing, the flock() case appears to me to work well -
>> before, it would corrupt the db within a few transactions.
>>
>> Testing using the built-in sqlite3 locks is better (fcntl range locks),
>> but has some behavioral issues (probably just requires query retry
>> when the file is locked). I'll research this more, although the test
>> case is not critical to our use case.
>>
>> There are no signs of O_DIRECT use in the sqlite3 code that I can see.
>>
>> I intend to set up tests that run much longer than a few minutes, to
>> see if there are any longer-term issues. Also, I want to experiment
>> with data durability by killing various gluster server nodes during
>> the tests.
>>
>> If anyone would like our test scripts, I can either tar them up and
>> email them or put them on GitHub - either is fine with me. (They rely
>> on current builds of docker and docker-compose.)
>>
>> Thanks again!!
>>
>> Paul
>>
>> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa
>> <[email protected]> wrote:
>> >
>> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson <[email protected]> wrote:
>> >>
>> >> Hi,
>> >>
>> >> tl;dr summary of below: flock() works, but what does it take to make
>> >> sync()/fsync() work in a 3-node GFS cluster?
>> >>
>> >> I am under the impression that POSIX flock, POSIX
>> >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all
>> >> supported in cluster operations, such that in theory, SQLite3 should
>> >> be able to atomically lock the file (or a subset of pages), modify
>> >> pages, flush the pages to gluster, then release the lock, and thus
>> >> satisfy the ACID property that SQLite3 appears to try to accomplish
>> >> on a local filesystem.
>> >>
>> >> In a test we wrote that fires off 10 simple concurrent SQL insert,
>> >> read, update loops, we discovered that we at least need to use
>> >> flock() around the SQLite3 db connection open/update/close to
>> >> protect it. [see the flock sketch after the thread]
>> >>
>> >> However, that is not enough - although from testing, it looks like
>> >> flock() works as advertised across gluster-mounted files, sync/fsync
>> >> don't appear to, so we end up getting corruption in the SQLite3 file
>> >> (pragma integrity_check generally will show a bunch of problems
>> >> after a short test).
>> >>
>> >> Is what we're trying to do achievable? We're testing using the docker
>> >> container gluster/gluster-centos as the three servers, with a php
>> >> test inside of php-cli using filesystem mounts. If we mount the
>> >> gluster FS via sapk/plugin-gluster into the php-cli containers using
>> >> docker, we seem to have better success sometimes, but I haven't
>> >> figured out why, yet.
>> >>
>> >> I did see that I needed to set the server volume parameter
>> >> 'performance.flush-behind off', otherwise it seems that flushes won't
>> >> block as would be needed by SQLite3.
>> >
>> > If you are relying on fsync, this shouldn't matter, as fsync makes
>> > sure data is synced to disk.
>> >
>> >> Does anyone have any suggestions? Any words of wisdom would be much
>> >> appreciated.
>> >
>> > Can you experiment with turning on/off various performance xlators?
>> > Based on earlier issues, it's likely that there is stale metadata
>> > which might be causing the issue (not necessarily improper fsync
>> > behavior). I would suggest turning off all performance xlators. You
>> > can refer to [1] for a related discussion. In theory the only perf
>> > xlator relevant for fsync is write-behind, and I am not aware of any
>> > issues where fsync is not working. Does the glusterfs log file have
>> > any messages complaining about writes or fsync failing? Does your
>> > application use O_DIRECT? If yes, please note that you need to turn
>> > the option performance.strict-o-direct on for write-behind to honour
>> > O_DIRECT.
>> >
>> > Also, is it possible to identify the nature of the corruption - data
>> > or metadata? A more detailed explanation will help us RCA the issue.
>> >
>> > Also, is your application running on a single mount or from multiple
>> > mounts? Can you collect an strace of your application (strace -ff -T
>> > -p <pid> -o <file>)? If possible, can you also collect a fuse-dump
>> > using the --dump-fuse option while mounting glusterfs? [see the
>> > diagnostics sketch after the thread]
>> >
>> > [1]
>> > http://lists.gluster.org/pipermail/gluster-users/2018-February/033503.html
>> >
>> >> Thanks,
>> >>
>> >> Paul
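
A rough sketch of the volume settings discussed above, for anyone who
wants to try the same workaround. The volume name (gvol), the mount
point, and the exact set of additional performance xlators to disable
are assumptions, not something stated in the thread; the client-cache
mount options may also differ between glusterfs versions.

    # Settings Paul mentions, plus turning off the remaining performance
    # xlators as Raghavendra suggests (volume name "gvol" is hypothetical):
    gluster volume set gvol performance.write-behind off
    gluster volume set gvol performance.flush-behind off
    gluster volume set gvol performance.stat-prefetch off
    gluster volume set gvol performance.quick-read off
    gluster volume set gvol performance.io-cache off
    gluster volume set gvol performance.read-ahead off
    gluster volume set gvol performance.open-behind off

    # Only needed if the application opens files with O_DIRECT:
    gluster volume set gvol performance.strict-o-direct on

    # One way to cut client-side caching on the FUSE mount (the options
    # Paul actually used are not stated; these timeouts are an assumption):
    mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 \
        server1:/gvol /mnt/gvol

    # The output Raghavendra asked for:
    gluster volume info gvol

Once the corruption stops, the xlators can be re-enabled one at a time
to narrow down which one is involved.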
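
Paul's flock() protection lives in his PHP test scripts, which are not
included here. As a stand-in, here is a minimal shell sketch using
flock(1) and the sqlite3 CLI; the database and lock paths are
hypothetical, and the point is only to show an exclusive advisory lock
held across the whole open/update/close cycle.

    DB=/mnt/gvol/test.db            # hypothetical paths on the gluster mount
    LOCK=/mnt/gvol/test.db.flock

    sqlite3 "$DB" "CREATE TABLE IF NOT EXISTS t (k INTEGER PRIMARY KEY, v TEXT);"

    (
        # Block until we hold an exclusive advisory lock on fd 9, then run
        # the whole open/insert/close cycle under that lock.
        flock -x 9
        sqlite3 "$DB" "INSERT INTO t (v) VALUES ('hello');"
    ) 9> "$LOCK"

flock(1) releases the lock when fd 9 is closed, i.e. when the subshell
exits.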
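
The diagnostics Raghavendra asks for could be gathered roughly as
follows. The strace flags and the --dump-fuse option are the ones quoted
in the thread; the PID lookup, output paths, and server/volume names are
placeholders.

    # Trace the test while it reproduces the corruption (placeholder PID):
    strace -ff -T -p "$(pidof php)" -o /tmp/sqlite-test.strace

    # Mount with a FUSE protocol dump by invoking the client directly
    # (server and volume names are hypothetical):
    glusterfs --volfile-server=server1 --volfile-id=gvol \
        --dump-fuse=/tmp/gvol-fuse.dump /mnt/gvol

Both should be captured while the test is actively reproducing the
corruption.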
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
