Hi,

I have been working on setting up a 4-replica Gluster volume with over a million files (~250GB total), and I've seen some really weird behavior, even after trying to optimize for small files. The volume is a 4-brick replicate volume (gluster 3.13.2).
It took almost 2 days to rsync the data from the local drive to the gluster volume, and now I'm running a 2nd rsync that just looks for changes in case more files have been written. I'd like to concentrate this email on a very specific and odd issue.

The dir structure is YYYY/MM/, with 10k+ files in each month folder. Rsyncing each month folder cold can take 2+ minutes. However, if I ls the destination folder first, or use find (both of which finish within 5 seconds), the rsync is almost instant.

Here's a log with time calls that shows what happens:

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(637) [sender=3.1.0]

real 1m39.848s
user 0m0.010s
sys 0m0.030s

box:/mnt/gluster/uploads/2017 # time find 08 | wc -l
14254

real 0m0.726s
user 0m0.013s
sys 0m0.033s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list

real 0m0.562s
user 0m0.057s
sys 0m0.137s

box:/mnt/gluster/uploads/2017 # time find 07 | wc -l
10103

real 0m4.550s
user 0m0.010s
sys 0m0.033s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/07/ 07/
sending incremental file list

real 0m0.428s
user 0m0.030s
sys 0m0.083s

box:/mnt/gluster/uploads/2017 # time ls 06 | wc -l
11890

real 0m1.850s
user 0m0.077s
sys 0m0.040s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/06/ 06/
sending incremental file list

real 0m0.627s
user 0m0.073s
sys 0m0.107s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/05/ 05/
sending incremental file list

real 2m24.382s
user 0m0.127s
sys 0m0.357s

Note how if I precede the rsync call with ls or find, the rsync completes in less than a second (finding no files to sync because they've already been synced).
Otherwise, it takes over 2 minutes (I interrupted the first call before the 2 minutes were up because it was already taking too long).

What could be causing rsync to work so slowly unless the dir is primed?

Volume config:

Volume Name: gluster
Type: Replicate
Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/mnt/server1_block4/gluster
Brick2: server2:/mnt/server2_block4/gluster
Brick3: server3:/mnt/server3_block4/gluster
Brick4: server4:/mnt/server4_block4/gluster
Options Reconfigured:
performance.parallel-readdir: off
transport.address-family: inet
nfs.disable: on
cluster.self-heal-daemon: enable
performance.cache-size: 1GB
network.ping-timeout: 5
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 500000
performance.rda-cache-limit: 256MB
performance.read-ahead: off
client.event-threads: 4
server.event-threads: 4

Thank you for any insight.

Sincerely,
Artem
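
P.S. The priming sequence from the log above (walk the destination folder, then rsync it) can be scripted. This is only a minimal sketch of that manual workaround, not a fix for the underlying slowness. The function names prime_dir and sync_month and the SRC_ROOT/DST_ROOT variables are made up for illustration; the paths are the ones from the log. rsync's -a already implies -r, so -r is dropped here.

```shell
#!/bin/sh
# Sketch only: SRC_ROOT, DST_ROOT and both function names are illustrative
# assumptions; the default paths are copied from the log in this email.
SRC_ROOT=${SRC_ROOT:-/srv/www/htdocs/uploads/2017}
DST_ROOT=${DST_ROOT:-/mnt/gluster/uploads/2017}

# Walk a directory tree and print how many entries were seen
# (the same "find DIR | wc -l" call used in the log).
prime_dir() {
    find "$1" | wc -l | tr -d ' '
}

# Prime one month folder on the destination, then rsync it from the source.
sync_month() {
    month=$1
    prime_dir "$DST_ROOT/$month" >/dev/null
    rsync -aP "$SRC_ROOT/$month/" "$DST_ROOT/$month/"
}

# Example: for m in 05 06 07 08; do sync_month "$m"; done
```

Note that prime_dir counts the folder itself as well as its contents, just as find does.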