I should add that additional testing has shown that only file access is held up; IO is not interrupted for transfers that are already in progress. I think this points to the heat metadata in the tier's SQLite DB. Is it possible that a table is temporarily locked while the promotion daemon runs, so that the calls that update the files' access counts are blocked?
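A minimal sketch of the suspected contention, using Python's standard `sqlite3` module. The `heat` table and its columns are hypothetical stand-ins, not the actual schema of the Gluster tier database, and it is only an assumption that the promotion scan holds a write transaction while it runs. The sketch shows that, in SQLite's default rollback-journal mode, while one connection holds the write lock (`BEGIN IMMEDIATE`), a second connection's access-count `UPDATE` waits out its busy timeout and then fails with "database is locked"; once the first transaction commits, the same update goes through.

```python
import os
import sqlite3
import tempfile


def simulate_locked_counter_update(db_path: str, busy_timeout_s: float = 0.2) -> tuple[str, int]:
    """Return (outcome of the update attempted under the lock, final read count)."""
    # "scanner" stands in for the promotion query. isolation_level=None means
    # we issue BEGIN/COMMIT ourselves instead of Python doing it implicitly.
    scanner = sqlite3.connect(db_path, isolation_level=None)
    # Hypothetical table: NOT Gluster's real heat-DB schema.
    scanner.execute("CREATE TABLE IF NOT EXISTS heat (gfid TEXT PRIMARY KEY, read_cnt INTEGER)")
    scanner.execute("INSERT OR REPLACE INTO heat VALUES ('file-1', 0)")

    # "updater" stands in for the I/O path that bumps a file's access counter.
    # timeout= is how long SQLite's busy handler keeps retrying before giving up.
    updater = sqlite3.connect(db_path, timeout=busy_timeout_s, isolation_level=None)

    # BEGIN IMMEDIATE takes the RESERVED (write) lock and holds it until COMMIT,
    # mimicking a long-running scan inside a write transaction (an assumption).
    scanner.execute("BEGIN IMMEDIATE")
    scanner.execute("SELECT gfid FROM heat WHERE read_cnt >= 0").fetchall()

    try:
        updater.execute("UPDATE heat SET read_cnt = read_cnt + 1 WHERE gfid = 'file-1'")
        outcome = "updated"
    except sqlite3.OperationalError as exc:
        # Raised after busy_timeout_s seconds of waiting: "database is locked".
        outcome = str(exc)
    finally:
        scanner.execute("COMMIT")

    # With the lock released, the same counter update succeeds.
    updater.execute("UPDATE heat SET read_cnt = read_cnt + 1 WHERE gfid = 'file-1'")
    (count,) = updater.execute("SELECT read_cnt FROM heat WHERE gfid = 'file-1'").fetchall()[0]

    scanner.close()
    updater.close()
    return outcome, count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(simulate_locked_counter_update(os.path.join(tmp, "heat.db")))
```

Running it prints `('database is locked', 1)`: the first update is rejected while the write lock is held, and only the second one is counted.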
On Wed, Jan 10, 2018 at 10:17 AM, Tom Fite <[email protected]> wrote:

> The sizes of the files are extremely varied, there are millions of small
> (<1 MB) files and thousands of files larger than 1 GB.
>
> Attached is the tier log for gluster1 and gluster2. These are full of
> "demotion failed" messages, which is also shown in the status:
>
> [root@pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status
> Node                 Promoted files       Demoted files        Status               run time in h:m:s
> ---------            ---------            ---------            ---------            ---------
> localhost            25940                0                    in progress          112:21:49
> pod-sjc1-gluster2    0                    2917154              in progress          112:21:49
>
> Is it normal to have promotions and demotions only happen on each server
> but not both?
>
> Volume info:
>
> [root@pod-sjc1-gluster1 ~]# gluster volume info
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
> Status: Started
> Snapshot Count: 13
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: pod-sjc1-gluster1:/data/brick1/gv0
> Brick2: pod-sjc1-gluster2:/data/brick1/gv0
> Brick3: pod-sjc1-gluster1:/data/brick2/gv0
> Brick4: pod-sjc1-gluster2:/data/brick2/gv0
> Brick5: pod-sjc1-gluster1:/data/brick3/gv0
> Brick6: pod-sjc1-gluster2:/data/brick3/gv0
> Options Reconfigured:
> performance.cache-refresh-timeout: 60
> performance.stat-prefetch: on
> server.allow-insecure: on
> performance.flush-behind: on
> performance.rda-cache-limit: 32MB
> network.tcp-window-size: 1048576
> performance.nfs.io-threads: on
> performance.write-behind-window-size: 4MB
> performance.nfs.write-behind-window-size: 512MB
> performance.io-cache: on
> performance.quick-read: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 90000
> performance.cache-size: 4GB
> server.event-threads: 16
> client.event-threads: 16
> features.barrier: disable
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
> cluster.lookup-optimize: on
> server.outstanding-rpc-limit: 1024
> auto-delete: enable
>
>
> # gluster volume status
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Hot Bricks:
> Brick pod-sjc1-gluster2:/data/hot_tier/gv0  49219     0          Y       26714
> Brick pod-sjc1-gluster1:/data/hot_tier/gv0  49199     0          Y       21325
> Cold Bricks:
> Brick pod-sjc1-gluster1:/data/brick1/gv0    49152     0          Y       3178
> Brick pod-sjc1-gluster2:/data/brick1/gv0    49152     0          Y       4818
> Brick pod-sjc1-gluster1:/data/brick2/gv0    49153     0          Y       3186
> Brick pod-sjc1-gluster2:/data/brick2/gv0    49153     0          Y       4829
> Brick pod-sjc1-gluster1:/data/brick3/gv0    49154     0          Y       3194
> Brick pod-sjc1-gluster2:/data/brick3/gv0    49154     0          Y       4840
> Tier Daemon on localhost                    N/A       N/A        Y       20313
> Self-heal Daemon on localhost               N/A       N/A        Y       32023
> Tier Daemon on pod-sjc1-gluster1            N/A       N/A        Y       24758
> Self-heal Daemon on pod-sjc1-gluster2       N/A       N/A        Y       12349
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> On Tue, Jan 9, 2018 at 10:33 PM, Hari Gowtham <[email protected]> wrote:
>
>> Hi,
>>
>> Can you send the volume info, and volume status output and the tier logs.
>> And I need to know the size of the files that are being stored.
>>
>> On Tue, Jan 9, 2018 at 9:51 PM, Tom Fite <[email protected]> wrote:
>> > I've recently enabled an SSD backed 2 TB hot tier on my 150 TB 2 server / 3
>> > bricks per server distributed replicated volume.
>> >
>> > I'm seeing IO get blocked across all client FUSE threads for 10 to 15
>> > seconds while the promotion daemon runs. I see the 'glustertierpro' thread
>> > jump to 99% CPU usage on both boxes when these delays occur and they happen
>> > every 25 minutes (my tier-promote-frequency setting).
>> >
>> > I suspect this has something to do with the heat database in sqlite, maybe
>> > something is getting locked while it runs the query to determine files to
>> > promote. My volume contains approximately 18 million files.
>> >
>> > Has anybody else seen this? I suspect that these delays will get worse as I
>> > add more files to my volume which will cause significant problems.
>> >
>> > Here are my hot tier settings:
>> >
>> > # gluster volume get gv0 all | grep tier
>> > cluster.tier-pause                      off
>> > cluster.tier-promote-frequency          1500
>> > cluster.tier-demote-frequency           3600
>> > cluster.tier-mode                       cache
>> > cluster.tier-max-promote-file-size      10485760
>> > cluster.tier-max-mb                     64000
>> > cluster.tier-max-files                  100000
>> > cluster.tier-query-limit                100
>> > cluster.tier-compact                    on
>> > cluster.tier-hot-compact-frequency      86400
>> > cluster.tier-cold-compact-frequency     86400
>> >
>> > # gluster volume get gv0 all | grep threshold
>> > cluster.write-freq-threshold            2
>> > cluster.read-freq-threshold             5
>> >
>> > # gluster volume get gv0 all | grep watermark
>> > cluster.watermark-hi                    92
>> > cluster.watermark-low                   75
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > [email protected]
>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> --
>> Regards,
>> Hari Gowtham.
>>
>
