Thanks for the info, Hari. Sorry about the bad gluster volume info; I grabbed that from a file without realizing it was out of date. Here's the current configuration, showing the active hot tier:
[root@pod-sjc1-gluster1 ~]# gluster volume info

Volume Name: gv0
Type: Tier
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: pod-sjc1-gluster2:/data/hot_tier/gv0
Brick2: pod-sjc1-gluster1:/data/hot_tier/gv0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick3: pod-sjc1-gluster1:/data/brick1/gv0
Brick4: pod-sjc1-gluster2:/data/brick1/gv0
Brick5: pod-sjc1-gluster1:/data/brick2/gv0
Brick6: pod-sjc1-gluster2:/data/brick2/gv0
Brick7: pod-sjc1-gluster1:/data/brick3/gv0
Brick8: pod-sjc1-gluster2:/data/brick3/gv0
Options Reconfigured:
performance.rda-low-wmark: 4KB
performance.rda-request-size: 128KB
storage.build-pgfid: on
cluster.watermark-low: 50
performance.readdir-ahead: off
cluster.tier-cold-compact-frequency: 86400
cluster.tier-hot-compact-frequency: 86400
features.ctr-sql-db-wal-autocheckpoint: 2500
cluster.tier-max-mb: 64000
cluster.tier-max-promote-file-size: 10485760
cluster.tier-max-files: 100000
cluster.tier-demote-frequency: 3600
server.allow-insecure: on
performance.flush-behind: on
performance.rda-cache-limit: 128MB
network.tcp-window-size: 1048576
performance.nfs.io-threads: off
performance.write-behind-window-size: 512MB
performance.nfs.write-behind-window-size: 4MB
performance.io-cache: on
performance.quick-read: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 1GB
server.event-threads: 10
client.event-threads: 10
features.barrier: disable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
cluster.lookup-optimize: on
server.outstanding-rpc-limit: 2056
performance.stat-prefetch: on
performance.cache-refresh-timeout: 60
features.ctr-enabled: on
cluster.tier-mode: cache
cluster.tier-compact: on
cluster.tier-pause: off
cluster.tier-promote-frequency: 1500
features.record-counters: on
cluster.write-freq-threshold: 2
cluster.read-freq-threshold: 5
features.ctr-sql-db-cachesize: 262144
cluster.watermark-hi: 95
auto-delete: enable

It will take some time to get the logs together because I need to strip out potentially sensitive info; I will update the thread with them when I have them.

Any theories as to why promotions and demotions each take place on only one box rather than on both?

-Tom

On Thu, Jan 18, 2018 at 5:12 AM, Hari Gowtham <[email protected]> wrote:
> Hi Tom,
>
> The volume info doesn't show the hot bricks. I think you took the
> volume info output before attaching the hot tier.
> Can you send the volume info of the current setup where you see this issue?
>
> The logs you sent are from a later point in time. The issue was hit
> earlier than what is available in the logs, so I need the logs
> from an earlier time.
> Along with the entire tier logs, can you send the glusterd and
> brick logs too?
>
> The rest of my comments are inline.
>
> On Wed, Jan 10, 2018 at 9:03 PM, Tom Fite <[email protected]> wrote:
> > I should add that additional testing has shown that only accessing files
> > is held up; IO is not interrupted for existing transfers. I think this
> > points to the heat metadata in the sqlite DB for the tier. Is it possible
> > that a table is temporarily locked while the promotion daemon runs, so the
> > calls to update the access count on files are blocked?
> >
> >
> > On Wed, Jan 10, 2018 at 10:17 AM, Tom Fite <[email protected]> wrote:
> >>
> >> The sizes of the files are extremely varied: there are millions of small
> >> (<1 MB) files and thousands of files larger than 1 GB.
>
> The tier use case is for bigger files; it is not the best fit for smaller
> files. That can end up hindering the IOs.
>
> >>
> >> Attached is the tier log for gluster1 and gluster2.
> >> These are full of "demotion failed" messages, which is also reflected in
> >> the status:
> >>
> >> [root@pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status
> >> Node                 Promoted files       Demoted files        Status               run time in h:m:s
> >> ---------            ---------            ---------            ---------            ---------
> >> localhost            25940                0                    in progress          112:21:49
> >> pod-sjc1-gluster2    0                    2917154              in progress          112:21:49
> >>
> >> Is it normal for promotions to happen only on one server and demotions
> >> only on the other, rather than on both?
>
> No, it's not normal.
>
> >>
> >> Volume info:
> >>
> >> [root@pod-sjc1-gluster1 ~]# gluster volume info
> >>
> >> Volume Name: gv0
> >> Type: Distributed-Replicate
> >> Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
> >> Status: Started
> >> Snapshot Count: 13
> >> Number of Bricks: 3 x 2 = 6
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: pod-sjc1-gluster1:/data/brick1/gv0
> >> Brick2: pod-sjc1-gluster2:/data/brick1/gv0
> >> Brick3: pod-sjc1-gluster1:/data/brick2/gv0
> >> Brick4: pod-sjc1-gluster2:/data/brick2/gv0
> >> Brick5: pod-sjc1-gluster1:/data/brick3/gv0
> >> Brick6: pod-sjc1-gluster2:/data/brick3/gv0
> >> Options Reconfigured:
> >> performance.cache-refresh-timeout: 60
> >> performance.stat-prefetch: on
> >> server.allow-insecure: on
> >> performance.flush-behind: on
> >> performance.rda-cache-limit: 32MB
> >> network.tcp-window-size: 1048576
> >> performance.nfs.io-threads: on
> >> performance.write-behind-window-size: 4MB
> >> performance.nfs.write-behind-window-size: 512MB
> >> performance.io-cache: on
> >> performance.quick-read: on
> >> features.cache-invalidation: on
> >> features.cache-invalidation-timeout: 600
> >> performance.cache-invalidation: on
> >> performance.md-cache-timeout: 600
> >> network.inode-lru-limit: 90000
> >> performance.cache-size: 4GB
> >> server.event-threads: 16
> >> client.event-threads: 16
> >> features.barrier: disable
> >> transport.address-family: inet
> >> nfs.disable: on
> >> performance.client-io-threads: on
> >> cluster.lookup-optimize: on
> >> server.outstanding-rpc-limit: 1024
> >> auto-delete: enable
> >>
> >>
> >> # gluster volume status
> >> Status of volume: gv0
> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> >> ------------------------------------------------------------------------------
> >> Hot Bricks:
> >> Brick pod-sjc1-gluster2:/data/hot_tier/gv0  49219     0          Y       26714
> >> Brick pod-sjc1-gluster1:/data/hot_tier/gv0  49199     0          Y       21325
> >> Cold Bricks:
> >> Brick pod-sjc1-gluster1:/data/brick1/gv0    49152     0          Y       3178
> >> Brick pod-sjc1-gluster2:/data/brick1/gv0    49152     0          Y       4818
> >> Brick pod-sjc1-gluster1:/data/brick2/gv0    49153     0          Y       3186
> >> Brick pod-sjc1-gluster2:/data/brick2/gv0    49153     0          Y       4829
> >> Brick pod-sjc1-gluster1:/data/brick3/gv0    49154     0          Y       3194
> >> Brick pod-sjc1-gluster2:/data/brick3/gv0    49154     0          Y       4840
> >> Tier Daemon on localhost                    N/A       N/A        Y       20313
> >> Self-heal Daemon on localhost               N/A       N/A        Y       32023
> >> Tier Daemon on pod-sjc1-gluster1            N/A       N/A        Y       24758
> >> Self-heal Daemon on pod-sjc1-gluster2       N/A       N/A        Y       12349
> >>
> >> Task Status of Volume gv0
> >> ------------------------------------------------------------------------------
> >> There are no active volume tasks
> >>
> >>
> >> On Tue, Jan 9, 2018 at 10:33 PM, Hari Gowtham <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Can you send the volume info, the volume status output, and the tier
> >>> logs? I also need to know the size of the files that are being stored.
> >>>
> >>> On Tue, Jan 9, 2018 at 9:51 PM, Tom Fite <[email protected]> wrote:
> >>> > I've recently enabled an SSD-backed 2 TB hot tier on my 150 TB
> >>> > distributed-replicated volume (2 servers, 3 bricks per server).
> >>> >
> >>> > I'm seeing IO get blocked across all client FUSE threads for 10 to 15
> >>> > seconds while the promotion daemon runs.
> >>> > I see the 'glustertierpro' thread jump to 99% CPU usage on both boxes
> >>> > when these delays occur, and they happen every 25 minutes (my
> >>> > tier-promote-frequency setting).
> >>> >
> >>> > I suspect this has something to do with the heat database in sqlite;
> >>> > maybe something is getting locked while it runs the query to determine
> >>> > files to promote. My volume contains approximately 18 million files.
> >>> >
> >>> > Has anybody else seen this? I suspect that these delays will get worse
> >>> > as I add more files to my volume, which will cause significant problems.
> >>> >
> >>> > Here are my hot tier settings:
> >>> >
> >>> > # gluster volume get gv0 all | grep tier
> >>> > cluster.tier-pause                      off
> >>> > cluster.tier-promote-frequency          1500
> >>> > cluster.tier-demote-frequency           3600
> >>> > cluster.tier-mode                       cache
> >>> > cluster.tier-max-promote-file-size      10485760
> >>> > cluster.tier-max-mb                     64000
> >>> > cluster.tier-max-files                  100000
> >>> > cluster.tier-query-limit                100
> >>> > cluster.tier-compact                    on
> >>> > cluster.tier-hot-compact-frequency      86400
> >>> > cluster.tier-cold-compact-frequency     86400
> >>> >
> >>> > # gluster volume get gv0 all | grep threshold
> >>> > cluster.write-freq-threshold            2
> >>> > cluster.read-freq-threshold             5
> >>> >
> >>> > # gluster volume get gv0 all | grep watermark
> >>> > cluster.watermark-hi                    92
> >>> > cluster.watermark-low                   75
> >>> >
> >>> > _______________________________________________
> >>> > Gluster-users mailing list
> >>> > [email protected]
> >>> > http://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Hari Gowtham.
> >>
> >
>
> --
> Regards,
> Hari Gowtham.
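The locking theory raised in this thread can be narrowed down with plain SQLite semantics. The `features.ctr-sql-db-wal-autocheckpoint` option in the volume info suggests that the heat database runs in SQLite's WAL journal mode; in that mode an open read transaction does not block a writer, but an open write transaction blocks every other writer until it commits. The Python sketch below demonstrates only that generic behavior. The `heat` table, its columns, the `heat.db` file name, and `demo_wal_locking` are hypothetical stand-ins, not the real gfdb schema, and nothing here shows what the Gluster tier daemon actually locks.

```python
import os
import sqlite3
import tempfile


def demo_wal_locking(db_path: str) -> dict:
    """Check whether a heat-counter UPDATE is blocked by a concurrent transaction.

    Generic SQLite behaviour only: the `heat` table is a hypothetical
    stand-in, not the gfdb schema used by the Gluster tier daemon.
    """
    results = {}

    # isolation_level=None gives autocommit mode; transactions are opened
    # explicitly with BEGIN below.
    setup = sqlite3.connect(db_path, isolation_level=None)
    setup.execute("PRAGMA journal_mode=WAL")  # persists in the database file
    setup.execute("CREATE TABLE heat (gfid TEXT PRIMARY KEY, reads INTEGER)")
    setup.execute("INSERT INTO heat VALUES ('f1', 0), ('f2', 0)")
    setup.close()

    # Hypothetical stand-in for the promotion/demotion query.
    scanner = sqlite3.connect(db_path, isolation_level=None)
    # Hypothetical stand-in for the I/O path bumping access counters; waits at most 0.1 s.
    updater = sqlite3.connect(db_path, isolation_level=None, timeout=0.1)

    def try_update(label: str) -> None:
        try:
            updater.execute("UPDATE heat SET reads = reads + 1 WHERE gfid = 'f1'")
            results[label] = "ok"
        except sqlite3.OperationalError as exc:  # SQLITE_BUSY -> "database is locked"
            results[label] = str(exc)

    # Case 1: the scanner holds an open read transaction.
    scanner.execute("BEGIN")
    scanner.execute("SELECT gfid, reads FROM heat ORDER BY reads DESC").fetchall()
    try_update("during_read_txn")
    scanner.execute("COMMIT")

    # Case 2: the scanner holds the write lock (BEGIN IMMEDIATE).
    scanner.execute("BEGIN IMMEDIATE")
    try_update("during_write_txn")
    scanner.execute("COMMIT")

    scanner.close()
    updater.close()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_wal_locking(os.path.join(tmp, "heat.db")))
```

Run as a script, the update during the read transaction is expected to succeed, and the update during the write transaction is expected to fail with a busy error once the 0.1 s timeout expires. If the promotion query only reads, this behavior alone would not explain blocked heat-counter updates; the thread does not establish whether the tier daemon holds a write transaction during that query.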
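Hari confirms in this thread that promotions happening on only one node and demotions on only the other is not normal. One way to spot that pattern across repeated status snapshots is to parse the `gluster volume tier gv0 status` table and flag nodes that have moved files in only one direction. The Python sketch below assumes the whitespace-separated column layout quoted in this thread (node, promoted, demoted, status, run time), which may differ between Gluster releases; `parse_tier_status`, `one_sided_nodes`, and `TierNodeStatus` are hypothetical helpers, not part of any Gluster tooling.

```python
from typing import List, NamedTuple


class TierNodeStatus(NamedTuple):
    node: str
    promoted: int
    demoted: int
    status: str
    run_time: str


def parse_tier_status(text: str) -> List[TierNodeStatus]:
    """Parse `gluster volume tier <vol> status` output.

    Assumes the layout quoted in this thread: node, promoted count,
    demoted count, status words, run time (whitespace-separated).
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row and the dashed separator row:
        # only data rows have numeric promoted/demoted columns.
        if len(parts) < 5 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        rows.append(TierNodeStatus(
            node=parts[0],
            promoted=int(parts[1]),
            demoted=int(parts[2]),
            status=" ".join(parts[3:-1]),
            run_time=parts[-1],
        ))
    return rows


def one_sided_nodes(rows: List[TierNodeStatus]) -> List[str]:
    """Nodes that moved files in only one direction (exactly one count is zero)."""
    return [r.node for r in rows if (r.promoted == 0) != (r.demoted == 0)]


# Status output copied from this thread.
SAMPLE = """\
Node                 Promoted files       Demoted files        Status               run time in h:m:s
---------            ---------            ---------            ---------            ---------
localhost            25940                0                    in progress          112:21:49
pod-sjc1-gluster2    0                    2917154              in progress          112:21:49
"""

if __name__ == "__main__":
    print(one_sided_nodes(parse_tier_status(SAMPLE)))  # ['localhost', 'pod-sjc1-gluster2']
```

Both nodes in the quoted status are flagged. A node with both counts at zero (for example, a tier daemon that has only just started) is deliberately not flagged, since that says nothing about a one-sided split.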
