[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Marc Spaggiari updated HBASE-18451: ---------------------------------------- Description: If you run a big job every 4 hours, impacting many tables (they have 150 regions per server), ad the end all the regions might have some data to be flushed, and we want, after one hour, trigger a periodic flush. That's totally fine. Now, to avoid a flush storm, when we detect a region to be flushed, we add a "randomDelay" to the delayed flush, that way we spread them away. RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, which is very good. However, because we don't check if there is already a request in the queue, 10 seconds after, we create a new request, with a new randomDelay. If you generate a randomDelay every 10 seconds, at some point, you will end up having a small one, and the flush will be triggered almost immediatly. As a result, instead of spreading all the flush within the next 5 minutes, you end-up getting them all way more quickly. Like within the first minute. Which not only feed the queue to to many flush requests, but also defeats the purpose of the randomDelay. {code} @Override protected void chore() { final StringBuffer whyFlush = new StringBuffer(); for (Region r : this.server.onlineRegions.values()) { if (r == null) continue; if (((HRegion)r).shouldFlush(whyFlush)) { FlushRequester requester = server.getFlushRequester(); if (requester != null) { long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME; LOG.info(getName() + " requesting flush of " + r.getRegionInfo().getRegionNameAsString() + " because " + whyFlush.toString() + " after random delay " + randomDelay + "ms"); //Throttle the flushes by putting a delay. If we don't throttle, and there //is a balanced write-load on the regions in a table, we might end up //overwhelming the filesystem with too many flushes at once. requester.requestDelayedFlush(r, randomDelay, false); } } } } {code} {code} 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 35390ms 2017-07-24 18:45:33,362 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 283034ms 2017-07-24 18:45:43,933 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 84328ms 2017-07-24 18:45:53,866 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 72291ms 2017-07-24 18:46:03,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 62658ms 2017-07-24 18:46:14,084 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 87491ms 2017-07-24 18:46:23,538 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 290170ms 2017-07-24 18:46:33,353 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 228541ms 2017-07-24 18:46:43,359 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 211814ms 2017-07-24 18:46:54,150 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 155167ms 2017-07-24 18:47:03,546 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 115318ms 2017-07-24 18:47:13,350 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 216662ms 2017-07-24 18:47:23,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 96018ms 2017-07-24 18:47:33,891 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 144751ms 2017-07-24 18:47:43,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 152461ms 2017-07-24 18:47:53,832 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 97352ms 2017-07-24 18:48:03,348 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 263910ms 2017-07-24 18:48:13,353 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 151793ms 2017-07-24 18:48:23,340 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 61926ms 2017-07-24 18:48:33,724 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 188901ms 2017-07-24 18:48:44,038 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 233148ms 2017-07-24 18:48:53,624 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 6206ms 2017-07-24 18:49:03,378 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 240946ms 2017-07-24 18:49:04,125 INFO org.apache.hadoop.hbase.regionserver.HRegion: Flushing 1/1 column families, memstore=160 B 2017-07-24 18:49:04,182 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=6, memsize=160, hasBloomFilter=true, into tmp file hdfs://hbasetest1.domainname.com:8020/hbase/data/default/testflush/578c27d2eb7ef0ad437bf2ff38c053ae/.tmp/3609340d58ae4ca190e62c3b0fe415f4 2017-07-24 18:49:04,208 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://hbasetest1.domainname.com:8020/hbase/data/default/testflush/578c27d2eb7ef0ad437bf2ff38c053ae/f/3609340d58ae4ca190e62c3b0fe415f4, entries=1, sequenceid=6, filesize=1015 2017-07-24 18:49:04,212 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~160 B/160, currentsize=0 B/0 for region testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. in 87ms, sequenceid=6, compaction requested=false {code} was: If you run a big job every 4 hours, impacting many tables (they have 150 regions per server), ad the end all the regions might have some data to be flushed, and we want, after one hour, trigger a periodic flush. That's totally fine. Now, to avoid a flush storm, when we detect a region to be flushed, we add a "randomDelay" to the delayed flush, that way we spread them away. RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, which is very good. However, because we don't check if there is already a request in the queue, 10 seconds after, we create a new request, with a new randomDelay. If you generate a randomDelay every 10 seconds, at some point, you will end up having a small one, and the flush will be triggered almost immediatly. As a result, instead of spreading all the flush within the next 5 minutes, you end-up getting them all way more quickly. Like within the first minute. Which defeats the purpose of the randomDelay. {code} @Override protected void chore() { final StringBuffer whyFlush = new StringBuffer(); for (Region r : this.server.onlineRegions.values()) { if (r == null) continue; if (((HRegion)r).shouldFlush(whyFlush)) { FlushRequester requester = server.getFlushRequester(); if (requester != null) { long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME; LOG.info(getName() + " requesting flush of " + r.getRegionInfo().getRegionNameAsString() + " because " + whyFlush.toString() + " after random delay " + randomDelay + "ms"); //Throttle the flushes by putting a delay. If we don't throttle, and there //is a balanced write-load on the regions in a table, we might end up //overwhelming the filesystem with too many flushes at once. requester.requestDelayedFlush(r, randomDelay, false); } } } } {code} {code} 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 35390ms 2017-07-24 18:45:33,362 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 283034ms 2017-07-24 18:45:43,933 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 84328ms 2017-07-24 18:45:53,866 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 72291ms 2017-07-24 18:46:03,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 62658ms 2017-07-24 18:46:14,084 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 87491ms 2017-07-24 18:46:23,538 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 290170ms 2017-07-24 18:46:33,353 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 228541ms 2017-07-24 18:46:43,359 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 211814ms 2017-07-24 18:46:54,150 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 155167ms 2017-07-24 18:47:03,546 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 115318ms 2017-07-24 18:47:13,350 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 216662ms 2017-07-24 18:47:23,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 96018ms 2017-07-24 18:47:33,891 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 144751ms 2017-07-24 18:47:43,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 152461ms 2017-07-24 18:47:53,832 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 97352ms 2017-07-24 18:48:03,348 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 263910ms 2017-07-24 18:48:13,353 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 151793ms 2017-07-24 18:48:23,340 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 61926ms 2017-07-24 18:48:33,724 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 188901ms 2017-07-24 18:48:44,038 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 233148ms 2017-07-24 18:48:53,624 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 6206ms 2017-07-24 18:49:03,378 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 240946ms 2017-07-24 18:49:04,125 INFO org.apache.hadoop.hbase.regionserver.HRegion: Flushing 1/1 column families, memstore=160 B 2017-07-24 18:49:04,182 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=6, memsize=160, hasBloomFilter=true, into tmp file hdfs://hbasetest1.domainname.com:8020/hbase/data/default/testflush/578c27d2eb7ef0ad437bf2ff38c053ae/.tmp/3609340d58ae4ca190e62c3b0fe415f4 2017-07-24 18:49:04,208 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://hbasetest1.domainname.com:8020/hbase/data/default/testflush/578c27d2eb7ef0ad437bf2ff38c053ae/f/3609340d58ae4ca190e62c3b0fe415f4, entries=1, sequenceid=6, filesize=1015 2017-07-24 18:49:04,212 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~160 B/160, currentsize=0 B/0 for region testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. in 87ms, sequenceid=6, compaction requested=false {code} > PeriodicMemstoreFlusher should inspect the queue before adding a delayed > flush request > -------------------------------------------------------------------------------------- > > Key: HBASE-18451 > URL: https://issues.apache.org/jira/browse/HBASE-18451 > Project: HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 2.0.0-alpha-1 > Reporter: Jean-Marc Spaggiari > > If you run a big job every 4 hours, impacting many tables (they have 150 > regions per server), ad the end all the regions might have some data to be > flushed, and we want, after one hour, trigger a periodic flush. That's > totally fine. > Now, to avoid a flush storm, when we detect a region to be flushed, we add a > "randomDelay" to the delayed flush, that way we spread them away. > RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, > which is very good. > However, because we don't check if there is already a request in the queue, > 10 seconds after, we create a new request, with a new randomDelay. > If you generate a randomDelay every 10 seconds, at some point, you will end > up having a small one, and the flush will be triggered almost immediatly. > As a result, instead of spreading all the flush within the next 5 minutes, > you end-up getting them all way more quickly. Like within the first minute. > Which not only feed the queue to to many flush requests, but also defeats the > purpose of the randomDelay. > {code} > @Override > protected void chore() { > final StringBuffer whyFlush = new StringBuffer(); > for (Region r : this.server.onlineRegions.values()) { > if (r == null) continue; > if (((HRegion)r).shouldFlush(whyFlush)) { > FlushRequester requester = server.getFlushRequester(); > if (requester != null) { > long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + > MIN_DELAY_TIME; > LOG.info(getName() + " requesting flush of " + > r.getRegionInfo().getRegionNameAsString() + " because " + > whyFlush.toString() + > " after random delay " + randomDelay + "ms"); > //Throttle the flushes by putting a delay. If we don't throttle, > and there > //is a balanced write-load on the regions in a table, we might > end up > //overwhelming the filesystem with too many flushes at once. > requester.requestDelayedFlush(r, randomDelay, false); > } > } > } > } > {code} > {code} > 2017-07-24 18:44:33,338 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 270785ms > 2017-07-24 18:44:43,328 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 200143ms > 2017-07-24 18:44:53,954 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 191082ms > 2017-07-24 18:45:03,528 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 92532ms > 2017-07-24 18:45:14,201 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 238780ms > 2017-07-24 18:45:24,195 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 35390ms > 2017-07-24 18:45:33,362 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 283034ms > 2017-07-24 18:45:43,933 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 84328ms > 2017-07-24 18:45:53,866 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 72291ms > 2017-07-24 18:46:03,329 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 62658ms > 2017-07-24 18:46:14,084 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 87491ms > 2017-07-24 18:46:23,538 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 290170ms > 2017-07-24 18:46:33,353 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 228541ms > 2017-07-24 18:46:43,359 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 211814ms > 2017-07-24 18:46:54,150 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 155167ms > 2017-07-24 18:47:03,546 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 115318ms > 2017-07-24 18:47:13,350 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 216662ms > 2017-07-24 18:47:23,329 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 96018ms > 2017-07-24 18:47:33,891 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 144751ms > 2017-07-24 18:47:43,329 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 152461ms > 2017-07-24 18:47:53,832 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 97352ms > 2017-07-24 18:48:03,348 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 263910ms > 2017-07-24 18:48:13,353 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 151793ms > 2017-07-24 18:48:23,340 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 61926ms > 2017-07-24 18:48:33,724 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 188901ms > 2017-07-24 18:48:44,038 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 233148ms > 2017-07-24 18:48:53,624 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 6206ms > 2017-07-24 18:49:03,378 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: > hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting > flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f > has an old edit so flush to free WALs after random delay 240946ms > 2017-07-24 18:49:04,125 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Flushing 1/1 column families, memstore=160 B > 2017-07-24 18:49:04,182 INFO > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, > sequenceid=6, memsize=160, hasBloomFilter=true, into tmp file > hdfs://hbasetest1.domainname.com:8020/hbase/data/default/testflush/578c27d2eb7ef0ad437bf2ff38c053ae/.tmp/3609340d58ae4ca190e62c3b0fe415f4 > 2017-07-24 18:49:04,208 INFO org.apache.hadoop.hbase.regionserver.HStore: > Added > hdfs://hbasetest1.domainname.com:8020/hbase/data/default/testflush/578c27d2eb7ef0ad437bf2ff38c053ae/f/3609340d58ae4ca190e62c3b0fe415f4, > entries=1, sequenceid=6, filesize=1015 > 2017-07-24 18:49:04,212 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Finished memstore flush of ~160 B/160, currentsize=0 B/0 for region > testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. in 87ms, > sequenceid=6, compaction requested=false > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)