I think I may have found the problem:
1. When a region is split, the parent region is closed, and doClose() sets
writestate.writesEnabled = false:
private List<StoreFile> doClose(final boolean abort)
throws IOException {
  synchronized (writestate) {
    // Disable compacting and flushing by background threads for this
    // region.
    writestate.writesEnabled = false;
2. If the memstore is large enough, a preflush happens:
if (!abort && !wasFlushing && worthPreFlushing()) {
  LOG.info("Running close preflush of " + this.getRegionNameAsString());
  internalFlushcache();
}
this.closing.set(true);
lock.writeLock().lock();
3. An IOException is thrown, so the preflush fails and closing the parent fails:
createSplitDir(this.parent.getFilesystem(), this.splitdir);
this.journal.add(JournalEntry.CREATE_SPLIT_DIR);
List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
if (hstoreFilesToSplit == null) {
4. Rollback of the split is called, but the split journal has only reached
"CREATE_SPLIT_DIR", so only cleanupSplitDir runs:
while (iterator.hasPrevious()) {
  JournalEntry je = iterator.previous();
  switch(je) {
    case CREATE_SPLIT_DIR:
      cleanupSplitDir(fs, this.splitdir);
      break;
    case CLOSED_PARENT_REGION:
5. What about writestate.writesEnabled? It stays false, and nothing resets it.
So even though the split is rolled back, no flush can succeed in the parent region.
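
The five steps above can be reproduced in a small toy model. This is only a sketch, not HBase code: ToyRegion, ToySplit, SplitRollbackSketch, failFlush, and restoreWritesOnFailedClose are hypothetical names made up for illustration. It also assumes that undoing a completed close would re-enable writes, and the "fix" shown (resetting the flag when close fails) is just one possible approach, not necessarily what HBase should do.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

// Toy model only; none of these classes are HBase classes.
class ToyRegion {
    boolean writesEnabled = true; // stands in for writestate.writesEnabled
    boolean failFlush = false;    // hypothetical switch simulating the HDFS error

    // Same order as in doClose(): disable writes first, then preflush.
    void close() throws IOException {
        writesEnabled = false;
        if (failFlush) {
            throw new IOException("simulated preflush failure");
        }
    }

    // A later flush request is refused while writes are disabled.
    boolean flush() {
        return writesEnabled;
    }
}

class ToySplit {
    enum JournalEntry { CREATE_SPLIT_DIR, CLOSED_PARENT_REGION }

    final List<JournalEntry> journal = new ArrayList<>();
    final ToyRegion parent;
    final boolean restoreWritesOnFailedClose; // hypothetical fix

    ToySplit(ToyRegion parent, boolean restoreWritesOnFailedClose) {
        this.parent = parent;
        this.restoreWritesOnFailedClose = restoreWritesOnFailedClose;
    }

    void execute() throws IOException {
        journal.add(JournalEntry.CREATE_SPLIT_DIR);
        try {
            parent.close();
        } catch (IOException e) {
            if (restoreWritesOnFailedClose) {
                parent.writesEnabled = true; // undo the side effect of the failed close
            }
            throw e;
        }
        // Never reached when close() throws, so rollback does not see this entry.
        journal.add(JournalEntry.CLOSED_PARENT_REGION);
    }

    void rollback() {
        ListIterator<JournalEntry> iterator = journal.listIterator(journal.size());
        while (iterator.hasPrevious()) {
            switch (iterator.previous()) {
                case CREATE_SPLIT_DIR:
                    // Only the split directory would be cleaned up here.
                    break;
                case CLOSED_PARENT_REGION:
                    // Assumption: undoing a completed close re-enables writes.
                    parent.writesEnabled = true;
                    break;
            }
        }
    }
}

class SplitRollbackSketch {
    // Fails a split during preflush, rolls back, then reports whether a
    // later flush on the parent is accepted.
    static boolean flushWorksAfterFailedSplit(boolean restoreWritesOnFailedClose) {
        ToyRegion region = new ToyRegion();
        region.failFlush = true;
        ToySplit split = new ToySplit(region, restoreWritesOnFailedClose);
        try {
            split.execute();
        } catch (IOException e) {
            split.rollback();
        }
        region.failFlush = false; // the HDFS problem has gone away
        return region.flush();
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + flushWorksAfterFailedSplit(false)); // without fix: false
        System.out.println("with fix: " + flushWorksAfterFailedSplit(true));     // with fix: true
    }
}
```

In the toy model the flag is changed before the journal records CLOSED_PARENT_REGION, so a rollback driven only by the journal never sees a reason to restore it.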
Zhou Shuaifeng(Frank)
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: April 26, 2011 11:41
To: [email protected]
Cc: Yanlijun
Subject: Re: "NOT flushing memstore for region" keep on printing for half an hour
2011/4/25 Zhoushuaifeng <[email protected]>:
> Thanks St,
> I found that while the parent region was being closed to prepare for the
> split, a preflush occurred. But for some reason, an IOException was thrown
> from the HDFS client and caused the preflush to fail. Then the split failed.
> We have made some changes to HDFS and will check why the exception happened.
> But I don't know the details of how HBase handles the exception. When rolling
> back the split operation, will it reset writestate.writesEnabled to true?
> If not, writestate.writesEnabled will be left at false, causing all
> other flush operations to fail.
>
If we fail a split, we roll back. Usually the rollback is clean (there
is a unit test that exercises various split failures and verifies that
rollback works), but for sure there could be bugs in here.
I would be interested in the log of that regionserver.
Thanks,
St.Ack