[ https://issues.apache.org/jira/browse/HBASE-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581442#comment-14581442 ]
stack commented on HBASE-13877:
-------------------------------
[~enis] any comment on [~Apache9]'s remark?
After dealing w/ the above, I'm +1 on commit. I did not find any incidence of
the original issue in my run after looking through all the logs. In my case, I
am seeing double-assignment over a master restart:
2015-06-09 20:06:20,839 WARN
[c2020.halxg.cloudera.com,16000,1433905568816-GeneralBulkAssigner-1]
master.AssignmentManager: Assigning a region not in region states: {ENCODED =>
6fbe22ff15c2e5f2b207f79eaf8f382a, NAME =>
'IntegrationTestBigLinkedList,\xEB\x85\x1E\xB8Q\xEB\x85\x10,1433895189133.6fbe22ff15c2e5f2b207f79eaf8f382a.',
STARTKEY => '\xEB\x85\x1E\xB8Q\xEB\x85\x10', ENDKEY =>
'\xF5\xC2\x8F\x5C(\xF5\xC2\x80'}
Will open a new 'My Struggle' issue when I have figured out more about why the
double-assign and then, in turn, why the dataloss (I don't see how at the
moment; will keep digging).
> Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL
> --------------------------------------------------------------------
>
> Key: HBASE-13877
> URL: https://issues.apache.org/jira/browse/HBASE-13877
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.1.1
>
> Attachments: hbase-13877_v1.patch, hbase-13877_v2-branch-1.1.patch
>
>
> ITBLL with 1.25B rows failed for me (and Stack as reported in
> https://issues.apache.org/jira/browse/HBASE-13811?focusedCommentId=14577834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14577834)
>
> HBASE-13811 and HBASE-13853 fixed an issue with WAL edit filtering.
> The root cause this time seems to be different. It is due to the
> procedure-based flush interrupting the flush request in case the procedure is
> cancelled from an exception elsewhere. This leaves the memstore snapshot
> intact without aborting the server. The next flush then flushes the previous
> memstore with the current seqId (as opposed to the seqId from the memstore
> snapshot). This creates an hfile with a larger seqId than its contents
> warrant. Previous behavior in 0.98 and 1.0 (I believe) is that, after flush
> prepare, an interruption / exception will cause an RS abort.
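
To make the sequence in the description above concrete, here is a toy sketch of
it in Java. This is not HBase code: ToyRegion, ToyHFile, prepareFlush and
commitFlush are hypothetical names invented for illustration, and the model
ignores the WAL, stores and locking entirely. It only shows how a snapshot left
over from an interrupted flush can end up in a file stamped with a later seqId
than any edit it contains.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT HBase code: every class and method name here is hypothetical.
class ToyRegion {
    /** An edit tagged with the sequence id it was written at. */
    static final class Edit {
        final long seqId;
        final String value;
        Edit(long seqId, String value) { this.seqId = seqId; this.value = value; }
    }

    /** A flushed file: the seqId it is stamped with, plus its actual contents. */
    static final class ToyHFile {
        final long claimedSeqId;
        final List<Edit> edits;
        ToyHFile(long claimedSeqId, List<Edit> edits) {
            this.claimedSeqId = claimedSeqId;
            this.edits = edits;
        }
    }

    private long currentSeqId = 0;
    private List<Edit> active = new ArrayList<>();
    private List<Edit> snapshot = null;   // non-null while a flush is prepared
    private long snapshotSeqId = -1;

    void put(String value) {
        active.add(new Edit(++currentSeqId, value));
    }

    /** Flush prepare: snapshot the active edits, unless a stale snapshot is still there. */
    void prepareFlush() {
        if (snapshot == null) {
            snapshot = active;
            snapshotSeqId = currentSeqId;
            active = new ArrayList<>();
        }
        // else: the snapshot left by an interrupted flush is reused as-is
    }

    /** Flush commit: stamp the file with either the snapshot's seqId or the current one. */
    ToyHFile commitFlush(boolean useSnapshotSeqId) {
        if (snapshot == null) throw new IllegalStateException("no prepared snapshot");
        long claimed = useSnapshotSeqId ? snapshotSeqId : currentSeqId;
        ToyHFile file = new ToyHFile(claimed, snapshot);
        snapshot = null;
        return file;
    }

    /** Largest seqId actually present in the file, or -1 if it is empty. */
    static long maxSeqId(ToyHFile file) {
        long max = -1;
        for (Edit e : file.edits) max = Math.max(max, e.seqId);
        return max;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("a"); region.put("b");   // seqIds 1 and 2
        region.prepareFlush();              // prepared, then interrupted: no commit, no abort
        region.put("c"); region.put("d");   // seqIds 3 and 4 land in the active memstore
        region.prepareFlush();              // next flush finds the stale snapshot
        ToyHFile file = region.commitFlush(false);
        // prints "claimed=4 actual=2"
        System.out.println("claimed=" + file.claimedSeqId + " actual=" + maxSeqId(file));
    }
}
```

In this toy run the file claims seqId 4 while holding only edits 1 and 2; edits
3 and 4 are still only in memory. Presumably (an assumption on my part, not
something stated in the description) anything that later trusts the file's
seqId to decide which edits are already persisted would then skip 3 and 4.
Passing true to commitFlush stamps the file with the snapshot's own seqId
instead.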