[ https://issues.apache.org/jira/browse/HBASE-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeongdae Kim updated HBASE-15871:
---------------------------------
Description:
Sometimes in our production HBase cluster, a memstore flush takes a very long
time to finish (more than 30 minutes).
The reason is that the memstore flusher thread calls
StoreScanner.updateReaders() and waits to acquire a lock that the store scanner
holds in StoreScanner.next(), while backwardSeek() in the memstore scanner runs
for a long time.
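The contention described above can be sketched with a toy model, assuming (as the description implies) that StoreScanner.next() and StoreScanner.updateReaders() guard the same ReentrantLock. The class FlushLockModel and its methods are hypothetical stand-ins, not HBase code. The flusher side uses a timed tryLock() only so the model can report that it is blocked; a real flusher would simply keep waiting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: one lock shared by the scanner's next() and the flusher's updateReaders().
public class FlushLockModel {
    private final ReentrantLock lock = new ReentrantLock();

    // Stand-in for StoreScanner.next(): holds the lock for the whole (slow) backward seek.
    void next(CountDownLatch acquired, CountDownLatch seekDone) throws InterruptedException {
        lock.lock();
        try {
            acquired.countDown();
            seekDone.await(); // simulates a backwardSeek() that runs for a long time
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for StoreScanner.updateReaders(); timed only so the model can observe blocking.
    boolean tryUpdateReaders(long timeoutMillis) throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // the flusher would still be waiting here
        }
        try {
            return true; // readers would be switched to the new store files here
        } finally {
            lock.unlock();
        }
    }

    // Returns {blockedWhileSeeking, succeededAfterSeek}.
    static boolean[] run() {
        try {
            FlushLockModel store = new FlushLockModel();
            CountDownLatch acquired = new CountDownLatch(1);
            CountDownLatch seekDone = new CountDownLatch(1);
            Thread scanner = new Thread(() -> {
                try {
                    store.next(acquired, seekDone);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            scanner.start();
            acquired.await(); // the scanner now holds the lock
            boolean blocked = !store.tryUpdateReaders(50);
            seekDone.countDown(); // the slow seek finally finishes
            scanner.join();
            boolean succeeded = store.tryUpdateReaders(50);
            return new boolean[] {blocked, succeeded};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("flusher blocked during seek: " + r[0]);
        System.out.println("flusher succeeded after seek: " + r[1]);
    }
}
```

In this model the flush cannot make progress until the scanner's seek returns, so the flush duration is bounded below by the duration of the backward seek.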
I think this condition can occur during a reverse scan through the following
sequence:
1) Create a reversed store scanner by issuing a reverse scan.
2) Flush the memstore in the same HStore.
3) Put many cells into the memstore, so that the memstore is almost full.
4) Call next() on the reverse scanner. Because the flush in 2) already closed
all scanners, every scanner in this store is re-created, and backwardSeek() is
called on each new scanner with the store's lastTop.
5) At this point the memstore is almost full because of 3), and every cell in
it has a sequence ID greater than this scanner's readPoint because of the flush
in 2). This forces a search over all cells in the memstore, and when each row
has only one column, seekToPreviousRow() repeatedly searches cells it has
already searched. (This is described in more detail in the attached file.)
6) Flush the memstore again in the same HStore. To update the store files in
this HStore after flushing, this flush has to wait until steps 4) and 5) finish.
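Steps 4) and 5) can be illustrated with a simplified cost model. It assumes every row holds a single cell, that all cells have sequence IDs above the scanner's readPoint, and that the forward seek inside seekToPreviousRow() keeps skipping invisible cells past the end of the target row. ReverseSeekModel and both seek methods are hypothetical; the "bounded" variant only illustrates stopping at the row boundary and is not presented as the actual fix for this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of one memstore in which every row holds exactly one cell.
// It only counts how many cells a reverse seek examines; it is not HBase code.
public class ReverseSeekModel {
    private final List<Long> seqIds; // seqIds.get(i): sequence ID of the single cell in row i
    private final long readPoint;
    long visited = 0;                // cells examined so far

    ReverseSeekModel(List<Long> seqIds, long readPoint) {
        this.seqIds = seqIds;
        this.readPoint = readPoint;
    }

    private boolean visible(int i) {
        return seqIds.get(i) <= readPoint;
    }

    // Assumed behaviour: seek to the first cell of the previous row, then skip invisible
    // cells forward WITHOUT stopping at the row boundary. If no visible cell is found in
    // the target row itself, retry one row earlier.
    int seekToPreviousRowUnbounded(int row) {
        for (int prev = row - 1; prev >= 0; prev--) {
            int i = prev;
            while (i < seqIds.size()) {
                visited++;
                if (visible(i)) break;
                i++;
            }
            if (i == prev) return prev; // visible cell found in the target row
        }
        return -1; // no visible row before `row`
    }

    // Hypothetical variant that stops skipping at the row boundary, so each row's
    // cell is examined once.
    int seekToPreviousRowBounded(int row) {
        for (int prev = row - 1; prev >= 0; prev--) {
            visited++; // one cell per row in this model
            if (visible(prev)) return prev;
        }
        return -1;
    }

    // Builds n single-cell rows whose sequence IDs are all above the read point (step 5).
    static ReverseSeekModel allInvisible(int n) {
        long readPoint = 100;
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) ids.add(readPoint + 1 + i);
        return new ReverseSeekModel(ids, readPoint);
    }

    static long unboundedVisits(int n) {
        ReverseSeekModel m = allInvisible(n);
        m.seekToPreviousRowUnbounded(n); // start after the last row (stand-in for lastTop)
        return m.visited;
    }

    static long boundedVisits(int n) {
        ReverseSeekModel m = allInvisible(n);
        m.seekToPreviousRowBounded(n);
        return m.visited;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000}) {
            System.out.println(n + " rows: unbounded=" + unboundedVisits(n)
                    + ", bounded=" + boundedVisits(n));
        }
    }
}
```

Under these assumptions the unbounded version examines n(n+1)/2 cells for n rows (500,500 cells for 1,000 rows), while the bounded version examines n, which matches the report's observation that already-searched cells are searched again.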
I searched the HBase JIRA and found a similar issue (HBASE-14497), but
HBASE-14497's fix cannot solve this issue because it only changed a recursive
call into a loop.
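The point about HBASE-14497 can be made concrete with the same kind of cost model (hypothetical, assuming n single-cell rows that are all invisible and a forward skip that runs to the end of the memstore for each attempted row). Turning the retry from recursion into a loop removes one stack frame per row but leaves the number of examined cells unchanged.

```java
// Hypothetical cost model of the unbounded reverse seek over n invisible single-cell rows.
// Both variants count exactly the same examined cells; only the control flow differs.
public class RecursionVsLoop {
    // Recursive form: after failing on row `prev`, call itself with the row before it.
    static long recursive(int n, int prev) {
        if (prev < 0) return 0;
        long scanned = n - prev; // forward skip from row `prev` to the end of the memstore
        return scanned + recursive(n, prev - 1);
    }

    // Loop form: same work, but without a stack frame per row.
    static long iterative(int n) {
        long total = 0;
        for (int prev = n - 1; prev >= 0; prev--) {
            total += n - prev;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 2_000;
        System.out.println("recursive: " + recursive(n, n - 1));
        System.out.println("iterative: " + iterative(n));
    }
}
```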
> Memstore flush doesn't finish because of backwardseek() in memstore scanner.
> ----------------------------------------------------------------------------
>
> Key: HBASE-15871
> URL: https://issues.apache.org/jira/browse/HBASE-15871
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Affects Versions: 1.1.2
> Reporter: Jeongdae Kim
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)