[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11368:
-------------------------------
    Attachment: key_stacktrace_hbase10882.TXT

Hi [~stack],
Sorry for the confusion. Let me explain from scratch:
1) The root cause of the problem: HRegion#lock.
From the stack trace in HBASE-10882 (also see the attached 
key_stacktrace_hbase10882.TXT), the event sequence is:
1.1) The compaction acquires the read lock of HRegion#lock.
1.2) The bulk load tries to acquire the write lock of HRegion#lock if there are 
multiple CFs. It has to wait for the compaction to release the read lock.
1.3) Scanners try to acquire the read lock of HRegion#lock. They queue behind 
the bulk load's pending write-lock request, so they have to wait until the bulk 
load has obtained and released the write lock.
So both the bulk load and the scanners are blocked on HRegion#lock by the 
compaction.
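
The sequence 1.1 to 1.3 can be reproduced outside HBase with a plain 
java.util.concurrent lock. The sketch below is only an illustration: it assumes 
HRegion#lock behaves like a default (non-fair) ReentrantReadWriteLock, and the 
class and method names are made up for the demo, not taken from HBase.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionLockConvoyDemo {

    /**
     * Returns true if a "scanner" thread obtains the read lock within timeoutMs
     * while a "compaction" holds the read lock and a "bulk load" is queued for
     * the write lock.
     */
    static boolean scannerGetsReadLock(final long timeoutMs) throws Exception {
        // Stand-in for HRegion#lock (assumption: default, non-fair lock).
        final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        ExecutorService pool = Executors.newFixedThreadPool(2);

        regionLock.readLock().lock();          // 1.1 compaction holds the read lock
        try {
            pool.submit(() -> {                // 1.2 bulk load waits for the write lock
                regionLock.writeLock().lock();
                regionLock.writeLock().unlock();
            });
            // Wait until the bulk load thread is queued on the lock.
            while (!regionLock.hasQueuedThreads()) {
                Thread.sleep(10);
            }
            Thread.sleep(50);                  // extra settle time for the queued writer

            Future<Boolean> scanner = pool.submit(() -> {   // 1.3 scanner wants the read lock
                boolean acquired =
                    regionLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
                if (acquired) {
                    regionLock.readLock().unlock();
                }
                return acquired;
            });
            return scanner.get();
        } finally {
            regionLock.readLock().unlock();    // compaction finishes; bulk load can proceed
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("scanner got read lock: " + scannerGetsReadLock(200));
    }
}
```

In this sketch the scanner's timed tryLock is expected to give up even though 
only a read lock is held, because the queued writer is ahead of it. That is the 
same shape as the blocked scanners in the stack trace.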

2) What is HRegion#lock used for?
Investigating HRegion#lock shows that it was originally designed to protect 
region close ONLY. If something, such as a region split, wants to close the 
region, it needs to wait for the others to release the read lock.
Then HBASE-4552 used the lock to solve the multi-CF bulk load consistency issue. 
Now we see it is too heavy for that purpose.

3) Can we avoid using HRegion#lock in the bulk load?
The answer is yes.
Internally, HStore#DefaultStoreFileManager#storefiles keeps track of the 
on-disk HFiles for a CF. The bulk load has the following steps:
3.1) Move the HFiles directly into the region directory.
3.2) Add them to the {{storefiles}} list.
3.3) Notify StoreScanner that the HFile list has changed, which is done by 
resetting StoreScanner#heap to null. This forces existing StoreScanner 
instances to reinitialize based on the new HFiles seen on disk at the next 
scan/read request.
Steps 3.2 and 3.3 are synchronized by HStore#lock, so we have CF-level 
scan-bulkload consistency.
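
Steps 3.2 and 3.3 can be modelled in a few lines. This is a simplified sketch, 
not the HStore code: StoreSketch, ScannerSketch and their methods are 
hypothetical names, HFiles are plain strings, and step 3.1 (the file move) is 
left out. It only shows why doing 3.2 and 3.3 under one store-level lock gives 
a scanner either the old file list or the new one, never something in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StoreSketch {
    // Stand-in for HStore#lock.
    private final ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();
    // Stand-in for the {{storefiles}} list of one CF.
    private final List<String> storefiles = new ArrayList<>();
    private final List<ScannerSketch> scanners = new ArrayList<>();

    StoreSketch(String... initialFiles) {
        storefiles.addAll(Arrays.asList(initialFiles));
    }

    /** Registers a new scanner under the store lock. */
    ScannerSketch openScanner() {
        storeLock.writeLock().lock();
        try {
            ScannerSketch scanner = new ScannerSketch();
            scanners.add(scanner);
            return scanner;
        } finally {
            storeLock.writeLock().unlock();
        }
    }

    /** Steps 3.2 and 3.3, done atomically with respect to readers. */
    void bulkLoad(String hfile) {
        storeLock.writeLock().lock();
        try {
            storefiles.add(hfile);                 // 3.2 add to the file list
            for (ScannerSketch scanner : scanners) {
                scanner.heap = null;               // 3.3 force re-initialization
            }
        } finally {
            storeLock.writeLock().unlock();
        }
    }

    /** Stand-in for a StoreScanner; its heap is rebuilt lazily on the next read. */
    class ScannerSketch {
        private List<String> heap;                 // null means "rebuild before reading"

        List<String> filesForNextRead() {
            storeLock.readLock().lock();
            try {
                if (heap == null) {
                    heap = new ArrayList<>(storefiles);
                }
                return new ArrayList<>(heap);      // defensive copy for the caller
            } finally {
                storeLock.readLock().unlock();
            }
        }
    }
}
```

A scanner opened before the bulk load keeps reading from its old heap until the 
reset, and picks up the new file on its next read.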
 
To achieve multi-CF scan-bulkload consistency without HRegion#lock, we still 
need another region-level lock: a RegionScanner is composed of multiple 
StoreScanners, and a StoreScanner (a CF scanner) is composed of a 
MemStoreScanner and multiple StoreFileScanners.

RegionScannerImpl#storeHeap (and joinedHeap) is just the entry point to the 
multiple StoreScanners. To have multi-CF consistency we need synchronization 
there: a lock is needed, but it is used only between scan and bulk load.
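
As a rough sketch of that idea (an assumption about the shape of the fix, not 
the attached patch): a separate region-level read/write lock that only scanners 
and bulk loads touch, so a long compaction never sits in front of it. All names 
below are hypothetical, and HFiles are again plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionScanBulkLoadSketch {
    // Hypothetical region-level lock shared only by scanners and bulk loads.
    private final ReentrantReadWriteLock scanBulkLoadLock = new ReentrantReadWriteLock();
    // Per-CF file lists, standing in for the stores of one region.
    private final Map<String, List<String>> filesByFamily = new LinkedHashMap<>();

    RegionScanBulkLoadSketch(String... families) {
        for (String family : families) {
            filesByFamily.put(family, new ArrayList<String>());
        }
    }

    /** Adds one HFile per CF; scanners see all of them or none of them. */
    void bulkLoad(Map<String, String> hfileByFamily) {
        scanBulkLoadLock.writeLock().lock();
        try {
            // Validate first so a bad family name cannot leave a partial load.
            for (String family : hfileByFamily.keySet()) {
                if (!filesByFamily.containsKey(family)) {
                    throw new IllegalArgumentException("unknown family: " + family);
                }
            }
            for (Map.Entry<String, String> e : hfileByFamily.entrySet()) {
                filesByFamily.get(e.getKey()).add(e.getValue());
            }
        } finally {
            scanBulkLoadLock.writeLock().unlock();
        }
    }

    /** What a region-level scanner would observe across all CFs at one instant. */
    Map<String, Integer> fileCountsSeenByScanner() {
        scanBulkLoadLock.readLock().lock();
        try {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : filesByFamily.entrySet()) {
                counts.put(e.getKey(), e.getValue().size());
            }
            return counts;
        } finally {
            scanBulkLoadLock.readLock().unlock();
        }
    }
}
```

Because compaction is not a participant in this lock, a bulk load would wait 
only for scans that are in progress, not for a compaction to finish.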



Regarding the code change you referenced, 
performance_improvement_verification_98.5.patch simulates the event sequence 
described in #1 and is for testing purposes only.

Currently I use 0.98.5 for testing since it is stable and makes it easy to 
evaluate the effect of the change.
Thanks.









> Multi-column family BulkLoad fails if compactions go on too long
> ----------------------------------------------------------------
>
>                 Key: HBASE-11368
>                 URL: https://issues.apache.org/jira/browse/HBASE-11368
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Qiang Tian
>         Attachments: hbase-11368-0.98.5.patch, key_stacktrace_hbase10882.TXT, 
> performance_improvement_verification_98.5.patch
>
>
> Compactions take a read lock.  For a multi-column family region, we want to 
> take a write lock on the region before bulk loading.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
