[ 
https://issues.apache.org/jira/browse/HBASE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091256#comment-16091256
 ] 

Anastasia Braginsky commented on HBASE-18375:
---------------------------------------------

Hey, [~ram_krish]! No problem, here is a more detailed explanation:

bq. So you say that the ByteBuffer inside the 'Chunk' becomes null and so 
referencing the BB inside the chunk gives you NPE or is it the Chunk itself 
being NULL? Can you paste the stack trace here?
What I see is the following scenario (I am running with CellChunkMap):

1. Chunk C is allocated from the pool and is used as part of segment S. S is 
currently part of the compaction pipeline. C is protected by a strong 
reference, since it is a data chunk of the CellChunkMap. 
2. Due to a snapshot of the pipeline, segment S is swapped out of the 
pipeline.
3. S is closed; C is removed from the strong map and is no longer referenced 
from anywhere.
4. C is returned to the pool, but in parallel the GC is already freeing C.
5. As a result, C's chunk ID is entered into the weak map, but it refers to 
null...
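
To illustrate the steps above, here is a minimal sketch. It is not the actual 
ChunkCreator code: the class, field and method names are made up for 
illustration, and the GC is simulated with WeakReference.clear(), because 
System.gc() is only a hint and would make the example non-deterministic.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the chunk ID -> chunk translation (names assumed).
class ChunkIdMapSketch {
    static final class Chunk {
        final int id;
        Chunk(int id) { this.id = id; }
    }

    private final Map<Integer, Chunk> strongMap = new ConcurrentHashMap<>();
    private final Map<Integer, WeakReference<Chunk>> weakMap = new ConcurrentHashMap<>();

    // Step 1: the chunk is reachable through the strong map.
    void putStrong(Chunk c) {
        weakMap.remove(c.id);
        strongMap.put(c.id, c);
    }

    // Steps 3-5: the chunk is only weakly referenced from now on.
    void putWeak(Chunk c) {
        strongMap.remove(c.id);
        weakMap.put(c.id, new WeakReference<>(c));
    }

    // Translate a chunk ID; returns null if the weak referent is gone.
    Chunk getChunk(int id) {
        Chunk c = strongMap.get(id);
        if (c != null) {
            return c;
        }
        WeakReference<Chunk> ref = weakMap.get(id);
        return ref == null ? null : ref.get();
    }

    // Simulates what the collector does to a weakly reachable chunk.
    void simulateGcOf(int id) {
        WeakReference<Chunk> ref = weakMap.get(id);
        if (ref != null) {
            ref.clear();
        }
    }

    public static void main(String[] args) {
        ChunkIdMapSketch maps = new ChunkIdMapSketch();
        Chunk c = new Chunk(199);
        maps.putStrong(c);       // step 1
        maps.putWeak(c);         // steps 3-5: ID moves to the weak map
        c = null;                // nobody else holds the chunk
        maps.simulateGcOf(199);  // step 4: the GC frees it
        System.out.println(maps.getChunk(199)); // prints "null"
    }
}
```

The weak map entry for the ID is still present, which matches the printout 
below where the WeakReference exists but points to null.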

So I get null when I try to translate C's chunk ID. I will copy and paste 
the stack trace here, but it has heavy printouts all around. It took intensive 
debugging until I understood this scenario.
{code}
2017-07-16 16:03:31,109 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902]
 regionserver.CompactingMemStore: IN-MEMORY FLUSH: Pushing active segment into 
compaction pipeline [Region: 
usertable,user4599,1500212802830.bf1a03cc3ca0f1788720512a8e9275d0., Store: 
values, values]
2017-07-16 16:03:31,109 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902]
 regionserver.MemStoreCompactor: Starting the In-Memory Compaction for store 
values
2017-07-16 16:03:31,109 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902]
 regionserver.MemStoreCompactor: The youngest segment in the in-Memory 
Compaction Pipeline for store values is going to be flattened to the CHUNK_MAP
2017-07-16 16:03:31,225 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902]
 regionserver.CellChunkImmutableSegment: Number of new index chunks 3. The old 
data chunks saved while flattening [1, 3, 15, 20, 32, 50, 51, 67, 72, 73, 75, 
96, 121, 132, 155, 197, 199]
2017-07-16 16:03:31,236 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] ipc.RpcServer: 
callId: 820212 service: ClientService methodName: Get size: 128 connection: 
10.73.119.93:44692 deadline: 9223372036854775807
java.io.IOException: java.lang.IllegalArgumentException:
 <<< In CellChunkMap, cell must be associated with chunk with chunk ID 199 read 
from offset 947764. We were looking for a cell at global index 47388 and index 
inside chunk 47388, NUM_OF_CELL_REPS_IN_CHUNK: 104857.
 <<< The index chunk in chunk array is Chunk@1992770427 allocs=0waste=2097148. 
Number of chunks in the creator maps: 200. The chunk ID counter: 201. Verifying 
once again week map: java.lang.ref.WeakReference@39f3b87f --> null.
 <<< WEAK MAP: {198=java.lang.ref.WeakReference@44f41b54, 
199=java.lang.ref.WeakReference@39f3b87f, 
200=java.lang.ref.WeakReference@4a68f3b9}.
 <<< STRONG MAP: {1=Chunk@116964628 allocs=13539waste=39, 2=Chunk@577555956 
allocs=13539waste=92, 3=Chunk@1185303638 allocs=13539waste=36, 
4=Chunk@1264486912 allocs=13547waste=151, 5=Chunk@143494754 
allocs=13541waste=52, 6=Chunk@1279980010 allocs=13548waste=48, 
7=Chunk@955957449 allocs=13539waste=50, 8=Chunk@122372956 allocs=13547waste=66, 
9=Chunk@1666265929 allocs=13541waste=121, 10=Chunk@2022660968 
allocs=13544waste=85, 11=Chunk@379396152 allocs=13539waste=57, 
12=Chunk@1259454484 allocs=13544waste=90, 13=Chunk@556824900 
allocs=13547waste=124, 14=Chunk@593895953 allocs=13547waste=107, 
15=Chunk@1552239269 allocs=13539waste=58, 16=Chunk@424747073 
allocs=13539waste=98, 17=Chunk@998867062 allocs=13540waste=126, 
18=Chunk@695032764 allocs=13541waste=79, 19=Chunk@1598815318 
allocs=13540waste=146, 20=Chunk@1334349014 allocs=13539waste=105, 
21=Chunk@947203937 allocs=13547waste=82, 22=Chunk@2064003944 
allocs=13544waste=102, 23=Chunk@2075121938 allocs=13541waste=20, 
24=Chunk@1892287629 allocs=13544waste=119, 25=Chunk@1641362386 
allocs=13539waste=146, 26=Chunk@721601011 allocs=13544waste=123, 
27=Chunk@778107592 allocs=12waste=2095291, 28=Chunk@237923813 
allocs=13541waste=26, 29=Chunk@1177388113 allocs=13547waste=108, 
30=Chunk@2073942078 allocs=13539waste=130, 31=Chunk@1384303423 
allocs=13544waste=68, 32=Chunk@801031631 allocs=13539waste=81, 
33=Chunk@94094181 allocs=13545waste=16, 34=Chunk@282245056 
allocs=13541waste=65, 35=Chunk@1414356438 allocs=13548waste=7, 
36=Chunk@218397229 allocs=13540waste=17, 37=Chunk@1449066499 
allocs=0waste=2097148, 38=Chunk@1404490175 allocs=13547waste=115, 
39=Chunk@1831568882 allocs=13541waste=93, 40=Chunk@1129340776 
allocs=13545waste=31, 41=Chunk@1839271961 allocs=0waste=2097148, 
42=Chunk@150937403 allocs=13539waste=78, 43=Chunk@961159107 
allocs=13540waste=3, 44=Chunk@2015407261 allocs=13547waste=66, 
45=Chunk@501500504 allocs=0waste=2097148, 46=Chunk@1838319640 
allocs=13539waste=143, 47=Chunk@72300142 allocs=13540waste=149, 
48=Chunk@2046178878 allocs=13544waste=100, 49=Chunk@1182733778 
allocs=0waste=2097148, 50=Chunk@1062508365 allocs=13539waste=87, 
51=Chunk@1121266319 allocs=13539waste=29, 52=Chunk@182100781 
allocs=0waste=2097148, 53=Chunk@875951649 allocs=13539waste=80, 
54=Chunk@792781077 allocs=13541waste=57, 55=Chunk@1143613216 
allocs=13545waste=7, 56=Chunk@456533827 allocs=13545waste=25, 57=Chunk@2707194 
allocs=13541waste=36, 58=Chunk@684281763 allocs=13539waste=83, 
59=Chunk@1037004477 allocs=13540waste=112, 60=Chunk@401164920 
allocs=13548waste=57, 61=Chunk@1576900833 allocs=13547waste=125, 
62=Chunk@1068196010 allocs=13545waste=32, 63=Chunk@2044156772 
allocs=13539waste=127, 64=Chunk@615543980 allocs=13544waste=140, 
65=Chunk@364632651 allocs=13541waste=87, 66=Chunk@1903248998 
allocs=13541waste=22, 67=Chunk@842940247 allocs=13539waste=125, 
68=Chunk@361511080 allocs=13547waste=108, 69=Chunk@1580122576 
allocs=13539waste=94, 70=Chunk@630717957 allocs=13544waste=136, 
71=Chunk@1720756583 allocs=13544waste=80, 72=Chunk@304976997 
allocs=13539waste=86, 73=Chunk@975562673 allocs=13539waste=82, 
74=Chunk@542654758 allocs=13540waste=13, 75=Chunk@1130373083 
allocs=13539waste=69, 76=Chunk@1024468334 allocs=13541waste=30, 
77=Chunk@681395474 allocs=13541waste=81, 78=Chunk@1252941910 
allocs=13544waste=94, 79=Chunk@158543903 allocs=13540waste=112, 
80=Chunk@896818954 allocs=13541waste=17, 81=Chunk@1996700003 
allocs=13540waste=22, 82=Chunk@1318476549 allocs=13544waste=60, 
83=Chunk@1593145600 allocs=13548waste=32, 84=Chunk@1240396251 
allocs=13548waste=30, 85=Chunk@667419044 allocs=13540waste=13, 
86=Chunk@1543125035 allocs=13544waste=107, 87=Chunk@775059245 
allocs=13544waste=4, 88=Chunk@283912985 allocs=13541waste=37, 
89=Chunk@2001653121 allocs=13540waste=154, 90=Chunk@2113902627 
allocs=8003waste=857976, 91=Chunk@1078530990 allocs=1waste=2096994, 
92=Chunk@831282269 allocs=13547waste=154, 93=Chunk@816819496 
allocs=7766waste=894984, 94=Chunk@831522206 allocs=13539waste=141, 
95=Chunk@969712737 allocs=13544waste=116, 96=Chunk@1785821617 
allocs=13539waste=106, 97=Chunk@662327098 allocs=13541waste=42, 
98=Chunk@385286846 allocs=13544waste=61, 99=Chunk@1245987757 
allocs=13539waste=138, 100=Chunk@731449253 allocs=13544waste=96, 
101=Chunk@1584373363 allocs=18waste=2094380, 102=Chunk@1881380338 
allocs=13547waste=141, 103=Chunk@403892592 allocs=5169waste=1296563, 
104=Chunk@2069193208 allocs=10waste=2095598, 105=Chunk@1500243175 
allocs=13539waste=132, 106=Chunk@898328125 allocs=0waste=2097148, 
107=Chunk@979194005 allocs=0waste=2097148, 108=Chunk@1784134791 
allocs=0waste=2097148, 109=Chunk@1813852662 allocs=13539waste=98, 
110=Chunk@46733510 allocs=4076waste=1465872, 111=Chunk@1379936754 
allocs=13541waste=35, 112=Chunk@94635193 allocs=13545waste=60, 
113=Chunk@3344250 allocs=13539waste=106, 114=Chunk@1257648274 
allocs=13541waste=11, 115=Chunk@422219655 allocs=9waste=2095754, 
116=Chunk@1007604183 allocs=13544waste=91, 117=Chunk@605948411 
allocs=13544waste=51, 118=Chunk@1603788618 allocs=13539waste=120, 
119=Chunk@302785928 allocs=13544waste=71, 120=Chunk@1489423991 
allocs=13540waste=101, 121=Chunk@205735237 allocs=13539waste=39, 
122=Chunk@1833940488 allocs=13540waste=148, 123=Chunk@505212492 
allocs=13539waste=110, 124=Chunk@1892105870 allocs=13548waste=42, 
125=Chunk@1714955454 allocs=13548waste=9, 126=Chunk@1985421703 
allocs=13541waste=43, 127=Chunk@255909775 allocs=13545waste=18, 
128=Chunk@1186640807 allocs=13544waste=67, 129=Chunk@1627419162 
allocs=13539waste=135, 130=Chunk@773074084 allocs=13539waste=55, 
131=Chunk@656482894 allocs=13548waste=11, 132=Chunk@882215558 
allocs=11waste=2095443, 133=Chunk@1458024108 allocs=13545waste=40, 
134=Chunk@1503255000 allocs=13544waste=126, 135=Chunk@2005185527 
allocs=13539waste=42, 136=Chunk@1868416571 allocs=12waste=2095288, 
137=Chunk@77895602 allocs=13541waste=114, 138=Chunk@325376487 
allocs=0waste=2097148, 139=Chunk@1093524343 allocs=13539waste=131, 
140=Chunk@1086370167 allocs=13540waste=124, 141=Chunk@1626974170 
allocs=13541waste=22, 142=Chunk@1441259971 allocs=13541waste=62, 
143=Chunk@2059300459 allocs=9waste=2095753, 144=Chunk@1068507970 
allocs=13545waste=9, 145=Chunk@316314649 allocs=0waste=2097148, 
146=Chunk@2027525752 allocs=0waste=2097148, 147=Chunk@1906139710 
allocs=0waste=2097148, 148=Chunk@1334437244 allocs=13544waste=151, 
149=Chunk@162908872 allocs=13544waste=113, 150=Chunk@1030485968 
allocs=0waste=2097148, 151=Chunk@670957265 allocs=13539waste=100, 
152=Chunk@660557143 allocs=13541waste=59, 153=Chunk@1782014 
allocs=0waste=2097148, 154=Chunk@1647088218 allocs=13541waste=63, 
155=Chunk@711979960 allocs=13539waste=81, 156=Chunk@1996957271 
allocs=13539waste=120, 157=Chunk@776103049 allocs=13547waste=130, 
158=Chunk@679124313 allocs=13539waste=79, 159=Chunk@2103265757 
allocs=13544waste=137, 160=Chunk@866577855 allocs=10waste=2095604, 
161=Chunk@251507054 allocs=10waste=2095598, 162=Chunk@1913742781 
allocs=13539waste=121, 163=Chunk@1992770427 allocs=0waste=2097148, 
164=Chunk@1704315267 allocs=13547waste=136, 165=Chunk@1641683867 
allocs=13540waste=102, 166=Chunk@47165455 allocs=13539waste=75, 
167=Chunk@2072645990 allocs=13541waste=64, 168=Chunk@815269445 
allocs=13544waste=92, 169=Chunk@28296462 allocs=0waste=2097148, 
170=Chunk@2115977980 allocs=0waste=2097148, 171=Chunk@1437957496 
allocs=0waste=2097148, 172=Chunk@636482469 allocs=13539waste=54, 
173=Chunk@1511511459 allocs=0waste=2097148, 174=Chunk@1578853567 
allocs=0waste=2097148, 175=Chunk@549166265 allocs=0waste=2097148, 
176=Chunk@2048078288 allocs=13541waste=43, 177=Chunk@516774396 
allocs=0waste=2097148, 178=Chunk@469166086 allocs=0waste=2097148, 
179=Chunk@1733620117 allocs=0waste=2097148, 180=Chunk@78303518 
allocs=13545waste=54, 181=Chunk@1885104846 allocs=13548waste=13, 
182=Chunk@1556486683 allocs=13544waste=123, 183=Chunk@484343887 
allocs=13539waste=99, 184=Chunk@1320059115 allocs=13544waste=55, 
185=Chunk@1572313578 allocs=13547waste=145, 186=Chunk@827147909 
allocs=0waste=2097148, 187=Chunk@1501131803 allocs=13540waste=140, 
188=Chunk@1676283399 allocs=8waste=2095908, 189=Chunk@41136985 
allocs=13541waste=10, 190=Chunk@1506915580 allocs=0waste=2097148, 
191=Chunk@1268435569 allocs=0waste=2097148, 192=Chunk@2016640819 
allocs=0waste=2097148, 193=Chunk@1741025815 allocs=0waste=2097148, 
194=Chunk@862440519 allocs=0waste=2097148, 195=Chunk@783499544 
allocs=13544waste=69, 196=Chunk@1293576474 allocs=13547waste=82, 
197=Chunk@92766180 allocs=13539waste=85}
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleException(HRegion.java:5954)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:5911)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5875)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2827)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2807)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2789)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2783)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2504)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2439)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41137)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
{code}

bq. you see that some time you don't get the chunk ref itself and so it is not 
getting added to reclaimedChunks at all? And so next time you are not able to 
poll from the reclaimedchunks?
I see that it is added back to reclaimedChunks, but the GC is already working 
on it in parallel. I mean it was already marked by the GC, so later references 
to this memory are no longer taken into account. And yes, it was still 
possible to poll the chunk from reclaimedChunks, but it was deallocated while 
in the weakMap.

bq. If you are always going with strongChunkMap if there is a pool then that 
'saveFromGC' also can be avoided while using CellChunkMap if there was a pool 
already in place?
No, I do not think we can avoid saveFromGC(): even when a pool is in place, we 
cannot be sure that all chunks come only from the pool. As far as I can see, 
'on-demand' allocation is still available when we are above some threshold.
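
A rough sketch of what I mean, under my own assumptions: the names below 
(maxPooled, fromPool, putBack) are hypothetical and this is not the real pool 
code. It only shows that once the pool limit is reached, a caller can still 
receive a chunk that the pool never holds.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pool with a fallback to on-demand allocation (names assumed).
class PoolWithOnDemandSketch {
    static final class Chunk {
        final int id;
        final boolean fromPool;
        Chunk(int id, boolean fromPool) {
            this.id = id;
            this.fromPool = fromPool;
        }
    }

    private final int maxPooled;
    private final AtomicInteger pooledCreated = new AtomicInteger();
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final ConcurrentLinkedQueue<Chunk> reclaimedChunks =
        new ConcurrentLinkedQueue<>();

    PoolWithOnDemandSketch(int maxPooled) {
        this.maxPooled = maxPooled;
    }

    Chunk getChunk() {
        // Prefer a chunk that was returned to the pool earlier.
        Chunk reclaimed = reclaimedChunks.poll();
        if (reclaimed != null) {
            return reclaimed;
        }
        // Grow the pool while it is below its limit.
        while (true) {
            int created = pooledCreated.get();
            if (created >= maxPooled) {
                break;
            }
            if (pooledCreated.compareAndSet(created, created + 1)) {
                return new Chunk(nextId.getAndIncrement(), true);
            }
        }
        // Above the threshold: allocate on demand, outside the pool.
        return new Chunk(nextId.getAndIncrement(), false);
    }

    void putBack(Chunk c) {
        // Only pool chunks are kept; on-demand chunks are simply dropped.
        if (c.fromPool) {
            reclaimedChunks.add(c);
        }
    }
}
```

In this sketch an on-demand chunk is never referenced by the pool, so if a 
segment keeps only its chunk ID, something like saveFromGC() is still needed 
to keep it strongly reachable.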

Is it all clearer now?

> The pool chunks from ChunkCreator are deallocated while in pool because there 
> is no reference to them
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18375
>                 URL: https://issues.apache.org/jira/browse/HBASE-18375
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Anastasia Braginsky
>            Priority: Critical
>             Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2
>
>         Attachments: HBASE-18375-V01.patch, HBASE-18375-V02.patch, 
> HBASE-18375-V03.patch
>
>
> Because MSLAB list of chunks was changed to list of chunk IDs, the chunks 
> returned back to pool can be deallocated by JVM because there is no reference 
> to them. The solution is to protect pool chunks from GC by the strong map of 
> ChunkCreator introduced by HBASE-18010. Will prepare the patch today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
