[ https://issues.apache.org/jira/browse/HBASE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091256#comment-16091256 ]
Anastasia Braginsky commented on HBASE-18375:
---------------------------------------------

Hey, [~ram_krish]! No problem, here are more explanations:

bq. So you say that the ByteBuffer inside the 'Chunk' becomes null and so referencing the BB inside the chunk gives you NPE or is it the Chunk itself being NULL? Can you paste the stack trace here?

What I see is the following scenario (I am running with CellChunkMap):
1. Chunk C is allocated from the pool and is used as part of Segment S. S is currently part of the compaction pipeline. C is protected by a strong pointer because it is a data chunk of the CellChunkMap.
2. Due to the snapshot of the pipeline, segment S is swapped out of the pipeline.
3. S is closed; C is removed from the strong map and is no longer referenced from anywhere.
4. C is returned to the pool, but in parallel the GC is already freeing C.
5. As a result, C's chunk ID is entered into the weak map, but its weak reference points to null.

So I get null when I try to translate C's chunk ID. I am pasting the stack trace below; note that it is surrounded by heavy debug printouts. It took intensive debugging to understand this scenario.
{code}
2017-07-16 16:03:31,109 DEBUG [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902] regionserver.CompactingMemStore: IN-MEMORY FLUSH: Pushing active segment into compaction pipeline [Region: usertable,user4599,1500212802830.bf1a03cc3ca0f1788720512a8e9275d0., Store: values, values]
2017-07-16 16:03:31,109 DEBUG [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902] regionserver.MemStoreCompactor: Starting the In-Memory Compaction for store values
2017-07-16 16:03:31,109 DEBUG [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902] regionserver.MemStoreCompactor: The youngest segment in the in-Memory Compaction Pipeline for store values is going to be flattened to the CHUNK_MAP
2017-07-16 16:03:31,225 DEBUG [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020-inmemoryCompactions-1500216113902] regionserver.CellChunkImmutableSegment: Number of new index chunks 3. The old data chunks saved while flattening [1, 3, 15, 20, 32, 50, 51, 67, 72, 73, 75, 96, 121, 132, 155, 197, 199]
2017-07-16 16:03:31,236 DEBUG [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] ipc.RpcServer: callId: 820212 service: ClientService methodName: Get size: 128 connection: 10.73.119.93:44692 deadline: 9223372036854775807
java.io.IOException: java.lang.IllegalArgumentException: <<< In CellChunkMap, cell must be associated with chunk with chunk ID 199 read from offset 947764. We were looking for a cell at global index 47388 and index inside chunk 47388, NUM_OF_CELL_REPS_IN_CHUNK: 104857.
<<< The index chunk in chunk array is Chunk@1992770427 allocs=0waste=2097148. Number of chunks in the creator maps: 200. The chunk ID counter: 201. Verifying once again week map: java.lang.ref.WeakReference@39f3b87f --> null.
<<< WEAK MAP: {198=java.lang.ref.WeakReference@44f41b54, 199=java.lang.ref.WeakReference@39f3b87f, 200=java.lang.ref.WeakReference@4a68f3b9}.
<<< STRONG MAP: {1=Chunk@116964628 allocs=13539waste=39, 2=Chunk@577555956 allocs=13539waste=92, 3=Chunk@1185303638 allocs=13539waste=36, 4=Chunk@1264486912 allocs=13547waste=151, 5=Chunk@143494754 allocs=13541waste=52, 6=Chunk@1279980010 allocs=13548waste=48, 7=Chunk@955957449 allocs=13539waste=50, 8=Chunk@122372956 allocs=13547waste=66, 9=Chunk@1666265929 allocs=13541waste=121, 10=Chunk@2022660968 allocs=13544waste=85, 11=Chunk@379396152 allocs=13539waste=57, 12=Chunk@1259454484 allocs=13544waste=90, 13=Chunk@556824900 allocs=13547waste=124, 14=Chunk@593895953 allocs=13547waste=107, 15=Chunk@1552239269 allocs=13539waste=58, 16=Chunk@424747073 allocs=13539waste=98, 17=Chunk@998867062 allocs=13540waste=126, 18=Chunk@695032764 allocs=13541waste=79, 19=Chunk@1598815318 allocs=13540waste=146, 20=Chunk@1334349014 allocs=13539waste=105, 21=Chunk@947203937 allocs=13547waste=82, 22=Chunk@2064003944 allocs=13544waste=102, 23=Chunk@2075121938 allocs=13541waste=20, 24=Chunk@1892287629 allocs=13544waste=119, 25=Chunk@1641362386 allocs=13539waste=146, 26=Chunk@721601011 allocs=13544waste=123, 27=Chunk@778107592 allocs=12waste=2095291, 28=Chunk@237923813 allocs=13541waste=26, 29=Chunk@1177388113 allocs=13547waste=108, 30=Chunk@2073942078 allocs=13539waste=130, 31=Chunk@1384303423 allocs=13544waste=68, 32=Chunk@801031631 allocs=13539waste=81, 33=Chunk@94094181 allocs=13545waste=16, 34=Chunk@282245056 allocs=13541waste=65, 35=Chunk@1414356438 allocs=13548waste=7, 36=Chunk@218397229 allocs=13540waste=17, 37=Chunk@1449066499 allocs=0waste=2097148, 38=Chunk@1404490175 allocs=13547waste=115, 39=Chunk@1831568882 allocs=13541waste=93, 40=Chunk@1129340776 allocs=13545waste=31, 41=Chunk@1839271961 allocs=0waste=2097148, 42=Chunk@150937403 allocs=13539waste=78, 43=Chunk@961159107 allocs=13540waste=3, 44=Chunk@2015407261 allocs=13547waste=66, 45=Chunk@501500504 allocs=0waste=2097148, 46=Chunk@1838319640 allocs=13539waste=143, 47=Chunk@72300142 allocs=13540waste=149, 48=Chunk@2046178878 
allocs=13544waste=100, 49=Chunk@1182733778 allocs=0waste=2097148, 50=Chunk@1062508365 allocs=13539waste=87, 51=Chunk@1121266319 allocs=13539waste=29, 52=Chunk@182100781 allocs=0waste=2097148, 53=Chunk@875951649 allocs=13539waste=80, 54=Chunk@792781077 allocs=13541waste=57, 55=Chunk@1143613216 allocs=13545waste=7, 56=Chunk@456533827 allocs=13545waste=25, 57=Chunk@2707194 allocs=13541waste=36, 58=Chunk@684281763 allocs=13539waste=83, 59=Chunk@1037004477 allocs=13540waste=112, 60=Chunk@401164920 allocs=13548waste=57, 61=Chunk@1576900833 allocs=13547waste=125, 62=Chunk@1068196010 allocs=13545waste=32, 63=Chunk@2044156772 allocs=13539waste=127, 64=Chunk@615543980 allocs=13544waste=140, 65=Chunk@364632651 allocs=13541waste=87, 66=Chunk@1903248998 allocs=13541waste=22, 67=Chunk@842940247 allocs=13539waste=125, 68=Chunk@361511080 allocs=13547waste=108, 69=Chunk@1580122576 allocs=13539waste=94, 70=Chunk@630717957 allocs=13544waste=136, 71=Chunk@1720756583 allocs=13544waste=80, 72=Chunk@304976997 allocs=13539waste=86, 73=Chunk@975562673 allocs=13539waste=82, 74=Chunk@542654758 allocs=13540waste=13, 75=Chunk@1130373083 allocs=13539waste=69, 76=Chunk@1024468334 allocs=13541waste=30, 77=Chunk@681395474 allocs=13541waste=81, 78=Chunk@1252941910 allocs=13544waste=94, 79=Chunk@158543903 allocs=13540waste=112, 80=Chunk@896818954 allocs=13541waste=17, 81=Chunk@1996700003 allocs=13540waste=22, 82=Chunk@1318476549 allocs=13544waste=60, 83=Chunk@1593145600 allocs=13548waste=32, 84=Chunk@1240396251 allocs=13548waste=30, 85=Chunk@667419044 allocs=13540waste=13, 86=Chunk@1543125035 allocs=13544waste=107, 87=Chunk@775059245 allocs=13544waste=4, 88=Chunk@283912985 allocs=13541waste=37, 89=Chunk@2001653121 allocs=13540waste=154, 90=Chunk@2113902627 allocs=8003waste=857976, 91=Chunk@1078530990 allocs=1waste=2096994, 92=Chunk@831282269 allocs=13547waste=154, 93=Chunk@816819496 allocs=7766waste=894984, 94=Chunk@831522206 allocs=13539waste=141, 95=Chunk@969712737 allocs=13544waste=116, 
96=Chunk@1785821617 allocs=13539waste=106, 97=Chunk@662327098 allocs=13541waste=42, 98=Chunk@385286846 allocs=13544waste=61, 99=Chunk@1245987757 allocs=13539waste=138, 100=Chunk@731449253 allocs=13544waste=96, 101=Chunk@1584373363 allocs=18waste=2094380, 102=Chunk@1881380338 allocs=13547waste=141, 103=Chunk@403892592 allocs=5169waste=1296563, 104=Chunk@2069193208 allocs=10waste=2095598, 105=Chunk@1500243175 allocs=13539waste=132, 106=Chunk@898328125 allocs=0waste=2097148, 107=Chunk@979194005 allocs=0waste=2097148, 108=Chunk@1784134791 allocs=0waste=2097148, 109=Chunk@1813852662 allocs=13539waste=98, 110=Chunk@46733510 allocs=4076waste=1465872, 111=Chunk@1379936754 allocs=13541waste=35, 112=Chunk@94635193 allocs=13545waste=60, 113=Chunk@3344250 allocs=13539waste=106, 114=Chunk@1257648274 allocs=13541waste=11, 115=Chunk@422219655 allocs=9waste=2095754, 116=Chunk@1007604183 allocs=13544waste=91, 117=Chunk@605948411 allocs=13544waste=51, 118=Chunk@1603788618 allocs=13539waste=120, 119=Chunk@302785928 allocs=13544waste=71, 120=Chunk@1489423991 allocs=13540waste=101, 121=Chunk@205735237 allocs=13539waste=39, 122=Chunk@1833940488 allocs=13540waste=148, 123=Chunk@505212492 allocs=13539waste=110, 124=Chunk@1892105870 allocs=13548waste=42, 125=Chunk@1714955454 allocs=13548waste=9, 126=Chunk@1985421703 allocs=13541waste=43, 127=Chunk@255909775 allocs=13545waste=18, 128=Chunk@1186640807 allocs=13544waste=67, 129=Chunk@1627419162 allocs=13539waste=135, 130=Chunk@773074084 allocs=13539waste=55, 131=Chunk@656482894 allocs=13548waste=11, 132=Chunk@882215558 allocs=11waste=2095443, 133=Chunk@1458024108 allocs=13545waste=40, 134=Chunk@1503255000 allocs=13544waste=126, 135=Chunk@2005185527 allocs=13539waste=42, 136=Chunk@1868416571 allocs=12waste=2095288, 137=Chunk@77895602 allocs=13541waste=114, 138=Chunk@325376487 allocs=0waste=2097148, 139=Chunk@1093524343 allocs=13539waste=131, 140=Chunk@1086370167 allocs=13540waste=124, 141=Chunk@1626974170 allocs=13541waste=22, 
142=Chunk@1441259971 allocs=13541waste=62, 143=Chunk@2059300459 allocs=9waste=2095753, 144=Chunk@1068507970 allocs=13545waste=9, 145=Chunk@316314649 allocs=0waste=2097148, 146=Chunk@2027525752 allocs=0waste=2097148, 147=Chunk@1906139710 allocs=0waste=2097148, 148=Chunk@1334437244 allocs=13544waste=151, 149=Chunk@162908872 allocs=13544waste=113, 150=Chunk@1030485968 allocs=0waste=2097148, 151=Chunk@670957265 allocs=13539waste=100, 152=Chunk@660557143 allocs=13541waste=59, 153=Chunk@1782014 allocs=0waste=2097148, 154=Chunk@1647088218 allocs=13541waste=63, 155=Chunk@711979960 allocs=13539waste=81, 156=Chunk@1996957271 allocs=13539waste=120, 157=Chunk@776103049 allocs=13547waste=130, 158=Chunk@679124313 allocs=13539waste=79, 159=Chunk@2103265757 allocs=13544waste=137, 160=Chunk@866577855 allocs=10waste=2095604, 161=Chunk@251507054 allocs=10waste=2095598, 162=Chunk@1913742781 allocs=13539waste=121, 163=Chunk@1992770427 allocs=0waste=2097148, 164=Chunk@1704315267 allocs=13547waste=136, 165=Chunk@1641683867 allocs=13540waste=102, 166=Chunk@47165455 allocs=13539waste=75, 167=Chunk@2072645990 allocs=13541waste=64, 168=Chunk@815269445 allocs=13544waste=92, 169=Chunk@28296462 allocs=0waste=2097148, 170=Chunk@2115977980 allocs=0waste=2097148, 171=Chunk@1437957496 allocs=0waste=2097148, 172=Chunk@636482469 allocs=13539waste=54, 173=Chunk@1511511459 allocs=0waste=2097148, 174=Chunk@1578853567 allocs=0waste=2097148, 175=Chunk@549166265 allocs=0waste=2097148, 176=Chunk@2048078288 allocs=13541waste=43, 177=Chunk@516774396 allocs=0waste=2097148, 178=Chunk@469166086 allocs=0waste=2097148, 179=Chunk@1733620117 allocs=0waste=2097148, 180=Chunk@78303518 allocs=13545waste=54, 181=Chunk@1885104846 allocs=13548waste=13, 182=Chunk@1556486683 allocs=13544waste=123, 183=Chunk@484343887 allocs=13539waste=99, 184=Chunk@1320059115 allocs=13544waste=55, 185=Chunk@1572313578 allocs=13547waste=145, 186=Chunk@827147909 allocs=0waste=2097148, 187=Chunk@1501131803 allocs=13540waste=140, 
188=Chunk@1676283399 allocs=8waste=2095908, 189=Chunk@41136985 allocs=13541waste=10, 190=Chunk@1506915580 allocs=0waste=2097148, 191=Chunk@1268435569 allocs=0waste=2097148, 192=Chunk@2016640819 allocs=0waste=2097148, 193=Chunk@1741025815 allocs=0waste=2097148, 194=Chunk@862440519 allocs=0waste=2097148, 195=Chunk@783499544 allocs=13544waste=69, 196=Chunk@1293576474 allocs=13547waste=82, 197=Chunk@92766180 allocs=13539waste=85}
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleException(HRegion.java:5954)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:5911)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5875)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2827)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2807)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2789)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2783)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2504)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2439)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41137)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
{code}

bq. you see that some time you don't get the chunk ref itself and so it is not getting added to reclaimedChunks at all? And so next time you are not able to poll from the reclaimedchunks?

I see that it is added back to reclaimedChunks, but the GC is already working on it in parallel.
I mean that it was already marked by the GC, so any later reference to this memory is no longer taken into account. And yes, it was still possible to poll the chunk from reclaimedChunks, but it was deallocated while it was in the weak map.

bq. If you are always going with strongChunkMap if there is a pool then that 'saveFromGC' also can be avoided while using CellChunkMap if there was a pool already in place?

No, I do not think we can avoid saveFromGC(): even when the pool is in place, we cannot be sure that all chunks come only from the pool. As far as I can see, 'on-demand' allocation is still available when we are above some threshold.

Is it all clearer now?

> The pool chunks from ChunkCreator are deallocated while in pool because there is no reference to them
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18375
>                 URL: https://issues.apache.org/jira/browse/HBASE-18375
>             Project: HBase
>          Issue Type: Sub-task
>   Affects Versions: 2.0.0-alpha-1
>            Reporter: Anastasia Braginsky
>            Priority: Critical
>             Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2
>
>         Attachments: HBASE-18375-V01.patch, HBASE-18375-V02.patch, HBASE-18375-V03.patch
>
>
> Because the MSLAB list of chunks was changed to a list of chunk IDs, chunks
> returned to the pool can be deallocated by the JVM because there is no reference
> to them. The solution is to protect pool chunks from GC with the strong map of
> ChunkCreator introduced by HBASE-18010. Will prepare the patch today.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
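
The race described in this issue can be illustrated with a minimal Java sketch. This is not the ChunkCreator code: the class ChunkRegistrySketch and all of its methods are hypothetical names made up for illustration, and garbage collection is simulated by clearing the WeakReference explicitly, because real GC timing is nondeterministic. It only shows why an ID-to-chunk lookup through a weak map can return null once nothing else references the chunk, and why a strong map entry avoids that.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the HBase ChunkCreator API.
class ChunkRegistrySketch {

    // Stand-in for a chunk; only the ID matters here.
    static final class Chunk {
        final int id;
        Chunk(int id) { this.id = id; }
    }

    // Chunks tracked only weakly: the GC may reclaim them at any time.
    private final Map<Integer, WeakReference<Chunk>> weakMap = new ConcurrentHashMap<>();
    // Chunks tracked strongly: reachable for as long as they stay in this map.
    private final Map<Integer, Chunk> strongMap = new ConcurrentHashMap<>();

    void registerWeak(Chunk c) {
        weakMap.put(c.id, new WeakReference<>(c));
    }

    // Assumed fix direction: keep pool chunks in the strong map.
    void registerStrong(Chunk c) {
        weakMap.remove(c.id);
        strongMap.put(c.id, c);
    }

    // Translate a chunk ID back to the chunk, or null if it is gone.
    Chunk getChunk(int id) {
        Chunk c = strongMap.get(id);
        if (c != null) {
            return c;
        }
        WeakReference<Chunk> ref = weakMap.get(id);
        return ref == null ? null : ref.get();
    }

    // Simulates the GC reclaiming a weakly held chunk (deterministic stand-in).
    void simulateCollection(int id) {
        WeakReference<Chunk> ref = weakMap.get(id);
        if (ref != null) {
            ref.clear();
        }
    }

    public static void main(String[] args) {
        ChunkRegistrySketch registry = new ChunkRegistrySketch();

        registry.registerWeak(new Chunk(199));
        registry.simulateCollection(199);
        System.out.println("weak lookup is null: " + (registry.getChunk(199) == null));

        registry.registerStrong(new Chunk(7));
        registry.simulateCollection(7); // no effect: the chunk is not weakly held
        System.out.println("strong lookup is null: " + (registry.getChunk(7) == null));
    }
}
```

In the weak case the map still contains an entry for the ID, but the entry resolves to null, which matches the "WeakReference --> null" printout in the log above.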