siddharthteotia opened a new pull request #5619:
URL: https://github.com/apache/incubator-pinot/pull/5619


   This is a fix for the issue reported in 
https://github.com/apache/incubator-pinot/issues/5610
   
   The call stack for the OOM is:
   
   `Caused by: java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:694) ~[?:1.8.0_252]
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_252]
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_252]
        at org.apache.pinot.core.io.reader.impl.ChunkReaderContext.<init>(ChunkReaderContext.java:38) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.io.reader.impl.v1.VarByteChunkSingleValueReader.createContext(VarByteChunkSingleValueReader.java:93) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.io.reader.impl.v1.VarByteChunkSingleValueReader.createContext(VarByteChunkSingleValueReader.java:32) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.operator.docvalsets.SingleValueSet.<init>(SingleValueSet.java:35) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.operator.blocks.SingleValueBlock.<init>(SingleValueBlock.java:41) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.segment.index.datasource.BaseDataSource.getNextBlock(BaseDataSource.java:105) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.common.DataFetcher.<init>(DataFetcher.java:65) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.operator.ProjectionOperator.<init>(ProjectionOperator.java:46) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.plan.ProjectionPlanNode.run(ProjectionPlanNode.java:51) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:103) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:55) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
        at org.apache.pinot.core.plan.CombinePlanNode$1.callJob(CombinePlanNode.java:122) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]`
   
   This implies that the error happens during the execution setup phase, 
while the **operators are being set up** and before they have begun executing 
and passing data between them.
   
   As per @fx19880617, the call stack is consistently reproducible as soon as 
the client upgrades to 0.4.0 in production and runs a SELECT * query on 
segments created in 0.3.0. At the top of the stack is ChunkReaderContext, 
which allocates a chunk buffer and fails with a direct memory OOM. 
@felixcheung verified that the chunk size, numDocsPerChunk, etc. are the same, 
so it is not the case that 0.4.0 suddenly allocates more memory per 
ChunkReaderContext and fails because of that.
   
   So either there is a leak or something else is going on.
   
   As part of PR https://github.com/apache/incubator-pinot/pull/5510, one 
change made to SingleValueSet (also part of the call stack) was that the reader 
context is now created in the constructor as opposed to on every read call. 
There are two implications of this:
   
   - Earlier, since chunk reader context objects were created on a per-call 
basis, they were short-lived and probably never got promoted to the old 
generation. They were garbage collected quickly, which also collected the 
direct byte buffer reference inside them and freed the direct memory. This is 
how direct memory is freed in the JVM unless the buffer's cleaner is invoked 
explicitly. Now that they are allocated in the constructor, they have become 
longer-lived objects and are unlikely to be GCed as quickly as they were in the 
previous code. Thus, there will be more direct memory pressure.
   
   - The second implication is worse and probably the actual root cause of the 
OOM reported by the customer. 
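
   On the first implication: a minimal sketch, not Pinot code, of why holding 
reader contexts keeps direct memory reserved. `ReaderContext` here is a 
hypothetical stand-in for ChunkReaderContext, and the 1 MB chunk size is an 
arbitrary assumption. It reads the JVM's "direct" buffer pool through the 
standard `BufferPoolMXBean` while the contexts are still reachable.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

class DirectMemoryProbe {
  /** Bytes currently accounted to the JVM's "direct" buffer pool. */
  static long directBytesUsed() {
    for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if ("direct".equals(pool.getName())) {
        return pool.getMemoryUsed();
      }
    }
    throw new IllegalStateException("no direct buffer pool found");
  }

  /** Hypothetical stand-in for a reader context that owns one direct chunk buffer. */
  static final class ReaderContext {
    final ByteBuffer chunk;

    ReaderContext(int chunkSize) {
      chunk = ByteBuffer.allocateDirect(chunkSize);
    }
  }

  /** Growth in direct memory, measured while `count` contexts are still reachable. */
  static long growthWhileHolding(int count, int chunkSize) {
    long before = directBytesUsed();
    ReaderContext[] held = new ReaderContext[count];
    for (int i = 0; i < count; i++) {
      held[i] = new ReaderContext(chunkSize);
    }
    long after = directBytesUsed();
    // Use the contexts after measuring so they stay reachable during the measurement.
    if (held[count - 1].chunk.capacity() != chunkSize) {
      throw new AssertionError("unexpected chunk capacity");
    }
    return after - before;
  }

  public static void main(String[] args) {
    System.out.println("direct bytes held: " + growthWhileHolding(4, 1 << 20));
  }
}
```

   The reserved bytes only go back down once the owning objects become 
unreachable and are collected, so the longer a context lives, the longer its 
chunk counts against the direct memory limit.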
   
   See this code for the creation of ProjectionOperator (note that it is part 
of the call stack):
   
   `public ProjectionOperator(Map<String, DataSource> dataSourceMap, BaseOperator<DocIdSetBlock> docIdSetOperator) {
       _dataSourceMap = dataSourceMap;
       _dataBlockMap = new HashMap<>(dataSourceMap.size());
       for (Map.Entry<String, DataSource> entry : dataSourceMap.entrySet()) {
         _dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());
       }
       _docIdSetOperator = docIdSetOperator;
       _dataBlockCache = new DataBlockCache(new DataFetcher(dataSourceMap));
     }`
   
   `_dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());` creates a 
block by going down the path nextBlock() -> SingleValueBlock -> 
SingleValueSet -> constructor -> create reader context (this is where direct 
memory is allocated with the new PR).
   
   We then create DataFetcher with dataSourceMap, and this code again creates a 
block:
   
   `dataSourceMap.get(column).nextBlock().getBlockValueSet()`
   
   This goes down the same path again, creating SingleValueBlock -> 
SingleValueSet -> reader context -> direct memory allocation.
   
   So the memory is allocated twice per column, and somewhere in the middle of 
doing this in the DataFetcher for a given column, we fail with OOM.
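
   The doubling can be illustrated with a small self-contained sketch. This is 
not the Pinot code: `Source`, `Block`, `ValueSet`, `Fetcher` and `Projection` 
are hypothetical stand-ins for DataSource, SingleValueBlock, SingleValueSet, 
DataFetcher and ProjectionOperator, and a counter replaces the actual direct 
buffer allocation. It only assumes what is described above: every nextBlock() 
call builds a new value set, and the value set creates its reader context in 
the constructor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class DoubleAllocationSketch {
  /** Counts reader contexts created; each would own one direct chunk buffer. */
  static final AtomicInteger contextsCreated = new AtomicInteger();

  /** Hypothetical value set that creates its reader context eagerly in the constructor. */
  static final class ValueSet {
    ValueSet() {
      contextsCreated.incrementAndGet();
    }
  }

  /** Hypothetical block wrapping exactly one value set. */
  static final class Block {
    final ValueSet valueSet = new ValueSet();
  }

  /** Hypothetical data source: every nextBlock() call builds a fresh block. */
  static final class Source {
    Block nextBlock() {
      return new Block();
    }
  }

  /** Mirrors the DataFetcher step: one more nextBlock() per column. */
  static final class Fetcher {
    final Map<String, ValueSet> valueSets = new HashMap<>();

    Fetcher(Map<String, Source> sources) {
      for (Map.Entry<String, Source> entry : sources.entrySet()) {
        valueSets.put(entry.getKey(), entry.getValue().nextBlock().valueSet);
      }
    }
  }

  /** Mirrors the ProjectionOperator constructor: block map first, then the fetcher. */
  static final class Projection {
    final Map<String, Block> blocks = new HashMap<>();
    final Fetcher fetcher;

    Projection(Map<String, Source> sources) {
      for (Map.Entry<String, Source> entry : sources.entrySet()) {
        blocks.put(entry.getKey(), entry.getValue().nextBlock());
      }
      fetcher = new Fetcher(sources);
    }
  }

  /** Builds one projection over numColumns columns and returns the contexts created. */
  static int contextsForColumns(int numColumns) {
    contextsCreated.set(0);
    Map<String, Source> sources = new HashMap<>();
    for (int i = 0; i < numColumns; i++) {
      sources.put("col" + i, new Source());
    }
    new Projection(sources);
    return contextsCreated.get();
  }

  public static void main(String[] args) {
    System.out.println(contextsForColumns(10)); // prints 20
  }
}
```

   With a SELECT * over N columns, this shape creates 2N reader contexts per 
segment during operator setup, before any data is read.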
   
   
   

