siddharthteotia opened a new pull request #5619: URL: https://github.com/apache/incubator-pinot/pull/5619
This is a fix for the issue reported in https://github.com/apache/incubator-pinot/issues/5610. The call stack for the OOM is:

```
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:694) ~[?:1.8.0_252]
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_252]
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_252]
	at org.apache.pinot.core.io.reader.impl.ChunkReaderContext.<init>(ChunkReaderContext.java:38) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.io.reader.impl.v1.VarByteChunkSingleValueReader.createContext(VarByteChunkSingleValueReader.java:93) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.io.reader.impl.v1.VarByteChunkSingleValueReader.createContext(VarByteChunkSingleValueReader.java:32) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.docvalsets.SingleValueSet.<init>(SingleValueSet.java:35) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.blocks.SingleValueBlock.<init>(SingleValueBlock.java:41) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.segment.index.datasource.BaseDataSource.getNextBlock(BaseDataSource.java:105) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.common.DataFetcher.<init>(DataFetcher.java:65) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.ProjectionOperator.<init>(ProjectionOperator.java:46) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.ProjectionPlanNode.run(ProjectionPlanNode.java:51) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:103) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:55) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.CombinePlanNode$1.callJob(CombinePlanNode.java:122) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
```

This implies that the error happens during the execution setup phase, while the **operators are being set up** and before execution has begun passing data between them. As per @fx19880617, the call stack is consistently reproducible as soon as the client upgrades to 0.4.0 in production and runs a SELECT * query on segments created in 0.3.0. At the top of the stack, ChunkReaderContext allocates a chunk and fails with a direct memory OOM. @felixcheung verified that the chunk size, numDocsPerChunk, etc. are the same, so it is not the case that 0.4.0 suddenly allocates more memory in ChunkReaderContext. Either there is a leak, or something else is going on.
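For context on why object lifetime matters here: the JVM releases a direct `ByteBuffer`'s native memory only when the buffer object itself becomes unreachable and is garbage collected (or its internal cleaner is invoked). A minimal sketch of the two allocation patterns, using a hypothetical `ReaderContext` as a stand-in for Pinot's `ChunkReaderContext` (class and field names here are illustrative, not the actual Pinot API):

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for Pinot's ChunkReaderContext: each instance
// reserves `chunkSize` bytes of direct (off-heap) memory on construction.
class ReaderContext {
  final ByteBuffer chunkBuffer;

  ReaderContext(int chunkSize) {
    // Native memory is reserved here; it is released only when this
    // ByteBuffer object is GCed (or its cleaner is run).
    this.chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
  }
}

public class DirectMemoryLifetime {
  public static void main(String[] args) {
    int chunkSize = 1 << 20; // 1 MB per context (illustrative size)

    // Per-call pattern: a fresh context for every read. Each context is
    // unreachable as soon as the call returns, so its direct memory is
    // reclaimable at the next GC cycle.
    for (int read = 0; read < 4; read++) {
      ReaderContext shortLived = new ReaderContext(chunkSize);
      // ... perform the read using shortLived.chunkBuffer ...
    }

    // Constructor pattern: the context is created once and held for the
    // owner's lifetime, pinning its direct memory the whole time.
    ReaderContext longLived = new ReaderContext(chunkSize);
    System.out.println(longLived.chunkBuffer.isDirect());
    System.out.println(longLived.chunkBuffer.capacity() == chunkSize);
  }
}
```

With the per-call pattern, peak direct-memory usage stays near one chunk per active read; with the constructor pattern, every live reader pins a chunk simultaneously.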
As part of PR https://github.com/apache/incubator-pinot/pull/5510, one change made to SingleValueSet (also part of the call stack) was that the reader context is now created in the constructor rather than on every read call. This has two implications:

- Earlier, since chunk reader context objects were created on a per-call basis, they were short-lived and probably never promoted out of the young generation. They were garbage collected quickly, which in turn collected the direct byte buffer reference inside them and freed the direct memory -- this is essentially the only way direct memory is freed in the JVM unless the buffer's cleaner is invoked explicitly. Now that the context is allocated in the constructor, it has become a long-lived object and is unlikely to be GCed as quickly as in the previous code. Thus, there will be memory pressure.

- The second implication is worse and is probably the actual root cause of the OOM reported by the customer. See this code that creates the ProjectionOperator (note that it is part of the call stack):

  ```java
  public ProjectionOperator(Map<String, DataSource> dataSourceMap, BaseOperator<DocIdSetBlock> docIdSetOperator) {
    _dataSourceMap = dataSourceMap;
    _dataBlockMap = new HashMap<>(dataSourceMap.size());
    for (Map.Entry<String, DataSource> entry : dataSourceMap.entrySet()) {
      _dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());
    }
    _docIdSetOperator = docIdSetOperator;
    _dataBlockCache = new DataBlockCache(new DataFetcher(dataSourceMap));
  }
  ```

  `_dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());` creates a block by going down the path nextBlock() -> SingleValueBlock -> SingleValueSet constructor -> create reader context (where direct memory is now allocated, with the new PR). We then create a DataFetcher with the same dataSourceMap, and its code again creates a block via `dataSourceMap.get(column).nextBlock().getBlockValueSet()`, going down the same path: SingleValueBlock -> SingleValueSet -> reader context -> allocating direct memory.

So the memory is being allocated twice, and somewhere in the middle of doing this in the DataFetcher for a given column, we fail with OOM.
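The double allocation described above can be demonstrated with a stripped-down sketch. The `DataSource` interface, `DataFetcher` class, and allocation counter below are illustrative stand-ins, not Pinot's real classes; the point is only the call shape: the constructor loop materializes one block per column, then the DataFetcher constructor materializes every block again, so each column pays the reader-context (direct-memory) cost twice before a single row is processed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleAllocationSketch {
  // Counts every block materialization, i.e. every reader-context /
  // direct-buffer allocation in the real code path.
  static final AtomicInteger ALLOCATIONS = new AtomicInteger();

  // Stand-in for Pinot's DataSource: nextBlock() triggers an allocation.
  interface DataSource {
    Object nextBlock();
  }

  // Stand-in for DataFetcher: its constructor also walks every column
  // and calls nextBlock() again.
  static class DataFetcher {
    DataFetcher(Map<String, DataSource> dataSourceMap) {
      for (DataSource ds : dataSourceMap.values()) {
        ds.nextBlock(); // second allocation per column
      }
    }
  }

  // Stand-in for ProjectionOperator's constructor.
  static void buildProjectionOperator(Map<String, DataSource> dataSourceMap) {
    for (Map.Entry<String, DataSource> entry : dataSourceMap.entrySet()) {
      entry.getValue().nextBlock(); // first allocation per column
    }
    new DataFetcher(dataSourceMap); // walks the same columns again
  }

  public static void main(String[] args) {
    Map<String, DataSource> columns = new LinkedHashMap<>();
    for (String col : new String[]{"colA", "colB", "colC"}) {
      columns.put(col, () -> {
        ALLOCATIONS.incrementAndGet();
        return new Object();
      });
    }
    buildProjectionOperator(columns);
    // 3 columns, each allocated twice during setup alone.
    System.out.println(ALLOCATIONS.get());
  }
}
```

With a wide SELECT * query, this doubling multiplies across every projected column, which is consistent with the OOM surfacing during operator setup rather than during execution.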
