[ https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052565#comment-16052565 ]
Paul Rogers commented on DRILL-5211: ------------------------------------

Thanks for the link to DRILL-1960. Yes, [~sphillips], if each operator had to do the replay, then the code in each operator becomes highly complex. The readers are already complex, so adding the logic to replay a row adds even greater complexity. All this is clearly explained in DRILL-1960.

The change proposed here is a bit different, however. The new version of {{setSafe}} has a limit, as described in DRILL-1960. But the way we handle the overflow row is different. Here, we propose to handle the overflow row "automagically" as part of a common batch mutator mechanism. From the point of view of the reader, the code writes entire rows and asks if it can write another. The common "mutator" handles the fiddly bits with the overflow row, moving it from the "full" batch to a new, empty, look-ahead batch. Because the mechanism is common, the result is simpler reader code. (And, yes, this means changing the existing readers that evolved ad-hoc ways of writing to vectors...)

[~jni] is also correct that the focus here is on the scan operator, as we've seen that it is the worst offender in creating large vectors. I was just looking at a case in which the JSON reader tried to create a single vector of roughly 8 TB: 4096 rows, each containing an array of 1.8 GB in size. Clearly a pathological case... Since we have to start somewhere, the readers are as good a place as any.

Indeed, we must handle the other operators. That is an open issue, to be tackled as a follow-on. For other operators, we need a slightly different design: one that works with the framework that code gen uses. The case you mention is a classic example: we have a VARCHAR column that filled its vector to 16 MB. Project concatenates the column with itself. As of today, this produces a 32 MB vector. So, we need to modify Project from a 1-in-1-out model to a possible 1-in-n-out model.
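To make the "write a row, ask if another fits" protocol concrete, here is a minimal sketch. All names here ({{RowBatchMutator}}, {{writeRow}}, {{harvest}}, the 16 MB cap) are hypothetical illustrations, not Drill's actual API: the reader writes whole rows, and the mutator enforces the batch size limit and transparently parks an overflow row in a look-ahead batch that seeds the next harvest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed common "mutator": the reader never
// sees the overflow row; it only writes rows and asks if the batch is full.
public class RowBatchMutator {
    static final int BATCH_LIMIT_BYTES = 16 * 1024 * 1024; // illustrative 16 MB cap

    private List<byte[]> currentBatch = new ArrayList<>();
    private List<byte[]> lookAheadBatch = new ArrayList<>();
    private int currentBytes;

    // Reader writes a whole row; overflow is handled here, not in the reader.
    public void writeRow(byte[] row) {
        if (currentBytes + row.length > BATCH_LIMIT_BYTES) {
            lookAheadBatch.add(row);     // park the overflow row
        } else {
            currentBatch.add(row);
            currentBytes += row.length;
        }
    }

    // Reader asks whether it may keep writing into this batch.
    public boolean isFull() {
        return !lookAheadBatch.isEmpty();
    }

    // Hand the full batch downstream; the overflow row seeds the next batch.
    public List<byte[]> harvest() {
        List<byte[]> result = currentBatch;
        currentBatch = lookAheadBatch;
        lookAheadBatch = new ArrayList<>();
        currentBytes = 0;
        for (byte[] row : currentBatch) {
            currentBytes += row.length;
        }
        return result;
    }

    public static void main(String[] args) {
        RowBatchMutator mutator = new RowBatchMutator();
        byte[] row = new byte[10 * 1024 * 1024]; // 10 MB row
        mutator.writeRow(row);                   // fits: 10 MB < 16 MB
        mutator.writeRow(row);                   // overflows: parked in look-ahead
        System.out.println(mutator.isFull());    // batch is now full
        System.out.println(mutator.harvest().size()); // one row harvested
    }
}
```

The point of the sketch is the division of labor: the reader loop is just "while not full, write row; then harvest," and the fiddly overflow bookkeeping lives in one shared place.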
We probably won't use the same "mutator" as for readers (though maybe we can; I've not looked into the issue enough yet). Does anyone know of existing mechanisms we could leverage and/or extend to help with the non-scan operators?

> Queries fail due to direct memory fragmentation
> -----------------------------------------------
>
>           Key: DRILL-5211
>           URL: https://issues.apache.org/jira/browse/DRILL-5211
>       Project: Apache Drill
>    Issue Type: Bug
>      Reporter: Paul Rogers
>      Assignee: Paul Rogers
>       Fix For: 1.9.0
>
>   Attachments: ApacheDrillMemoryFragmentationBackground.pdf, ApacheDrillVectorSizeLimits.pdf, EnhancedScanOperator.pdf, ScanSchemaManagement.pdf
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3 GB
> * Input file: 18 GB, with one VARCHAR column of 8K width
>
> The sort runs, spilling to disk. Once all data arrives, the sort begins to merge the results. But, to do that, it must first do an intermediate merge. For example, in this sort, there are 190 spill files, but only 19 can be merged at a time. (Each merge file contains 128 MB batches, and only 19 can fit in memory, giving a total footprint of 2.5 GB, well below the 3 GB limit.)
> Yet, when loading batch xx, Drill fails with an OOM error. At that point, total available direct memory is 3,817,865,216. (Obtained from {{maxMemory}} in the {{Bits}} class in the JDK.)
> It appears that Drill wants to allocate 58,257,868 bytes, but the {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an OOM.
> The problem is that, at this point, the external sort should not ask the system for more memory. The allocator for the external sort is at just 1,192,350,366 before the allocation request. Plenty of spare memory should be available, released when the in-memory batches were spilled to disk prior to merging. Indeed, earlier in the run, the sort had reached a peak memory usage of 2,710,716,416 bytes. This memory should be available for reuse during merging, and is plenty sufficient to fill the particular request in question.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
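As a side check of the merge arithmetic quoted in the issue description, the 2.5 GB figure follows directly from 19 spill files times one 128 MB batch each. This is only an illustrative sketch of that calculation; the class and method names are made up, and the figures come from the description above.

```java
// Checks the stated footprint of a 19-way merge against the 3 GB limit.
public class MergeFootprint {
    static final long BATCH_BYTES = 128L * 1024 * 1024;        // 128 MB batch per spill file
    static final long DIRECT_MEMORY = 3L * 1024 * 1024 * 1024; // 3 GB direct-memory limit

    // Footprint of an n-way merge holding one batch per spill file in memory.
    static long footprint(int ways) {
        return ways * BATCH_BYTES;
    }

    public static void main(String[] args) {
        long fp = footprint(19);                 // 19-way merge, as in the report
        System.out.println(fp + " bytes");       // 2,550,136,832 bytes, about 2.5 GB
        System.out.println(fp < DIRECT_MEMORY);  // true: fits under the 3 GB limit
    }
}
```

Which is the point of the bug: the merge's planned footprint fits comfortably under the limit, so the OOM must come from memory that was not actually returned for reuse.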