[
https://issues.apache.org/jira/browse/DRILL-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295678#comment-17295678
]
ASF GitHub Bot commented on DRILL-6547:
---------------------------------------
paul-rogers commented on pull request #2183:
URL: https://github.com/apache/drill/pull/2183#issuecomment-791072405
Thanks for fixing this.
I suspect the problem here runs deeper than this particular fix. This is
Drill: nothing is simple, things build on hacks which build on undocumented
complexities. I suspect that is the issue here.
In particular, we need to know the rules for buffer ownership in UDFs which
is, of course, not documented. The only reasonable way to learn this is to run
a query, save the generated Java code and inspect the calls. Warning: the
generated code is a huge mess.
The key question is this: when you declare an `@Inject` variable (`buffer`
in your case), does the runtime provide an initial buffer? My guess is that it
does in the form of a "managed" buffer. (This just means that there is some
auto-free mechanism.) I believe that the buffer is allocated on each call, but
the generated code will say for sure. (The UDF code is copied into the
generated code.)
The function declares an `@Output`, in this case `out`. If this variable
provides a buffer, it would seem that ownership has to pass to the caller,
since the caller will copy that buffer into a value vector. My guess is that if
the buffer is "managed", it will be freed automatically at some point.
(Note that managed buffers can cause an OOM because of when they are freed:
I think it is during fragment shutdown, IIRC. I had to fix a reader bug related
to this issue.)
To summarize, it appears that the proper pattern in UDFs is:
* Declare an `@Inject` buffer which the runtime will provide.
* Resize the buffer if needed, using the return value from `reallocIfNeeded`.
* Use that buffer as the output buffer by setting it to `out.buffer` where
needed.
* Managed buffers are freed automatically, at some unknown point.
If this is true, there is no reason to update `buffer`, the proper pattern
is shown in
[DecimalAggrTypeFunctions1](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions1.java#L113)
(with my own comments):
``` java
// Inject means that the runtime will provide a default-size buffer on
each call
@Inject DrillBuf buffer;
// Holder for output. NOT initialized b the runtime.
@Output ${type.outputType}Holder out;
...
// Resize the provided buffer, replacing if necessary, and assigning
that
// to the output variable.
out.buffer = buffer.reallocIfNeeded(len);
// Using the now-valid output buffer, copy in any needed data.
out.buffer.setBytes(0, bytes);
// Upon exit, is the output buffer is now owned by caller? Must be,
or the output
// data would be immediately freed if the caller were to release the
injected
// buffer, and all hell would break loose.
```
As usual, there is no documentation of this, you just gotta know (or figure
it out every time.)
The buffer management happens in `BufferManagerImpl`, as shown in the JIRA
stack trace, which is called by `reallocIfNeeded`:
```
public DrillBuf replace(DrillBuf old, int newSize) {
if (managedBuffers.remove(old.memoryAddress()) == null) {
throw new IllegalStateException("Tried to remove unmanaged buffer.");
}
old.release(1);
return getManagedBuffer(newSize);
}
```
This is saying, "Hey, that buffer you gave me? It ain't one of mine." The
question is why. The theory in this bug is that we forgot to replace our
working buffer. Let's check that by doing a [search for
`reallocIfNeeded`](https://github.com/apache/drill/search?p=2&q=reallocIfNeeded).
We find many cases where we don't save the reallocated buffer back into the
`@Inject` variable. For
[example](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions1.java#L113):
``` java
out.buffer = buffer.reallocIfNeeded(len);
out.buffer.setBytes(0, bytes);
```
I *suspect*, but am not sure without running the code, that every call to
the UDF is issued a new managed `buffer`. So, I suspect that the problem will
only occur if, within a single function call, we:
* Call `reallocIfNeeded` and get a new buffer.
* Call `reallocIfNeeded` on the original (now unregistered) buffer, but only
if the previous call actually resized the buffer.
Looking at the code in this function, there are no loops, so there is no
ability to make the above mistake. Yet, many other functions in the file
changed here have the exact same code pattern introduced in this fix. Why does
the code work so often if many other functions *do not* follow the pattern
adopted here?
There are two possibilities:
* The generated code has a bug: it is not providing a new buffer on each
function invocation.
* The problem is not where we think it is: it is elsewhere. (The stack trace
in the JIRA ticket shows generated code, so we can't be sure which bit of code
is actually making the call.)
In particular, I'm not convinced that the unit test reproduces the error.
The unit test is different from the query in the JIRA ticket. A good practice
is to run the proposed test *before* any fix to make sure we can reproduce the
issue.
The proposed test (simplified):
``` sql
select concat(cast(null as varchar), EXPR$0) as c1
"from (values('foo'), ('bar'))
```
User's query (also simplified):
``` sql
select concat(BinaryValue, 'aaa') NewVarcharValue from ...;
```
What is different? It may not be obvious, but in the first query, we have a
`NULL` const and a column (value vector, which happens to be defined by a
`VALUES` clause with a constant). In the second, we have a column value and a
string constant. That string constant is not stored in a value vector, it is in
a temporary buffer (IIRC).
My hunch is that the problem is with the constant: it is causing something
to break. This is why so many of the functions in `StringFunctions.java` have
code like the proposed fix: they are also likely tested with constants. If the
problem is with constants, then it means that all other UDFs that work with
strings, but don't have the proposed fix, will fail when used with constants.
That is, sounds like we got us a bit of a problem; this fix is just the tip
of the iceberg.
How should we proceed? We can do just this fix, and fix other issues as
people encounter and report them. Or, we can figure out what is *really* going
on by looking at the generated code. To do that, modify the [Project
operator](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L279):
``` java
boolean saveCode = true; \\ change from false in the master branch
```
Set the variable to `true` as shown above (it will be `false` by default.)
Run a query like the user's. Code will be generated. By default I think it is
in `/tmp/drill/codegen/...`. Analyze what's going on with the `@Inject` buffer.
Pay particular attention to how constants are handled. What is causing us to
try to reallocate a non-managed buffer in this case?
Then, we can decide how to proceed:
* If the fix here is the correct one, make it in all UDFs that have the
wrong pattern. (See the search above, there are **many** such occurrences.) Add
a comment to the `reallocIfNeeded` method to explain the proper usage. Provide
a code snippet to show what should be done.
* If the problem is with constant handling, work out what should be done.
Fix it. Revert the `out.buffer = buffer = buffer.ralloc...` bits of code in
`StringFunctions.java` and retest to ensure constants work correctly.
BTW, in researching these issues, I found a number of other problems.
If the proposed fix is correct, then a particularly nasty buf is
[here](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/vector/src/main/java/org/apache/drill/exec/vector/ValueHolderHelper.java#L121)
where the new buffer is created internally and never returned.
The function we're trying to fix, `ConcatLeftNullInput`, has a bug:
``` java
public static class ConcatLeftNullInput implements DrillSimpleFunc {
...
@Override
public void eval() {
out.buffer = buffer.reallocIfNeeded( (left.end - left.start) +
(right.end - right.start));
```
The function is `...LeftNullInput` and, yet, we are asking for the
`left.end` and `left.start`. By definition these are zero. Why? Because the
left is `NULL`. In fact, if the left value is `NULL`, I'm not sure the `start`
and `end` values are even defined to be any particular value. (Amazing this
stuff works as well as it does.)
In the same file, this code is *just plain wrong*:
``` java
public static class Replace implements DrillSimpleFunc {
...
@Inject DrillBuf buffer;
@Output VarCharHolder out;
@Override
public void setup() {
buffer = buffer.reallocIfNeeded(8000);
}
```
See the bug? What if the new string is larger than 8K? Kaboom. What if the
string is 10 characters? Huge waste of memory. The above is what we call "crude
and not quite effective", it works most of the time, except when it doesn't.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> IllegalStateException: Tried to remove unmanaged buffer.
> --------------------------------------------------------
>
> Key: DRILL-6547
> URL: https://issues.apache.org/jira/browse/DRILL-6547
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.14.0
> Reporter: Robert Hou
> Assignee: Pritesh Maker
> Priority: Major
>
> This is the query:
> select * from (
> select Index, concat(BinaryValue, 'aaa') NewVarcharValue from (select * from
> dfs.`/drill/testdata/batch_memory/alltypes_large_1MB.parquet`)) d where
> d.Index = 1;
> This is the plan:
> {noformat}
> | 00-00 Screen
> 00-01 Project(Index=[$0], NewVarcharValue=[$1])
> 00-02 SelectionVectorRemover
> 00-03 Filter(condition=[=($0, 1)])
> 00-04 Project(Index=[$0], NewVarcharValue=[CONCAT($1, 'aaa')])
> 00-05 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/batch_memory/alltypes_large_1MB.parquet]],
> selectionRoot=maprfs:/drill/testdata/batch_memory/alltypes_large_1MB.parquet,
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`Index`,
> `BinaryValue`]]])
> {noformat}
> Here is the stack trace from drillbit.log:
> {noformat}
> 2018-06-27 13:55:03,291 [24cc0659-30b7-b290-7fae-ecb1c1f15c05:frag:0:0] ERROR
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException:
> Tried to remove unmanaged buffer.
> Fragment 0:0
> [Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> IllegalStateException: Tried to remove unmanaged buffer.
> Fragment 0:0
> [Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010]
> at
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
> ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
> [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
> [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
> [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_161]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_161]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Tried to remove unmanaged buffer.
> at
> org.apache.drill.exec.ops.BufferManagerImpl.replace(BufferManagerImpl.java:50)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at io.netty.buffer.DrillBuf.reallocIfNeeded(DrillBuf.java:97)
> ~[drill-memory-base-1.14.0-SNAPSHOT.jar:4.0.48.Final]
> at
> org.apache.drill.exec.test.generated.ProjectorGen4046.doEval(ProjectorTemplate.java:77)
> ~[na:na]
> at
> org.apache.drill.exec.test.generated.ProjectorGen4046.projectRecords(ProjectorTemplate.java:67)
> ~[na:na]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:236)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:281)
> ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[na:1.8.0_161]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_161]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> ~[hadoop-common-2.7.0-mapr-1707.jar:na]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:281)
> [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> ... 4 common frames omitted
> {noformat}
> The table can be found in
> /home/MAPRTECH/qa/rhou/drill4479/alltypes_large_1MB.parquet
> This is the commit id:
> aa127b70b1e46f7f4aa19881f25eda583627830a
> alltypes_large10.q
--
This message was sent by Atlassian Jira
(v8.3.4#803005)