clintropolis opened a new pull request, #13361:
URL: https://github.com/apache/druid/pull/13361

   ### Description
   This PR fixes an accidental kill switch introduced by #13332, where by 
exposing the ability to construct sketches via expressions we also expose the 
ability to crash any jvm process running them by feeding them bad input.
   
   For example, the query `SELECT COMPLEX_DECODE_BASE64('HLLSketch', 'AgEH')` 
which is a truncated base64 encoded sketch object, will immediately crash the 
broker which evaluates it when planning the SQL query with an error of the form:
   
   ```
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x000000010424616c, pid=77140, tid=0x0000000000010303
   #
   # JRE version: OpenJDK Runtime Environment (8.0_332-b08) (build 
1.8.0_332-b08)
   # Java VM: OpenJDK 64-Bit Server VM (25.332-b08 mixed mode bsd-aarch64 
compressed oops)
   # Problematic frame:
   # j  
org.apache.datasketches.hll.PreambleUtil.insertInt(Lorg/apache/datasketches/memory/WritableMemory;JI)V+3
   #
   ```
   
   After the changes in this PR, we just get a regular out of bounds exception:
   
   ```
   2022-11-12T12:48:10,545 WARN [sql[a54c6476-ed24-4e00-a87b-f01c90cdae2c]] 
org.apache.druid.sql.http.SqlResource - Failed to handle query: 
SqlQuery{query='SELECT COMPLEX_DECODE_BASE64('HLLSketch', 'AgEH')', 
resultFormat=array, header=true, typesHeader=true, sqlTypesHeader=true, 
context={sqlOuterLimit=1001, sqlQueryId=a54c6476-ed24-4e00-a87b-f01c90cdae2c, 
queryId=a54c6476-ed24-4e00-a87b-f01c90cdae2c}, parameters=[]}
   java.lang.IndexOutOfBoundsException: null
        at java.nio.Buffer.checkIndex(Buffer.java:545) ~[?:1.8.0_332]
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:142) ~[?:1.8.0_332]
        at 
org.apache.druid.segment.data.SafeMemoryBase.getByte(SafeMemoryBase.java:65) 
~[classes/:?]
        at 
org.apache.datasketches.hll.PreambleUtil.extractLgK(PreambleUtil.java:280) 
~[datasketches-java-3.2.0.jar:?]
        at org.apache.datasketches.hll.HllSketch.wrap(HllSketch.java:247) 
~[datasketches-java-3.2.0.jar:?]
        at 
org.apache.druid.query.aggregation.datasketches.hll.HllSketchObjectStrategy.fromByteBufferSafe(HllSketchObjectStrategy.java:64)
 ~[?:?]
        at 
org.apache.druid.query.aggregation.datasketches.hll.HllSketchObjectStrategy.fromByteBufferSafe(HllSketchObjectStrategy.java:31)
 ~[?:?]
        at 
org.apache.druid.segment.column.ObjectStrategyComplexTypeStrategy.fromBytes(ObjectStrategyComplexTypeStrategy.java:93)
 ~[classes/:?]
        at 
org.apache.druid.math.expr.BuiltInExprMacros$ComplexDecodeBase64ExprMacro$ComplexDecodeBase64Expression.eval(BuiltInExprMacros.java:110)
 ~[classes/:?]
        at org.apache.druid.math.expr.Parser.lambda$flatten$1(Parser.java:181) 
~[classes/:?]
        at 
org.apache.druid.math.expr.BuiltInExprMacros$ComplexDecodeBase64ExprMacro$ComplexDecodeBase64Expression.visit(BuiltInExprMacros.java:117)
 ~[classes/:?]
        at org.apache.druid.math.expr.Parser.flatten(Parser.java:152) 
~[classes/:?]
        at org.apache.druid.math.expr.Parser.parse(Parser.java:144) 
~[classes/:?]
        at org.apache.druid.math.expr.Parser.parse(Parser.java:125) 
~[classes/:?]
        at 
org.apache.druid.sql.calcite.planner.DruidRexExecutor.reduce(DruidRexExecutor.java:74)
 ~[classes/:?]
        at 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressionsInternal(ReduceExpressionsRule.java:695)
 ~[calcite-core-1.21.0.jar:1.21.0]
   ```
   
   In fairness, this problem existed prior to #13332, since the underlying 
problem has to do with anything using `Memory` on untrusted input bytes for 
anything that gets instructions of where to read or write from those bytes. The 
reason is that `Memory` for performance reasons uses `Unsafe` for all read and 
write operations, even when wrapping `ByteBuffer`, so if the code tries to read 
out of bounds locations it can crash the jvm process and do who knows what else.
   
   I think this behavior is absolutely correct for `Memory` for most uses, but 
to solve our problem in a bit of an unconventional way, this PR introduces 
`SafeWritableMemory` and `SafeWritableBuffer` that delegate _all_ operations to 
an underlying plain java `ByteBuffer` so that it gets the bounds checking that 
goes with that, which we can use for anything that uses `Memory` when loading 
from an untrusted source. When reading from segments we still use standard 
`Memory` calls.
   
   Fortunately due to a bug that wasn't fixed until #13332, i wasn't able to 
trigger this exact failure in older versions of Druid with the native 
`complex_decode_base64` expression because the complex type mapping wasn't 
correctly wired up to the function, but i believe it is still possible when 
ingesting pre-aggregated sketches that use `Memory`.
   
   To wire up to the `complex_decode_base64` function, I've added a new method 
to `ObjectStrategy`, 
   ```
     @Nullable
     default T fromByteBufferSafe(ByteBuffer buffer, int numBytes)
     {
       return fromByteBuffer(buffer, numBytes);
     }
   ```
   
   which should be implemented by anything which uses `Unsafe` or other 
potentially dangerous operations. I debated having a default implementation but 
ultimately decided to add one since this interface is marked as an 
`@ExtensionPoint`. This is erring on the side of danger in favor of 
compatibility, so I welcome discussion here.
   
   <hr>
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to