paul-rogers commented on code in PR #13564:
URL: https://github.com/apache/druid/pull/13564#discussion_r1051287738


##########
sql/src/main/java/org/apache/druid/sql/SqlPlanningException.java:
##########
@@ -32,11 +32,12 @@
  */
 public class SqlPlanningException extends BadQueryException
 {
+
   public enum PlanningError
   {
-    SQL_PARSE_ERROR("SQL parse failed", SqlParseException.class.getName()),
-    VALIDATION_ERROR("Plan validation failed", 
ValidationException.class.getName()),
-    UNSUPPORTED_SQL_ERROR("SQL query is unsupported", 
RelOptPlanner.CannotPlanException.class.getName());
+    SQL_PARSE_ERROR(SQL_PARSE_FAILED_ERROR_CODE, 
SqlParseException.class.getName()),
+    VALIDATION_ERROR(PLAN_VALIDATION_FAILED_ERROR_CODE, 
ValidationException.class.getName()),
+    UNSUPPORTED_SQL_ERROR(SQL_QUERY_UNSUPPORTED_ERROR_CODE, 
RelOptPlanner.CannotPlanException.class.getName());

Review Comment:
   If we standardize on error codes, it would be a huge help to use (or at 
least include) the standard [SQLSTATE 
errors](https://en.wikipedia.org/wiki/SQLSTATE), where applicable. This is 
particularly helpful to JDBC users. At present, we don't return any meaningful 
errors except for the unauthorized error.



##########
core/src/main/java/org/apache/druid/query/QueryException.java:
##########
@@ -30,12 +29,129 @@
 import java.util.function.Function;
 
 /**
- * Base serializable error response
- *
+ * Base serializable error response.
+ * <p>
+ * The Object Model that QueryException follows is a little non-intuitive as 
the primary way that a QueryException is
+ * generated is through a child class.  However, those child classes are *not* 
equivalent to a QueryException, instead
+ * they act as a Factory of QueryException objects.  This can be seen in two 
different places.
+ * <p>
+ * 1. When sanitize() is called, the response is a QueryException without any 
indication of which original exception
+ * occurred.
+ * 2. When these objects get serialized across the wire the recipient 
deserializes a QueryException. The client is
+ * never expected, and fundamentally is not allowed to, ever deserialize a 
child class of QueryException.
+ * <p>
+ * For this reason, QueryException must contain all potential state that any 
of its child classes could ever want to
+ * push across the wire.  Additionally, any catch clauses expecting one of the 
child Exceptions must know that it is
+ * running inside of code where the exception has not traveled across the 
wire.  If there is a chance that the
+ * exception could have been serialized across the wire, the code must catch a 
QueryException and check the errorCode
+ * instead.
+ * <p>
+ * As a corollary, adding new state or adjusting the logic of this class must 
always be done in a backwards-compatible
+ * fashion across all child classes of QueryException.
+ * <p>
+ * If there is any need to do different logic based on the type of error that 
has happened, the only reliable method
+ * of discerning the type of the error is to look at the errorCode String.  
Because these Strings are considered part
+ * of the API, they are not allowed to change and must maintain their same 
semantics.  The known errorCode Strings
+ * are pulled together as public static fields on this class in order to make 
it more clear what options exist.
+ * <p>
  * QueryResource and SqlResource are expected to emit the JSON form of this 
object when errors happen.
  */
 public class QueryException extends RuntimeException implements 
SanitizableException
 {
+  /**
+   * Error codes
+   */
+  public static final String JSON_PARSE_ERROR_CODE = "Json parse failed";
+  public static final String BAD_QUERY_CONTEXT_ERROR_CODE = "Query context 
parse failed";
+  public static final String QUERY_CAPACITY_EXCEEDED_ERROR_CODE = "Query 
capacity exceeded";
+  public static final String QUERY_INTERRUPTED_ERROR_CODE = "Query 
interrupted";
+  // Note: the proper spelling is with a single "l", but the version with
+  // two "l"s is documented, we can't change the text of the message.
+  public static final String QUERY_CANCELED_ERROR_CODE = "Query cancelled";
+  public static final String UNAUTHORIZED_ERROR_CODE = "Unauthorized request";
+  public static final String UNSUPPORTED_OPERATION_ERROR_CODE = "Unsupported 
operation";
+  public static final String TRUNCATED_RESPONSE_CONTEXT_ERROR_CODE = 
"Truncated response context";
+  public static final String UNKNOWN_EXCEPTION_ERROR_CODE = "Unknown 
exception";
+  public static final String QUERY_TIMEOUT_ERROR_CODE = "Query timeout";
+  public static final String QUERY_UNSUPPORTED_ERROR_CODE = "Unsupported 
query";
+  public static final String RESOURCE_LIMIT_EXCEEDED_ERROR_CODE = "Resource 
limit exceeded";
+  public static final String SQL_PARSE_FAILED_ERROR_CODE = "SQL parse failed";
+  public static final String PLAN_VALIDATION_FAILED_ERROR_CODE = "Plan 
validation failed";
+  public static final String SQL_QUERY_UNSUPPORTED_ERROR_CODE = "SQL query is 
unsupported";
+
+  public enum FailType
+  {
+    USER_ERROR(400),
+    UNAUTHORIZED(401),
+    CAPACITY_EXCEEDED(429),
+    UNKNOWN(500),
+    SERVER_ERROR(500),
+    CANCELED(500),
+    QUERY_RUNTIME_FAILURE(501),
+    UNSUPPORTED(501),
+    TIMEOUT(504);
+
+    private final int expectedStatus;
+
+    FailType(int expectedStatus)
+    {
+      this.expectedStatus = expectedStatus;
+    }
+
+    public int getExpectedStatus()
+    {
+      return expectedStatus;
+    }
+  }
+
+  public static FailType fromErrorCode(String errorCode)

Review Comment:
   This is a good idea in general. I wonder if it is a scalable solution in 
particular. Another approach is suggested 
[here](https://github.com/apache/druid/issues/13123). we may find that a global 
list of errors becomes hard to manage. Since any error from storage on up could 
terminate a query, we'd want the entire system to use a single unified set of 
errors. That gets to be hard to manage.
   
   Also, it can be hard for a developer working in a lower level, to know the 
proper HTTP status to send. For which API? REST? JSON? gRPC?
   
   It might be better to make the system more modular. Define a set of error 
categories, then define rules for mapping categories at each API. See [this 
proposal](https://github.com/apache/druid/issues/13123) for a suggestion of the 
categories.
   
   Our problem is also one of mapping from the specific code issue, "symbol 
'foo' not found" to what this means to the user, "The table function foo(), at 
line 3, position 27, is not valid for the arguments a, b, c."



##########
core/src/main/java/org/apache/druid/query/QueryException.java:
##########
@@ -30,12 +29,126 @@
 import java.util.function.Function;
 
 /**
- * Base serializable error response
- *
+ * Base serializable error response.
+ * <p>
+ * The Object Model that QueryException follows is a little non-intuitive as 
the primary way that a QueryException is
+ * generated is through a child class.  However, those child classes are *not* 
equivalent to a QueryException, instead
+ * they act as a Factory of QueryException objects.  This can be seen in two 
different places.
+ * <p>
+ * 1. When sanitize() is called, the response is a QueryException without any 
indication of which original exception
+ * occurred.
+ * 2. When these objects get serialized across the wire the recipient 
deserializes a QueryException. The client is
+ * never expected, and fundamentally is not allowed to, ever deserialize a 
child class of QueryException.
+ * <p>
+ * For this reason, QueryException must contain all potential state that any 
of its child classes could ever want to
+ * push across the wire.  Additionally, any catch clauses expecting one of the 
child Exceptions must know that it is
+ * running inside of code where the exception has not traveled across the 
wire.  If there is a chance that the
+ * exception could have been serialized across the wire, the code must catch a 
QueryException and check the errorCode
+ * instead.
+ * <p>
+ * As a correlary, adding new state or adjusting the logic of this class must 
always be done in a backwards-compatible
+ * fashion across all child classes of QueryException.
+ * <p>
+ * If there is any need to do different logic based on the type of error that 
has happened, the only reliable method
+ * of discerning the type of the error is to look at the errorCode String.  
Because these Strings are considered part
+ * of the API, they are not allowed to change and must maintain their same 
semantics.  The known errorCode Strings
+ * are pulled together as public static fields on this class in order to make 
it more clear what options exist.
+ * <p>
  * QueryResource and SqlResource are expected to emit the JSON form of this 
object when errors happen.
  */
 public class QueryException extends RuntimeException implements 
SanitizableException
 {
+  /**
+   * Error codes
+   */
+  public static final String JSON_PARSE_ERROR_CODE = "Json parse failed";
+  public static final String BAD_QUERY_CONTEXT_ERROR_CODE = "Query context 
parse failed";
+  public static final String QUERY_CAPACITY_EXCEEDED_ERROR_CODE = "Query 
capacity exceeded";
+  public static final String QUERY_INTERRUPTED_ERROR_CODE = "Query 
interrupted";
+  // Note: the proper spelling is with a single "l", but the version with

Review Comment:
   I left this comment because Gian pointed out the correct "cancelled" (2 l's) 
spelling, but we noted we can't change the constant because it is part of the 
public API.
   
   But, this PR changes the constant (by moving it from one place to another), 
so we might as well fix the spelling of the constant name while we're breaking 
things.



##########
server/src/test/java/org/apache/druid/server/mocks/MockHttpServletRequest.java:
##########
@@ -0,0 +1,504 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.mocks;
+
+import javax.servlet.AsyncContext;
+import javax.servlet.DispatcherType;
+import javax.servlet.RequestDispatcher;
+import javax.servlet.ServletContext;
+import javax.servlet.ServletInputStream;
+import javax.servlet.ServletRequest;
+import javax.servlet.ServletResponse;
+import javax.servlet.http.Cookie;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import javax.servlet.http.HttpSession;
+import javax.servlet.http.HttpUpgradeHandler;
+import javax.servlet.http.Part;
+import java.io.BufferedReader;
+import java.security.Principal;
+import java.util.Collection;
+import java.util.Enumeration;
+import java.util.LinkedHashMap;
+import java.util.Locale;
+import java.util.Map;
+import java.util.function.Supplier;
+
+public class MockHttpServletRequest implements HttpServletRequest

Review Comment:
   There is already a similar implementation 
[here](https://github.com/apache/druid/blob/master/extensions-core/druid-catalog/src/test/java/org/apache/druid/server/http/catalog/DummyRequest.java).
 Perhaps we can combine them and move them to a common location.



##########
processing/src/main/java/org/apache/druid/query/context/ResponseContext.java:
##########
@@ -334,14 +335,15 @@ public Object mergeValues(Object oldValue, Object 
newValue)
      */
     public static final Key UNCOVERED_INTERVALS_OVERFLOWED = new BooleanKey(
         "uncoveredIntervalsOverflowed",
-        true);
+        true
+    );
 
     /**
      * Map of most relevant query ID to remaining number of responses from 
query nodes.
      * The value is initialized in {@code CachingClusteredClient} when it 
initializes the connection to the query nodes,
      * and is updated whenever they respond (@code DirectDruidClient). {@code 
RetryQueryRunner} uses this value to
      * check if the {@link #MISSING_SEGMENTS} is valid.
-     *
+     * <p>

Review Comment:
   Thank you for adding good formatting. I often use the IDE Javadoc preview. 
Without proper formatting, the preview is a bit of a muddle.



##########
core/src/test/java/org/apache/druid/query/QueryExceptionTest.java:
##########
@@ -38,36 +33,72 @@
   private static final String ERROR_MESSAGE_ORIGINAL = "aaaa";
   private static final String ERROR_MESSAGE_TRANSFORMED = "bbbb";
 
-  @Mock
-  private Function<String, String> trasformFunction;
-
   @Test
   public void testSanitizeWithTransformFunctionReturningNull()
   {
-    
Mockito.when(trasformFunction.apply(ArgumentMatchers.eq(ERROR_MESSAGE_ORIGINAL))).thenReturn(null);
     QueryException queryException = new QueryException(ERROR_CODE, 
ERROR_MESSAGE_ORIGINAL, ERROR_CLASS, HOST);
-    QueryException actual = queryException.sanitize(trasformFunction);
+
+    AtomicLong callCount = new AtomicLong(0);
+    QueryException actual = queryException.sanitize(s -> {
+      callCount.incrementAndGet();
+      Assert.assertEquals(ERROR_MESSAGE_ORIGINAL, s);
+      return null;
+    });
+
     Assert.assertNotNull(actual);
     Assert.assertEquals(actual.getErrorCode(), ERROR_CODE);
     Assert.assertNull(actual.getMessage());
     Assert.assertNull(actual.getHost());
     Assert.assertNull(actual.getErrorClass());
-    
Mockito.verify(trasformFunction).apply(ArgumentMatchers.eq(ERROR_MESSAGE_ORIGINAL));
-    Mockito.verifyNoMoreInteractions(trasformFunction);
+    Assert.assertEquals(1, callCount.get());
   }
 
   @Test
   public void testSanitizeWithTransformFunctionReturningNewString()
   {
-    
Mockito.when(trasformFunction.apply(ArgumentMatchers.eq(ERROR_MESSAGE_ORIGINAL))).thenReturn(ERROR_MESSAGE_TRANSFORMED);
     QueryException queryException = new QueryException(ERROR_CODE, 
ERROR_MESSAGE_ORIGINAL, ERROR_CLASS, HOST);
-    QueryException actual = queryException.sanitize(trasformFunction);
+
+    AtomicLong callCount = new AtomicLong(0);
+    QueryException actual = queryException.sanitize(s -> {

Review Comment:
   If we're defining a new error system, it would be ideal if sanitizing 
consists of stripping out the stack trace: the rest of the error was already 
well designed.
   
   Else, it becomes quite hard to sanitize a message in general. The only 
general solution is "We're sorry, it didn't work out this time." GIGO and all 
that. Let's not put information into an error that we have to strip back out.
   
   Or, if we must (to aid debugging), do so using properties that are easy to 
discard. Again, [see this 
proposal](https://github.com/apache/druid/issues/13123).



##########
core/src/main/java/org/apache/druid/query/QueryException.java:
##########
@@ -30,12 +29,129 @@
 import java.util.function.Function;
 
 /**
- * Base serializable error response
- *
+ * Base serializable error response.
+ * <p>
+ * The Object Model that QueryException follows is a little non-intuitive as 
the primary way that a QueryException is
+ * generated is through a child class.  However, those child classes are *not* 
equivalent to a QueryException, instead
+ * they act as a Factory of QueryException objects.  This can be seen in two 
different places.
+ * <p>
+ * 1. When sanitize() is called, the response is a QueryException without any 
indication of which original exception
+ * occurred.
+ * 2. When these objects get serialized across the wire the recipient 
deserializes a QueryException. The client is
+ * never expected, and fundamentally is not allowed to, ever deserialize a 
child class of QueryException.
+ * <p>
+ * For this reason, QueryException must contain all potential state that any 
of its child classes could ever want to
+ * push across the wire.  Additionally, any catch clauses expecting one of the 
child Exceptions must know that it is
+ * running inside of code where the exception has not traveled across the 
wire.  If there is a chance that the
+ * exception could have been serialized across the wire, the code must catch a 
QueryException and check the errorCode
+ * instead.
+ * <p>
+ * As a corollary, adding new state or adjusting the logic of this class must 
always be done in a backwards-compatible
+ * fashion across all child classes of QueryException.
+ * <p>
+ * If there is any need to do different logic based on the type of error that 
has happened, the only reliable method
+ * of discerning the type of the error is to look at the errorCode String.  
Because these Strings are considered part
+ * of the API, they are not allowed to change and must maintain their same 
semantics.  The known errorCode Strings
+ * are pulled together as public static fields on this class in order to make 
it more clear what options exist.
+ * <p>
  * QueryResource and SqlResource are expected to emit the JSON form of this 
object when errors happen.
  */
 public class QueryException extends RuntimeException implements 
SanitizableException
 {
+  /**
+   * Error codes
+   */
+  public static final String JSON_PARSE_ERROR_CODE = "Json parse failed";
+  public static final String BAD_QUERY_CONTEXT_ERROR_CODE = "Query context 
parse failed";
+  public static final String QUERY_CAPACITY_EXCEEDED_ERROR_CODE = "Query 
capacity exceeded";
+  public static final String QUERY_INTERRUPTED_ERROR_CODE = "Query 
interrupted";
+  // Note: the proper spelling is with a single "l", but the version with
+  // two "l"s is documented, we can't change the text of the message.
+  public static final String QUERY_CANCELED_ERROR_CODE = "Query cancelled";
+  public static final String UNAUTHORIZED_ERROR_CODE = "Unauthorized request";
+  public static final String UNSUPPORTED_OPERATION_ERROR_CODE = "Unsupported 
operation";
+  public static final String TRUNCATED_RESPONSE_CONTEXT_ERROR_CODE = 
"Truncated response context";
+  public static final String UNKNOWN_EXCEPTION_ERROR_CODE = "Unknown 
exception";
+  public static final String QUERY_TIMEOUT_ERROR_CODE = "Query timeout";
+  public static final String QUERY_UNSUPPORTED_ERROR_CODE = "Unsupported 
query";
+  public static final String RESOURCE_LIMIT_EXCEEDED_ERROR_CODE = "Resource 
limit exceeded";
+  public static final String SQL_PARSE_FAILED_ERROR_CODE = "SQL parse failed";
+  public static final String PLAN_VALIDATION_FAILED_ERROR_CODE = "Plan 
validation failed";

Review Comment:
   These are all quite lame messages. If I was a user, and was told "Plan 
validation failed", I'd have no idea where to start to set things right. Is 
this a user error? If so, what did I do wrong? Or, did the system run out of 
resources? Is there a bug in Druid: the plan should have worked, but didn't.
   
   We really gotta put some effort into improving our messages on top of just 
making the lousy messages into nice constants.



##########
sql/src/main/java/org/apache/druid/sql/http/SqlResource.java:
##########
@@ -117,171 +108,41 @@
   @POST
   @Produces(MediaType.APPLICATION_JSON)
   @Consumes(MediaType.APPLICATION_JSON)
+  @Nullable
   public Response doPost(
       final SqlQuery sqlQuery,
       @Context final HttpServletRequest req
-  ) throws IOException
+  )

Review Comment:
   Thank you for pulling the response out of the body of this method. That's 
been something that's been bugging me for a while.
   
   This refactoring will be helpful to the gRPC API project: the HTTP request 
handler is now just a few lines of code which is easy to replicate for the gRPC 
handler.



##########
sql/src/main/java/org/apache/druid/sql/http/SqlResource.java:
##########
@@ -320,4 +181,187 @@ public Response cancelQuery(
       return Response.status(Status.FORBIDDEN).build();
     }
   }
+
+  /**
+   * The SqlResource only generates metrics and doesn't keep track of 
aggregate counts of successful/failed/interrupted
+   * queries, so this implementation is effectively just a noop.
+   */
+  private static class SqlResourceQueryMetricCounter implements 
QueryResource.QueryMetricCounter
+  {
+    @Override
+    public void incrementSuccess()
+    {
+
+    }
+
+    @Override
+    public void incrementFailed()
+    {
+
+    }
+
+    @Override
+    public void incrementInterrupted()
+    {
+
+    }
+
+    @Override
+    public void incrementTimedOut()
+    {
+
+    }
+  }
+
+  private class SqlResourceQueryResultPusher extends QueryResultPusher
+  {
+    private final String sqlQueryId;
+    private final HttpStatement stmt;
+    private final SqlQuery sqlQuery;
+
+    public SqlResourceQueryResultPusher(
+        AsyncContext asyncContext,
+        String sqlQueryId,
+        HttpStatement stmt,
+        SqlQuery sqlQuery
+    )
+    {
+      super(
+          (HttpServletResponse) asyncContext.getResponse(),
+          SqlResource.this.jsonMapper,
+          SqlResource.this.responseContextConfig,
+          SqlResource.this.selfNode,
+          SqlResource.QUERY_METRIC_COUNTER,
+          sqlQueryId,
+          MediaType.APPLICATION_JSON_TYPE
+      );
+      this.sqlQueryId = sqlQueryId;
+      this.stmt = stmt;
+      this.sqlQuery = sqlQuery;
+    }
+
+    @Override
+    public ResultsWriter start()
+    {
+      return new ResultsWriter()
+      {
+        private ResultSet thePlan;
+
+        @Override
+        @Nullable
+        @SuppressWarnings({"unchecked", "rawtypes"})
+        public QueryResponse<Object> start(HttpServletResponse response)
+        {
+          response.setHeader(SQL_QUERY_ID_RESPONSE_HEADER, sqlQueryId);
+
+          final QueryResponse<Object[]> retVal;
+          try {
+            thePlan = stmt.plan();
+            retVal = thePlan.run();

Review Comment:
   This is OK... But I worry that we create an opportunity for a naive user to 
try to use the statement both here and in the request. Nothing in the SQL 
planner is multi-threaded: each operation must be called by only a single 
thread at a time. (It is obviously OK for the thread ownership to change, as 
here.)
   
   Also, in JDBC, there is an elaborate mechanism because the thread that 
starts the query (metrics?) must be the one that finishes them. I don't know if 
that restriction is still valid. If it is, this solution should be better than 
the original code.
   
   I wonder if it is better to do the planing in the request thread, and just 
return an error response if things go south. Only if the plan is good do we 
move onto the execution being done here.



##########
sql/src/main/java/org/apache/druid/sql/http/SqlResource.java:
##########
@@ -275,7 +275,10 @@ public QueryResponse<Object> start(HttpServletResponse 
response)
           // AssertionErrors are coming from and do something to ensure that 
they don't actually make it out of Calcite
           catch (AssertionError e) {
             log.warn(e, "AssertionError killed query: %s", sqlQuery);
-            final QueryInterruptedException wrappedEx = 
QueryInterruptedException.wrapIfNeeded(e);
+
+            // We wrap the exception here so that we get the sanitization.  
java.lang.AssertionError apparently
+            // doesn't implement 
org.apache.druid.common.exception.SanitizableException.

Review Comment:
   We really should clean up our error system rather than adding more hacks...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to