paul-rogers commented on code in PR #13798:
URL: https://github.com/apache/druid/pull/13798#discussion_r1110438887


##########
processing/src/main/java/org/apache/druid/error/DruidException.java:
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.error;
+
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.logger.Logger;
+
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/**
+ * Represents an error condition exposed to the user and/or operator of Druid.
+ * Not needed for purely internal exceptions thrown and caught within Druid 
itself.
+ * There are categories of error that determine the general form of corrective
+ * action, and also determine HTTP (or other API) status codes.
+ * <p>
+ * Druid exceptions can contain context. Use the context for details, such as
+ * file names, query context variables, symbols, etc. This allows the error
+ * message itself to be simple. Context allows consumers to filter out various
+ * bits of information that a site does not wish to expose to the user, while
+ * still logging the full details. Typical usage:
+ * <pre><code>
+ * if (something_is_wrong) {
+ *   throw DruidException.user("File not found")
+ *       .context("File name", theFile.getName())
+ *       .context("Directory", theFile.getParent())
+ *       .build();
+ * }
+ * </code></pre>
+ * <p>
+ * Exceptions are immutable. In many cases, an error is thrown low in the code,
+ * bit context is known at a higher level. In this case, the higher code should
+ * catch the exception, convert back to a builder, add context, and throw the
+ * new exception. The original call stack is maintained. Example:
+ * <pre><code>
+ * catch (DruidExceptin e) {
+ *   throw e.toBuilder().
+ *       .context("File name", theFile.getName())
+ *       .context("Directory", theFile.getParent())
+ *       .build();
+ * }
+ * </code></pre>
+ */
+public class DruidException extends RuntimeException
+{
+  public enum ErrorType
+  {
+    /**
+     * General case of an error due to something the user asked to do in an 
REST
+     * request. Translates to an HTTP status 400 (BAD_REQUET) for a REST call
+     * (or the equivalent for other APIs.)
+     */
+    USER,
+
+    /**
+     * Special case of a user error where a resource is not found and we wish
+     * to return a 404 (NOT_FOUND) HTTP status (or the equivalent for other
+     * APIs.)
+     */
+    NOT_FOUND,
+
+    /**
+     * Error due to a problem beyond the user's control, such as an assertion
+     * failed, unsupported operation, etc. These indicate problems with the 
software
+     * where the fix is either a workaround or a bug fix. Such error should 
only
+     * be raised for "should never occur" type situations.
+     */
+    SYSTEM,
+
+    /**
+     * Error for a resource limit: memory, CPU, slots or so on. The workaround 
is
+     * generally to try later, get more resources, reduce load or otherwise 
resolve
+     * the resource pressure issue.
+     */
+    RESOURCE,
+
+    /**
+     * Similar to RESOURCE, except indicates a timeout, perhaps due to load, 
due
+     * to an external system being unavailable, etc.
+     */
+    TIMEOUT,
+
+    /**
+     * Error in configuration. Indicates that the administrator made a mistake 
during
+     * configuration or setup. The solution is for the administrator (not the 
end user)
+     * to resolve the issue.
+     */
+    CONFIG,
+
+    /**
+     * Indicates a network error of some kind: intra-Druid, client-to-Druid,
+     * Druid-to-external system, etc. Generally the end user cannot fix these 
errors:
+     * it requires a DevOps person to resolve.
+     */
+    NETWORK
+  };

Review Comment:
   As it turns out, the topics above are _supposed_ to represent personas. But, 
we can perhaps sharpen up the concepts. The current thinking for the categories 
is:
   
   The `USER` category is something that (we believe) the user caused by virtue 
of something just requested. Most likely the user has to take action.
   
   A `CONFIG` error is targeted at the sys admin: they configured something 
wrong and they have to change a config (or align an external system with the 
config) to fix the error.
   
   A `SYSTEM` error is targeted at us: something is wrong in the code. The user 
will get the error, but all they can do is turn around, forward the message to 
us and ask us to fix whatever it is that went wrong.
   
   A `NETWORK` error _could_ be the user (they gave us a bogus HTTP input 
source URL), could be an admin (they didn't open a port required by Druid) or a 
generic error (someone with a backhoe just cut the fiber optic link.) If we can 
figure out which if these cases is occurring, we could better target the error. 
Do you know of a way to know this? Are our network errors clearly delimited 
split out in the code based on who should take action?
   
   For others it gets even muddier. A `RESOURCE` error can be seen as a user 
error ("hey! Don't query so many rows"), a load error ("there just isn't enough 
load to go around: go away and come back later."), a config issue ("1 GB is a 
bit low for this workload"), etc. Any suggestions for how we'd know, in the 
code, to whom to attribute the cause of the lack of resources?
   
   Given all this, the categories were chosen as a compromise between the 
_probable_ person who can fix things, and what we _actually_ know in the code.
   
   Perhaps we can rename `SYSTEM` to `DRUID_DEVELOPER`, `CONFIG` to 
`DRUID_ADMIN`, etc.
   
   The second point was about HTTP return statuses. That's a tricky one. The 
same errors are also used for JDBC and (soon) gRPC. The hope was that there is 
some level of error categorization, above individual errors, that maps to HTTP 
status codes on the one hand, useful SQLSTATE codes for JDBC, etc. Else, each 
error has to be decorated with an HTTP status, a SQLState, etc. Possible, but 
messy.
   
   Thoughts for how we might improve these areas?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to