[GitHub] [jackrabbit-oak] nfsantos commented on a diff in pull request #639: OAK-9856 | Log query and caller stack trace in case of ES search fail…

GitBox Tue, 26 Jul 2022 02:53:52 -0700


nfsantos commented on code in PR #639:
URL: https://github.com/apache/jackrabbit-oak/pull/639#discussion_r929698732



##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java:
##########
@@ -71,11 +71,11 @@ public class ElasticResultRowAsyncIterator implements 
Iterator<FulltextResultRow
     private final Predicate<String> rowInclusionPredicate;
     private final ElasticMetricHandler metricHandler;
     private final LMSEstimator estimator;
-
     private final ElasticQueryScanner elasticQueryScanner;
     private final ElasticRequestHandler elasticRequestHandler;
     private final ElasticResponseHandler elasticResponseHandler;
     private final ElasticFacetProvider elasticFacetProvider;
+    private final BlockingQueue<Throwable> errorQueue = new 
LinkedBlockingQueue<>();

Review Comment:
   Is it possible for `onFailure()` to be called more than once? My 
understanding is that it is called at most once, so maybe we could use a 
simpler data structure, like an `AtomicReference`.
   Or as an alternative, use the same queue as the one used for retrieving the 
results, doing something similar to the poison pill. This may simplify the 
logic as well because there will be a single queue to check. Or maybe even use 
the poison pill and change it to optionally include an exception, so it can be 
used for both the successful and unsuccessful case.



##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java:
##########
@@ -111,6 +110,25 @@ public boolean hasNext() {
                 throw new IllegalStateException("Error reading next result 
from Elastic", e);
             }
         }
+
+        // Check if there are any exception traces filled from onFailure 
Callback in the errorQueue
+        // Any exception (such as ParseException) during the prefetch (init 
scanner) via the async call to ES would be available here
+        // when the cursor is actually being traversed.
+        // This is being done so that we can log the caller stack trace in 
case of any exception from ES and not just the trace of the async query thread.
+        if (!errorQueue.isEmpty()) {
+            Exception e = new Exception();
+            e.setStackTrace(Thread.currentThread().getStackTrace());
+            String exceptionMsg = null;
+            try {
+                exceptionMsg = errorQueue.take().getMessage();
+            } catch (InterruptedException interruptedException) {
+               LOG.debug("Error while trying to read ES exception from error 
queue");
+            }

Review Comment:
   Is the stack trace of the original exception logged somewhere? Here we are 
creating a new stack trace from the callers's thread, which is for sure useful 
to log. But we are only taking the message from the original exception, not the 
stack trace. The original stack trace is also not logged inside `onFailure()`. 
I think we should log both stack traces.



##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/async/ElasticResultRowAsyncIterator.java:
##########
@@ -111,6 +110,25 @@ public boolean hasNext() {
                 throw new IllegalStateException("Error reading next result 
from Elastic", e);
             }
         }
+
+        // Check if there are any exception traces filled from onFailure 
Callback in the errorQueue
+        // Any exception (such as ParseException) during the prefetch (init 
scanner) via the async call to ES would be available here
+        // when the cursor is actually being traversed.
+        // This is being done so that we can log the caller stack trace in 
case of any exception from ES and not just the trace of the async query thread.
+        if (!errorQueue.isEmpty()) {
+            Exception e = new Exception();
+            e.setStackTrace(Thread.currentThread().getStackTrace());
+            String exceptionMsg = null;
+            try {
+                exceptionMsg = errorQueue.take().getMessage();

Review Comment:
   The call to `take()` has no timeout, so it can block for an unbounded time. 
This should not be a problem in this case, but as a general practice I think 
it's best to always have a timeout. Otherwise, there is a chance that a bug may 
cause the caller thread to block here forever, which can be very hard to debug 
as the system will seem to be hang without any useful information on the logs. 
So often, in these cases, the only way to figure out what is happening is to 
get a thread dump, which may not be easy to get. If all blocking operations 
have a timeout, then after a while the operation fails and we get an exception 
on the logs. 
   
   In this case, as we checked that the queue has something in it, we could 
`poll()` instead, which fails immediately if the queue is empty. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [jackrabbit-oak] nfsantos commented on a diff in pull request #639: OAK-9856 | Log query and caller stack trace in case of ES search fail…

Reply via email to