churromorales commented on PR #13156:
URL: https://github.com/apache/druid/pull/13156#issuecomment-1291145564

   @gianm I was testing the MM-less patch on top of the MSQ work you did. I ran 
a test ingestion and the tasks hang forever. After a bit of debugging, here is 
what happens when I launch a controller with one worker:
   
   I get this exception: 
   
   ```
   2022-10-25T20:32:25,413 ERROR [ServiceClientFactory-0] com.google.common.util.concurrent.ExecutionList - RuntimeException while executing runnable com.google.common.util.concurrent.Futures$4@7f14c4aa with executor com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService@138e191f
   java.lang.NullPointerException: host
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229) ~[guava-16.0.1.jar:?]
        at org.apache.druid.rpc.ServiceLocation.<init>(ServiceLocation.java:39) ~[druid-server-24.0.0-6.jar:24.0.0-6]
        at org.apache.druid.rpc.indexing.SpecificTaskServiceLocator$1.onSuccess(SpecificTaskServiceLocator.java:137) ~[druid-server-24.0.0-6.jar:24.0.0-6]
        at org.apache.druid.rpc.indexing.SpecificTaskServiceLocator$1.onSuccess(SpecificTaskServiceLocator.java:113) ~[druid-server-24.0.0-6.jar:24.0.0-6]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1181) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture$1.run(Futures.java:872) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.Futures$ImmediateFuture.addListener(Futures.java:102) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:868) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53) ~[guava-16.0.1.jar:?]
        at org.apache.druid.rpc.ServiceClientImpl$1.onSuccess(ServiceClientImpl.java:194) ~[druid-server-24.0.0-6.jar:24.0.0-6]
        at org.apache.druid.rpc.ServiceClientImpl$1.onSuccess(ServiceClientImpl.java:168) ~[druid-server-24.0.0-6.jar:24.0.0-6]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1181) ~[guava-16.0.1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
   ```
   
   For the MiddleManager-less patch on k8s, I added setup and teardown 
functions to the task lifecycle: before every AbstractTask runs, it announces 
its own location, and on teardown it updates its status, etc.
   
   From what I see here, we do have a taskStatus (the task has been launched), 
but the location has not yet been announced. In k8s we don't know the location 
until the pod comes up and the service is ready to take requests. The MSQ 
patch doesn't wait for the location; it assumes it already knows it. But the 
MM-less patch needs it to wait.
   
   TL;DR: it's a race. We try to get the location for the controller before it 
announces it. The TaskLocation is `unknown` until the task's runTask() method 
is invoked, and the null-host precondition in the ServiceLocation constructor 
causes everything to die.
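
   To make the race concrete, here is a rough sketch of the kind of guard I 
have in mind: treat a location with no host as "not announced yet" and let the 
caller retry, instead of hitting the null check. The `LocationGuard`, 
`Location`, and `Service` names are hypothetical stand-ins I made up for this 
example, not the real Druid classes, and this is not what the MSQ patch does 
today.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; not the real Druid classes.
final class LocationGuard {
    /** Stand-in for a task location; host stays null until the task announces itself. */
    static final class Location {
        final String host;
        final int port;

        Location(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Stand-in for a service location that rejects a null host, like the one in the trace. */
    static final class Service {
        final String host;
        final int port;

        Service(String host, int port) {
            this.host = Objects.requireNonNull(host, "host");
            this.port = port;
        }
    }

    /**
     * Returns an empty Optional while the location is still unknown, so the
     * caller can poll again later instead of failing on the null host.
     */
    static Optional<Service> resolve(Location location) {
        if (location == null || location.host == null) {
            return Optional.empty();
        }
        return Optional.of(new Service(location.host, location.port));
    }
}
```

   A caller would keep polling the task status while `resolve` returns empty, 
and only build a client once a host shows up.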
   
   Do you have any advice on how we can make these two coexist? This is the 
only blocker I see; everything else works as it did before. I can't figure out 
a clean way, and I don't fully understand the MSQ patch, so I thought you 
might have a solution.
   
   Thank you
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

