Ma77Ball opened a new issue, #5325:
URL: https://github.com/apache/texera/issues/5325

   Each StoppableQueueBlockingRunnable thread (MainLoop, NetworkSender, 
PortStorageWriter) blocks in interruptible_get on a queue.get() with no 
timeout. The only way the thread leaves that wait is the RUNNABLE_STOP marker 
enqueued by stop().
   
   This is fragile on shutdown: if that single wakeup is ever missed (e.g. a 
lost notification) or a stop is signalled by a path that does not enqueue the 
marker, the thread parks forever. python_worker.run() then blocks at 
thread.join() and the worker never finishes shutting down.
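
   The failure mode can be reproduced in isolation. The sketch below is not the Texera code: `blocking_loop` and the local `RUNNABLE_STOP` object are hypothetical stand-ins, and a plain `queue.Queue` replaces the worker's internal queue. It only shows that a stop request which does not enqueue the marker leaves the consumer parked in `get()`.

```python
import queue
import threading

RUNNABLE_STOP = object()  # hypothetical stand-in for the real stop marker


def blocking_loop(q, seen):
    """Consume items until the stop marker arrives; get() has no timeout."""
    while True:
        item = q.get()  # parks here until something is enqueued
        if item is RUNNABLE_STOP:
            return
        seen.append(item)


q = queue.Queue()
seen = []
worker = threading.Thread(target=blocking_loop, args=(q, seen), daemon=True)
worker.start()
q.put("tuple")

# A stop path that sets some flag but never enqueues the marker.
stop_requested = threading.Event()
stop_requested.set()

worker.join(timeout=0.5)
still_parked = worker.is_alive()  # the thread never looked at the flag

# Only the marker itself releases the thread.
q.put(RUNNABLE_STOP)
worker.join(timeout=2)
```

   An untimed `thread.join()` in place of the first `join(timeout=0.5)` would block for as long as the marker stays missing, which is the hang described above.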
   
   Note this only affects shutdown. A quiet queue during normal operation is 
fine and expected; the threads are long-lived and idle between tuple bursts.
   
   Proposed fix (behavior-preserving for the data path):
   - Add a threading.Event stop flag; stop() sets it in addition to enqueueing 
the marker.
   - Make interruptible_get poll with a short timeout and re-check the flag on 
each wakeup (treating queue.Empty as "loop and wait again"), so a missed marker 
can no longer park the thread indefinitely.
   - Thread an optional timeout parameter through Getable.get -> 
InternalQueue.get -> LinkedBlockingMultiQueue.get (the last via 
Condition.wait_for, raising queue.Empty on expiry). Default timeout=None keeps 
the existing blocking behavior unchanged, so only the stoppable threads opt 
into polling.
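
   The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: `TimedQueue` stands in for LinkedBlockingMultiQueue, `StoppableRunner` for StoppableQueueBlockingRunnable, and `poll_seconds`, `stop_flag`, and the `enqueue_marker` argument are names invented for the example.

```python
import queue
import threading

RUNNABLE_STOP = object()  # hypothetical stand-in for the real stop marker


class TimedQueue:
    """Hypothetical queue whose get() accepts an optional timeout."""

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def get(self, timeout=None):
        # timeout=None blocks indefinitely, matching the existing behavior.
        with self._cond:
            # wait_for returns the predicate's value, so False means expiry.
            if not self._cond.wait_for(lambda: bool(self._items), timeout=timeout):
                raise queue.Empty
            return self._items.pop(0)


class StoppableRunner:
    """Hypothetical runnable that polls the queue and re-checks a stop flag."""

    def __init__(self, q, poll_seconds=0.05):
        self.q = q
        self.poll_seconds = poll_seconds
        self.stop_flag = threading.Event()
        self.processed = []

    def interruptible_get(self):
        # Re-check the flag on every wakeup; queue.Empty means "wait again".
        while not self.stop_flag.is_set():
            try:
                return self.q.get(timeout=self.poll_seconds)
            except queue.Empty:
                continue
        return RUNNABLE_STOP

    def run(self):
        while True:
            item = self.interruptible_get()
            if item is RUNNABLE_STOP:
                break
            self.processed.append(item)

    def stop(self, enqueue_marker=True):
        # Set the flag in addition to enqueueing the marker.
        self.stop_flag.set()
        if enqueue_marker:
            self.q.put(RUNNABLE_STOP)
```

   With this shape, `stop(enqueue_marker=False)` simulates a missed marker, and the thread should still exit within roughly one `poll_seconds` interval. One design point the sketch leaves open: because the flag is checked before each `get`, items still queued when the flag is set are not processed, so a real implementation would need to decide whether to drain first.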

