sohami commented on a change in pull request #1470: DRILL-6746: Query can hang
when PartitionSender task thread sees a co…
URL: https://github.com/apache/drill/pull/1470#discussion_r219639174
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/BaseRawBatchBuffer.java
##########
@@ -167,7 +169,25 @@ public RawFragmentBatch getNext() throws IOException {
// if we didn't get a batch, block on waiting for queue.
if (b == null && (!isTerminated() || !bufferQueue.isEmpty())) {
- b = bufferQueue.take();
+ // We shouldn't block indefinitely here. A failure can change the FragmentExecutor
+ // state to FAILED while the queue is empty, so the minor fragment main thread
+ // blocks here waiting for the next batch to arrive. Meanwhile, when the next batch
+ // arrives, the enqueue path sees the FragmentExecutor failure state, doesn't
+ // enqueue the batch, and cleans up the buffer queue. Hence this thread would be
+ // stuck forever. So we poll for 5 seconds at a time until we get a batch or the
+ // FragmentExecutor state is in an error condition.
+ while (b == null) {
+ b = bufferQueue.poll(5, TimeUnit.SECONDS);
+ if (!context.getExecutorState().shouldContinue()) {
+ kill(context);
+ if (b != null) {
+ assertAckSent(b);
Review comment:
This path is not executed during cleanup. Also, the contract with poll is that
if a batch is dequeued from the buffer queue, an ack is sent for that batch. The
assertAckSent(b) call makes sure that state is correct here, just as it is
checked in the non-failure path.
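
To make the contract above concrete, here is a minimal, self-contained sketch of the pattern under discussion: a bounded poll that re-checks executor state on every wakeup, with the "dequeued implies acked" check applied on both the failure and non-failure paths. This is not Drill's code. The class BoundedPollSketch, the Batch type, and the pollAndAck, checkAckSent, and getNext helpers are all hypothetical stand-ins, and the AtomicBoolean only approximates context.getExecutorState().shouldContinue().

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class BoundedPollSketch {
  /** Hypothetical stand-in for a raw batch; it only tracks whether an ack went out. */
  static final class Batch {
    volatile boolean ackSent = false;
  }

  /** Assumed contract: whenever poll hands back a batch, the ack has been sent. */
  static Batch pollAndAck(BlockingQueue<Batch> queue, long timeoutMillis)
      throws InterruptedException {
    Batch b = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    if (b != null) {
      b.ackSent = true;
    }
    return b;
  }

  /** Hypothetical counterpart of the ack check: fails if a dequeued batch was never acked. */
  static void checkAckSent(Batch b) {
    if (b != null && !b.ackSent) {
      throw new IllegalStateException("Ack not sent for batch");
    }
  }

  /**
   * Waits for the next batch, waking up every timeoutMillis to re-check the
   * executor state instead of blocking forever the way take() would.
   * Returns null once the executor should no longer continue.
   */
  static Batch getNext(BlockingQueue<Batch> queue, AtomicBoolean shouldContinue,
                       long timeoutMillis) {
    Batch b = null;
    try {
      while (b == null) {
        b = pollAndAck(queue, timeoutMillis);
        if (!shouldContinue.get()) {
          // Failure path: a batch may still have been dequeued, so verify its ack.
          checkAckSent(b);
          return null;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return null;
    }
    // Non-failure path: the same check applies.
    checkAckSent(b);
    return b;
  }

  public static void main(String[] args) {
    BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();
    queue.add(new Batch());
    Batch ok = getNext(queue, new AtomicBoolean(true), 50);
    System.out.println("batch acked: " + (ok != null && ok.ackSent));

    // Empty queue plus failed executor: returns after one timeout instead of hanging.
    Batch none = getNext(queue, new AtomicBoolean(false), 50);
    System.out.println("returned null after failure: " + (none == null));
  }
}
```

The sketch uses a 50 ms timeout only to keep the demo quick; the change under review polls for 5 seconds.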