Sean Owen created SPARK-4545:
--------------------------------

             Summary: If first Spark Streaming batch fails, it waits 10x batch duration before stopping
                 Key: SPARK-4545
                 URL: https://issues.apache.org/jira/browse/SPARK-4545
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.1.0
            Reporter: Sean Owen
            Priority: Minor


(I'd like to track the issue raised at 
http://mail-archives.apache.org/mod_mbox/spark-dev/201411.mbox/%3CCAMAsSdKY=QCT0YUdrkvbVuqXdFCGp1+6g-=s71fk8zr4uat...@mail.gmail.com%3E
 as a JIRA, since I think it's a legitimate issue that I can look into, 
with some help.)

This bit of {{JobGenerator.stop()}} executes, since the message appears in the 
logs:

{code}
def haveAllBatchesBeenProcessed = {
  lastProcessedBatch != null && lastProcessedBatch.milliseconds == stopTime
}
logInfo("Waiting for jobs to be processed and checkpoints to be written")
while (!hasTimedOut && !haveAllBatchesBeenProcessed) {
  Thread.sleep(pollTime)
}

// ... 10x batch duration wait here, before seeing the next line log:

logInfo("Waited for jobs to be processed and checkpoints to be written")
{code}
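For reference, the failure mode can be reproduced in isolation. This is a minimal standalone sketch (hypothetical names and constants, not the actual {{JobGenerator}} internals; the timeout is shrunk so it runs quickly) showing that when no batch ever completes, the polling loop above can only exit via the timeout:

{code}
object StopWaitSketch {
  // Hypothetical stand-ins for the fields referenced in the snippet above.
  // Stays None forever if every batch fails, i.e. onBatchCompleted is never called.
  @volatile var lastProcessedBatch: Option[Long] = None
  val stopTime = 10000L
  val pollTime = 50L
  val timeoutMillis = 500L // stands in for 10x the batch duration

  def haveAllBatchesBeenProcessed: Boolean =
    lastProcessedBatch.exists(_ == stopTime)

  // Returns how long we actually waited before giving up.
  def waitForCompletion(): Long = {
    val start = System.currentTimeMillis()
    def hasTimedOut = System.currentTimeMillis() - start >= timeoutMillis
    while (!hasTimedOut && !haveAllBatchesBeenProcessed) {
      Thread.sleep(pollTime)
    }
    System.currentTimeMillis() - start
  }
}
{code}

With {{lastProcessedBatch}} never set, the predicate is false on every poll, so the full timeout elapses before "Waited for jobs to be processed" is logged.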

I think that {{lastProcessedBatch}} is always null, since no batch ever
succeeds. Of course, for all this code knows, the next batch might
succeed, so it keeps waiting for one. But shouldn't it proceed once one
more batch completes, even if that batch failed?

{{JobGenerator.onBatchCompleted}} is only called for a successful batch.
Could it be called for a failed batch too? I think that would fix it.

Also, shouldn't the condition be {{lastProcessedBatch.milliseconds <=
stopTime}} rather than {{==}}?
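To illustrate the {{==}} vs {{<=}} question (whether {{<=}} is the right semantics here is exactly what's being asked, so this is only an illustration with made-up times): if the last completed batch's time can land strictly before {{stopTime}}, equality never holds, while {{<=}} would let the loop terminate:

{code}
val stopTime = 5000L
def doneEq(last: Long): Boolean = last == stopTime // current predicate
def doneLe(last: Long): Boolean = last <= stopTime // suggested predicate

// A last batch that completed one interval before stopTime:
// doneEq(4000L) is false forever; doneLe(4000L) is true.
{code}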
