Sean Owen created SPARK-4545:
--------------------------------
Summary: If first Spark Streaming batch fails, it waits 10x batch
duration before stopping
Key: SPARK-4545
URL: https://issues.apache.org/jira/browse/SPARK-4545
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.1.0
Reporter: Sean Owen
Priority: Minor
(I'd like to track the issue raised at
http://mail-archives.apache.org/mod_mbox/spark-dev/201411.mbox/%3CCAMAsSdKY=QCT0YUdrkvbVuqXdFCGp1+6g-=s71fk8zr4uat...@mail.gmail.com%3E
as a JIRA, since I think it's a legitimate issue that I can look into,
with some help.)
This bit of {{JobGenerator.stop()}} executes, since the message appears in the
logs:
{code}
  def haveAllBatchesBeenProcessed = {
    lastProcessedBatch != null && lastProcessedBatch.milliseconds == stopTime
  }

  logInfo("Waiting for jobs to be processed and checkpoints to be written")
  while (!hasTimedOut && !haveAllBatchesBeenProcessed) {
    Thread.sleep(pollTime)
  }
  // ... 10x batch duration wait here, before seeing the next line log:
  logInfo("Waited for jobs to be processed and checkpoints to be written")
{code}
I think that {{lastProcessedBatch}} remains null, since no batch ever
succeeds. Of course, for all this code knows, the next batch might
succeed, so it keeps waiting for it. But shouldn't it proceed once one
more batch completes, even if that batch failed?
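To make the failure mode concrete, here is a minimal standalone sketch (not actual Spark code) that models the polling loop above, with {{lastProcessedBatch}} represented as an {{Option[Long]}}: when no batch has ever completed, the completion check can never become true, so the loop only exits via the timeout.

{code}
// Standalone sketch of the stop() polling loop; names and timings are
// illustrative, not Spark internals.
object StopWaitSketch {
  def waitForCompletion(stopTime: Long,
                        lastProcessedBatch: Option[Long],
                        timeoutMs: Long,
                        pollTimeMs: Long = 10): Long = {
    val start = System.currentTimeMillis()
    def hasTimedOut = System.currentTimeMillis() - start >= timeoutMs
    // Mirrors haveAllBatchesBeenProcessed: None (null) never satisfies it
    def haveAllBatchesBeenProcessed =
      lastProcessedBatch.exists(_ == stopTime)
    while (!hasTimedOut && !haveAllBatchesBeenProcessed) {
      Thread.sleep(pollTimeMs)
    }
    System.currentTimeMillis() - start
  }

  def main(args: Array[String]): Unit = {
    // No batch ever succeeded -> full timeout elapses before stop() proceeds
    val waited = waitForCompletion(1000L, None, timeoutMs = 200L)
    assert(waited >= 200L)
    // The final batch was processed successfully -> loop exits immediately
    val quick = waitForCompletion(1000L, Some(1000L), timeoutMs = 200L)
    assert(quick < 100L)
  }
}
{code}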
{{JobGenerator.onBatchCompleted}} is only called for a successful batch.
Could it also be called when a batch fails? I think that would fix it.
Also, should the condition be {{lastProcessedBatch.milliseconds <=
stopTime}} rather than {{==}}?
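A tiny illustration of the asymmetry that question points at, using made-up batch times (not Spark code): if the last batch that actually completed was generated before {{stopTime}}, the {{==}} check never becomes true, whereas {{<=}} would let the shutdown proceed.

{code}
// Illustrative only: compares the current and proposed stop conditions.
object StopConditionSketch {
  def shouldStopEq(lastProcessedMs: Long, stopTimeMs: Long): Boolean =
    lastProcessedMs == stopTimeMs // current condition
  def shouldStopLe(lastProcessedMs: Long, stopTimeMs: Long): Boolean =
    lastProcessedMs <= stopTimeMs // proposed condition

  def main(args: Array[String]): Unit = {
    val stopTime = 3000L
    val lastProcessed = 2000L // last completed batch predates stopTime
    assert(!shouldStopEq(lastProcessed, stopTime)) // keeps waiting
    assert(shouldStopLe(lastProcessed, stopTime))  // would proceed
  }
}
{code}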
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)