Are you by any chance using only memory in the storage level of the input
streams?
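
(For reference, a minimal sketch of passing a disk-spilling storage level
when creating an input stream; the socket source, host, port, and app name
here are placeholders, not from the original thread:)

  import org.apache.spark.SparkConf
  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("StreamingStorageLevel")
  val ssc = new StreamingContext(conf, Seconds(60))

  // MEMORY_AND_DISK_SER keeps received blocks serialized in memory and
  // spills them to disk under memory pressure instead of dropping them
  val lines = ssc.socketTextStream("host", 9999,
    StorageLevel.MEMORY_AND_DISK_SER)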

TD


On Mon, Jun 30, 2014 at 5:53 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Bill,
>
> let's say the processing time is t' and the window size t. Spark does not
> *require* t' < t. In fact, for *temporary* peaks in your streaming data, I
> think the way Spark handles it is very nice, in particular because 1) it
> does not mix up the order in which items arrived in the stream, so items
> from a later window will always be processed later, and 2) an increase in
> data is not punished with high load and an unresponsive system, but with
> disk space consumption instead.
>
> However, if all of your windows require t' > t processing time (and it's
> not because you are waiting, but because you actually do some computation),
> then you are out of luck, because if you start processing the next window
> while the previous one is still being processed, you have fewer resources
> for each, and processing will take even longer. However, if you are only
> waiting (e.g., for network I/O), then maybe you can employ some
> asynchronous solution where your tasks return immediately and deliver
> their result via a callback later?
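>
> (A rough sketch of that asynchronous idea, assuming a DStream named
> "dstream" whose per-item work is a network call; callService and
> handleResult are hypothetical:)
>
>   import scala.concurrent.Future
>   import scala.concurrent.ExecutionContext.Implicits.global
>
>   dstream.foreachRDD { rdd =>
>     rdd.foreachPartition { items =>
>       items.foreach { item =>
>         // fire the request without blocking the task; the result
>         // arrives later via the callback
>         Future(callService(item)).onComplete(r => handleResult(r))
>       }
>     }
>   }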
>
> Tobias
>
>
>
> On Tue, Jul 1, 2014 at 2:26 AM, Bill Jay <bill.jaypeter...@gmail.com>
> wrote:
>
>> Tobias,
>>
>> Your suggestion is very helpful. I will definitely investigate it.
>>
>> Just curious. Suppose the batch size is t seconds. In practice, does
>> Spark always require the program to finish processing the data of t
>> seconds within t seconds of processing time? Can Spark begin to consume
>> the new batch before it finishes processing the previous batch? If Spark
>> can do them concurrently, it may save processing time and solve the
>> problem of data piling up.
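>>
>> (As an aside, one knob that reportedly allows this is the experimental,
>> undocumented spark.streaming.concurrentJobs setting; treat its name and
>> semantics as an assumption to verify against your Spark version:)
>>
>>   // experimental: let up to 2 batch jobs run at the same time
>>   conf.set("spark.streaming.concurrentJobs", "2")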
>>
>> Thanks!
>>
>> Bill
>>
>>
>>
>>
>> On Mon, Jun 30, 2014 at 4:49 AM, Tobias Pfeiffer <t...@preferred.jp>
>> wrote:
>>
>>> If your batch size is one minute and it takes more than one minute to
>>> process, then I guess that's what causes your problem. The processing of
>>> the second batch will not start until the processing of the first is
>>> finished, which leads to more and more data being stored and waiting for
>>> processing; check the attached graph for a visualization of what I think
>>> may happen.
>>>
>>> Can you maybe do something hacky like throwing away a part of the data
>>> so that processing time gets below one minute, then check whether you still
>>> get that error?
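>>>
>>> (A minimal sketch of that idea, assuming a DStream named "dstream" and
>>> keeping a random ~10% of each batch:)
>>>
>>>   // keep roughly 10% of each batch to check whether processing
>>>   // time drops below the batch interval
>>>   val reduced = dstream.transform(rdd =>
>>>     rdd.sample(withReplacement = false, fraction = 0.1))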
>>>
>>> Tobias
>>>
>>>
>>>
>>>
>>> On Mon, Jun 30, 2014 at 1:56 PM, Bill Jay <bill.jaypeter...@gmail.com>
>>> wrote:
>>>
>>>> Tobias,
>>>>
>>>> Thanks for your help. I think in my case, the batch size is 1 minute.
>>>> However, it takes my program more than 1 minute to process 1 minute's
>>>> data. I am not sure whether it is because the unprocessed data piles
>>>> up. Do you have any suggestions on how to check and solve it? Thanks!
>>>>
>>>> Bill
>>>>
>>>>
>>>> On Sun, Jun 29, 2014 at 7:18 PM, Tobias Pfeiffer <t...@preferred.jp>
>>>> wrote:
>>>>
>>>>> Bill,
>>>>>
>>>>> were you able to process all information in time, or did some
>>>>> unprocessed data pile up? I think I saw this once, and the reason
>>>>> seemed to be that I had received more data than would fit in memory
>>>>> while waiting for processing, so old data was deleted. When it was
>>>>> time to process that data, it didn't exist any more. Is that a
>>>>> possible reason in your case?
>>>>>
>>>>> Tobias
>>>>>
>>>>> On Sat, Jun 28, 2014 at 5:59 AM, Bill Jay <bill.jaypeter...@gmail.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I am running a spark streaming job with 1 minute as the batch size.
>>>>> > It ran around 84 minutes and was killed because of the exception
>>>>> > with the following information:
>>>>> >
>>>>> > java.lang.Exception: Could not compute split, block
>>>>> > input-0-1403893740400 not found
>>>>> >
>>>>> >
>>>>> > Before it was killed, it was able to correctly generate output
>>>>> > for each batch.
>>>>> >
>>>>> > Any help on this will be greatly appreciated.
>>>>> >
>>>>> > Bill
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
