Are you by any chance using only memory in the storage level of the input streams?

TD

On Mon, Jun 30, 2014 at 5:53 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Bill,
>
> let's say the processing time is t' and the window size t. Spark does not
> *require* t' < t. In fact, for *temporary* peaks in your streaming data, I
> think the way Spark handles it is very nice, in particular since 1) it does
> not mix up the order in which items arrived in the stream, so items from a
> later window will always be processed later, and 2) an increase in data
> will not be punished with high load and an unresponsive system, but with
> disk space consumption instead.
>
> However, if all of your windows require t' > t processing time (and not
> because you are waiting, but because you actually do some computation),
> then you are out of luck, because if you start processing the next window
> while the previous one is still being processed, you have fewer resources
> for each, and processing will take even longer. However, if you are only
> waiting (e.g., for network I/O), then maybe you can employ some
> asynchronous solution where your tasks return immediately and deliver
> their result via a callback later?
>
> Tobias
>
> On Tue, Jul 1, 2014 at 2:26 AM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>> Tobias,
>>
>> Your suggestion is very helpful. I will definitely investigate it.
>>
>> Just curious. Suppose the batch size is t seconds. In practice, does
>> Spark always require the program to finish processing the data of t
>> seconds within t seconds' processing time? Can Spark begin to consume a
>> new batch before it finishes processing the previous one? If Spark can
>> do both together, it may save processing time and solve the problem of
>> data piling up.
>>
>> Thanks!
>>
>> Bill
>>
>> On Mon, Jun 30, 2014 at 4:49 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:
>>> If your batch size is one minute and it takes more than one minute to
>>> process, then I guess that's what causes your problem. The processing
>>> of the second batch will not start before the processing of the first
>>> is finished, which leads to more and more data being stored and
>>> waiting for processing; check the attached graph for a visualization
>>> of what I think may happen.
>>>
>>> Can you maybe do something hacky like throwing away a part of the data
>>> so that processing time gets below one minute, then check whether you
>>> still get that error?
>>>
>>> Tobias
>>>
>>> On Mon, Jun 30, 2014 at 1:56 PM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>>>> Tobias,
>>>>
>>>> Thanks for your help. I think in my case, the batch size is 1 minute.
>>>> However, it takes my program more than 1 minute to process 1 minute's
>>>> data. I am not sure whether it is because the unprocessed data piles
>>>> up. Do you have any suggestion on how to check and solve it? Thanks!
>>>>
>>>> Bill
>>>>
>>>> On Sun, Jun 29, 2014 at 7:18 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
>>>>> Bill,
>>>>>
>>>>> were you able to process all information in time, or did maybe some
>>>>> unprocessed data pile up? I think when I saw this once, the reason
>>>>> seemed to be that I had received more data than would fit in memory
>>>>> while waiting for processing, so old data was deleted. When it was
>>>>> time to process that data, it didn't exist any more. Is that a
>>>>> possible reason in your case?
>>>>>
>>>>> Tobias
>>>>>
>>>>> On Sat, Jun 28, 2014 at 5:59 AM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am running a Spark Streaming job with 1 minute as the batch size.
>>>>>> It ran around 84 minutes and was killed because of an exception with
>>>>>> the following information:
>>>>>>
>>>>>> java.lang.Exception: Could not compute split, block
>>>>>> input-0-1403893740400 not found
>>>>>>
>>>>>> Before it was killed, it was able to correctly generate output for
>>>>>> each batch.
>>>>>>
>>>>>> Any help on this will be greatly appreciated.
>>>>>>
>>>>>> Bill
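[Editor's note] TD's question at the top of the thread refers to the storage level of the receiver input stream: with a memory-only level, Spark can evict received-but-unprocessed blocks under memory pressure, which matches the "Could not compute split, block ... not found" error Bill saw. A minimal sketch of creating an input stream with an explicitly disk-backed storage level, using the PySpark Streaming API (the host, port, and 1-minute batch interval below are illustrative, not taken from Bill's job):

```python
from pyspark import SparkContext
from pyspark.storagelevel import StorageLevel
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "BlockNotFoundDemo")
ssc = StreamingContext(sc, batchDuration=60)  # 1-minute batches, as in the thread

# Passing a storage level that spills to disk (and replicates) means blocks
# that don't fit in memory are written out instead of being dropped, so they
# are still there when their batch is finally processed.
lines = ssc.socketTextStream(
    "localhost", 9999,
    storageLevel=StorageLevel.MEMORY_AND_DISK_SER_2)
```

Note that spilling to disk only buys time, as Tobias points out: if processing is persistently slower than the batch interval, the backlog still grows without bound.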
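[Editor's note] The pile-up Tobias describes is simple arithmetic: batches are processed strictly one at a time, so if a batch of t seconds of data takes t' > t seconds to process, the backlog grows by t' - t per batch. A small illustration (the 75-second processing time is an assumed figure; only the 60-second batch interval comes from the thread):

```python
def backlog_after(n_batches, batch_interval=60.0, processing_time=75.0):
    """Seconds of unprocessed input waiting after n_batches, assuming
    batches arrive every batch_interval seconds but are processed
    sequentially, each taking processing_time seconds."""
    growth = max(0.0, processing_time - batch_interval)
    return n_batches * growth

# With hypothetical 75 s processing per 60 s batch, after the 84 batches
# of Bill's run there would be 84 * 15 s = 21 minutes of input queued,
# all of which has to live somewhere in memory or on disk.
print(backlog_after(84))  # -> 1260.0
```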
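[Editor's note] Tobias's suggestion for the I/O-bound case is to let tasks return immediately and receive results via a callback, so the batch isn't blocked waiting on the network. A self-contained sketch of that pattern using Python's standard `concurrent.futures` (the function names are illustrative; inside a real Spark task you would apply the same idea to your blocking calls):

```python
import time
from concurrent.futures import ThreadPoolExecutor

results = []

def slow_io_call(x):
    # Stands in for a blocking network request.
    time.sleep(0.01)
    return x * 2

def on_done(future):
    # Callback delivered when the "I/O" finishes; the submitting code
    # returned immediately and was free to move on to the next item.
    results.append(future.result())

with ThreadPoolExecutor(max_workers=4) as pool:
    for x in range(5):
        pool.submit(slow_io_call, x).add_done_callback(on_done)
# Leaving the with-block waits for all outstanding futures to complete.

print(sorted(results))  # -> [0, 2, 4, 6, 8]
```

Because the submitting loop never blocks, five 10 ms "requests" overlap instead of running back to back, which is exactly the effect Tobias is after when processing time is dominated by waiting rather than computation.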