Hi Will,
Yes, a) is what I would have anticipated over c). My rationale is that
real-time stream processing is different from traditional IT (where
fail-fast is key) in two key ways (maybe there are more). 1) Because it's
real time, the system needs to keep running and be resilient to all kinds
of unexpected errors. 2) We should assume that data produced by stream
sources is subject to variability and inconsistency. Both of these
attributes are unlike most traditional IT (I know there are exceptions).
I would assert that the streaming applications Quarks is targeted to
support will have a layer of downstream systems that are not constrained
to process in real time and are responsible for detecting problems, such
as gaps in the data caused by unexpected streaming problems. This leaves
the real-time (Quarks) portion of the app free to focus on data
transformation, aggregation and possibly some cleansing as well. Further,
the fact that the Quarks processing may be running on embedded remote
devices (at the edge) means it's difficult to get access to them to see
what's going on, which is another reason to make them behave resiliently.

Perhaps there is a middle ground between a) and c). I understand the
desire in a) to surface problems quickly, so here's an idea that might be
useful. It's not completely thought through, so please forgive the
imprecision. The idea is to introduce an "error" stream into the system
that the Quarks runtime (and, I suppose, the app) could write into at
appropriate times. In the case I ran into, suppose the batch API (which
is supposed to return a stream) also accepted an error stream as input.
The runtime could write into that stream if something goes wrong while
processing the window. The obvious benefit is that the app gets notified
very close to the time of the error, and that frees the runtime to be
resilient, behaving like c). The programming model of callbacks to the
app is consistent with the rest of Quarks.
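
Here's a rough sketch, in plain Java rather than the actual Quarks API
(all of the names below are made up for illustration), of the shape I
have in mind: the runtime invokes the app's batch function, and if that
function throws, the error is forwarded to an app-supplied error callback
instead of killing tuple flow.

import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class ErrorStreamSketch {

    /* Invoked by the (hypothetical) runtime for each completed window. */
    static <T, K, U> void processWindow(
            List<T> tuples, K key,
            BiFunction<List<T>, K, U> batchFn,   // the app's lambda
            Consumer<U> downstream,              // normal result stream
            Consumer<Throwable> errorStream) {   // proposed error stream
        try {
            downstream.accept(batchFn.apply(tuples, key));
        } catch (Exception e) {
            // Behave like c) for the data path (drop this window's work)
            // but notify the app close to the time of the error.
            errorStream.accept(e);
        }
    }

    public static void main(String[] args) {
        BiFunction<List<Double>, Object, Double> movingAvg =
                (d, k) -> d.stream().reduce((a, b) -> a + b).get() / d.size();

        // An empty window makes movingAvg throw NoSuchElementException;
        // the error surfaces on the error stream and the runtime keeps going.
        processWindow(Collections.<Double>emptyList(), "sensor1", movingAvg,
                avg -> System.out.println("avg = " + avg),
                err -> System.err.println("window failed: " + err));
    }
}

The point isn't the exact signature, just that the runtime stays alive
and the app still hears about the failure promptly.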

thanks


On Tue, Jul 26, 2016 at 4:17 PM, William Marshall <[email protected]>
wrote:

> Hi David,
>
> I think we're on the same page. By "then stop all batching for that
> window", I was taking the opposite, fail-fast perspective.
>
> At a high level, if an exception is thrown by the user's code, it means
> that something bad happened to the window that the user didn't catch and
> handle. This could have been intentional, or the user could have failed to
> anticipate something. Either way, when an exception is thrown, there are
> three things which could be done:
>
> A) *Clear the window's state (i.e., its tuples) and then continue
> processing subsequent tuples.* I think this is what you mean by "Throw away
> what you were working on and move on to the next thing."
>
> B) *Do nothing, and continue processing subsequent tuples.* While this
> would solve the problem you encountered of having no tuples in the window,
> I don't think doing nothing is the right approach.
>
> C) *Stop using the window entirely. Incoming tuples are simply dropped,
> and aggregation is halted.* This is a conservative approach, and is what I
> meant by "stop all batching for that window".
>
> The issue I have with A) and B) is that errors might go unnoticed -- they
> don't force the user to fix the root cause of the problem or catch the
> exception because processing will continue regardless. Since C) would stop
> all window processing, a user's application won't function properly until
> the issue is fixed.
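>
> In code, the difference between A) and C) is roughly the following,
> sketched in the same style as the PartitionImpl snippet quoted further
> down this thread (hypothetical, not the actual implementation):
>
> /* In PartitionImpl (sketch) */
> @Override
> public synchronized void process() {
>     try{
>         window.getPartitionProcessor().accept(unmodifiableTuples, key);
>     }
>     catch(Exception e){
>         // A) clear this window's tuples and keep accepting/batching
>         //    subsequent tuples, OR
>         // C) cancel the batch scheduling for this window so no further
>         //    aggregation happens.
>     }
> }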
>
> To make sure I've understood you correctly, A) is the behavior that you
> would have expected?
>
> -Will
>
> On Fri, Jul 22, 2016 at 2:04 PM, David Booz <[email protected]> wrote:
>
> > Hi William,
> > Thanks for the reply. I think we're mostly on the same page.
> >
> > First, let me correct what I hope is a minor point. When I hit this
> > problem, my lambda function (which I would like to generically refer
> > to as a callback function) did not catch any exceptions at all. All
> > exceptions would flow back to the Edgent runtime. I only added catch
> > clauses to prove to myself that an exception was the reason for the
> > stoppage. So your example is not exactly what I was doing, but I think
> > it's close enough for the important part of the discussion.
> >
> > The lambda function I was using was from one of the samples, the moving
> > average function:
> >
> > (d,k) -> d.stream().reduce((a,b) -> a+b).get() / d.size()
> >
> > Periodically the .get() API would throw a NoSuchElementException, and I
> > think that is because the stream had no tuples in it. I suspect the right
> > thing for the application to do is check for the existence of any tuples
> > before blindly trying to run a reduce on them. However, that's beside the
> > point here. Exceptions are always possible so it's important in any
> > programming model to understand who is responsible for handling them and
> > which ones. Thus my question.
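> >
> > For example (just a sketch, untested, and assuming the tuples are
> > Doubles as in the sample), the app could guard against the empty
> > window itself:
> >
> > (d,k) -> d.isEmpty()
> >     ? 0.0  // or whatever an empty window should mean to the app
> >     : d.stream().reduce((a,b) -> a+b).get() / d.size()
> >
> > or avoid the Optional.get() entirely:
> >
> > (d,k) -> d.stream().mapToDouble(Double::doubleValue).average().orElse(0.0)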
> >
> > My opinion is that Edgent should be catching all exceptions from app
> > callbacks and continuing with the next "thing": throw away what you
> > were working on and move on. I'm not sure what it means to "then stop
> > all batching for that window". When more tuples arrive, will the batch
> > run again on the new tuples...and possibly result in an exception
> > again? That (running the batch again on new tuples) is the behavior I
> > would have expected.
> >
> > If the programming model NEVER expects the callbacks to throw exceptions
> > (which I hope is the case), then the Edgent runtime can eat them and keep
> > going. But if there are places in the programming model where these
> > callback functions are supposed to throw exceptions, then things will get
> > more tricky in the runtime.
> >
> > Assuming we agree on what "then stop all batching for that window" means,
> > is it a big deal to fix? Is anyone already working on it?
> >
> > thanks
> >
> > On Fri, Jul 22, 2016 at 3:42 PM, William Marshall <[email protected]>
> > wrote:
> >
> > > Hi David,
> > >
> > > Thank you for joining the mailing list!
> > >
> > > >if the lambda function that processes a window into a new stream
> > > encounters an exception but DOES NOT handle it, what is supposed to
> > > happen?
> > > By not handling it, I assume you mean something like the following
> > > where the exception is rethrown:
> > >
> > > /* In the user's lambda */
> > > try{
> > >     // Do some operation
> > > }
> > > catch(IllegalStateException e){
> > >     throw e;
> > > }
> > >
> > > In this case, what *does* happen, currently, is that the exception
> > > will percolate up to the Edgent/Quarks Thread Scheduler and be caught
> > > there. I believe this kills all runtime threads, terminating the
> > > application. This is why you observed all tuple flow stop after you
> > > removed the exception catching from your lambda code.
> > >
> > > What *should* happen is that the windowing library catches the
> > > exception and then stops all batching for that window. This is more
> > > graceful than terminating all threads. The windowing library might
> > > look something like the following:
> > >
> > > /* In PartitionImpl */
> > > @Override
> > > public synchronized void process() {
> > >     try{
> > >         window.getPartitionProcessor().accept(unmodifiableTuples, key);
> > >     }
> > >     catch(Exception e){
> > >        // Clear the ScheduledExecutorService which handles the batch
> > >        // scheduling. No more batching for this window.
> > >     }
> > > }
> > >
> > > >I rewrote my lambda function to catch exceptions, and once in a
> > > while the catch clause gets control.
> > > Right, so if your lambda code catches all exceptions and doesn't
> > > rethrow them, the batch scheduler doesn't know that anything is wrong
> > > and will continue to schedule batches. This is why the catch clause
> > > gets control once in a while, and you continue to see tuples
> > > downstream from the window.
> > >
> > > >It is very inconvenient (from a programming model perspective) for
> > > the lambda functions to have to do exception handling in simple cases
> > > Would you mind providing a brief code/pseudocode example of such a
> > > simple case?
> > >
> > > I hope this helps to answer your question.
> > >
> > > -Will
> > >
> >
> >
> >
> > --
> > Dave Booz
> > [email protected]
> >
>



-- 
Dave Booz
[email protected]
