Thanks, Isha, for the details.
The application runs fine with the latest version of Apex.

On Fri, Mar 4, 2016 at 11:19 AM, Isha Arkatkar <[email protected]> wrote:

> Hi Chaitanya,
>
>     The bug you mentioned is actually fixed in the latest version. The fix
> for Jira APEXCORE-130
> <https://issues.apache.org/jira/browse/APEXCORE-130> handles
> this issue as well.
>     Please try once with the latest changes from master.
>
> This is the commit ID with the fix: 139a9cac6397948bb63a53ea80188f2ffd6e5da2
>
> Thanks!
> Isha
>
>
> On Thu, Mar 3, 2016 at 5:26 AM, Chaitanya Chebolu <[email protected]> wrote:
>
> > Thanks Isha for analyzing the issue.
> >
> > I am adding your analysis to the JIRA.
> >
> > I observed one more issue in THREAD_LOCAL.
> >
> > Let the DAG be as follows:
> >     A -> B -> C
> >
> > where A, B, and C are operators, and B and C are THREAD_LOCAL.
> >
> >
> > If the downstream operator (i.e., operator C) throws an exception from the
> > main thread, the application master catches the exception and kills the
> > container. A new container is allocated for operators B and C. B is
> > redeployed into the newly allocated container and its status is ACTIVE,
> > but C is not redeployed.
> >
> > After operator B is redeployed, the DAG is as follows:
> >      A -> B
> >
> > I looked into the Stram logs and observed the following message:
> > "INFO com.datatorrent.stram.StreamingContainerManager: Affected operators
> > [PTOperator[id=2,name=B]]".
> >
> > I think this is the issue: operator C is missing from the affected
> > operators.
> >
> > I created an application for this issue. The sample application is here
> > <https://github.com/chaithu14/AppThreadLocal/tree/theadBranch>.
> >
> > @Isha: Have you observed the same behavior?
> >
> > I am creating a JIRA for this issue.
> >
> > Regards,
> > Chaitanya
> >
> > On Wed, Mar 2, 2016 at 9:34 AM, Sandeep Deshmukh <[email protected]> wrote:
> >
> > > Great finding, Isha.
> > >
> > > In general, it is advisable to do things in the main thread. We had some
> > > timing issues in dtIngest because we were emitting tuples in the
> > > Reconciler thread. Once we moved all emit statements to the main thread,
> > > no issues were observed.
> > >
> > > Issue: When tuples are emitted in the Reconciler thread, some of them are
> > > emitted after endWindow but before checkpointing is done. These tuples
> > > are not guaranteed to reach the downstream operator in the same window.
> > > Thus the checkpoints of the two operators are out of sync, which can
> > > result in a few tuples being replayed wrongly from the Reconciler-based
> > > operator.
> > >
> > > Regards,
> > > Sandeep
> > >
> > > On Wed, Mar 2, 2016 at 8:57 AM, Isha Arkatkar <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > >   I checked the application https://github.com/chaithu14/AppThreadLocal
> > > >
> > > >   In this example, the exception from the downstream operator is thrown
> > > > in a different thread in the AbstractReconciler operator, and the
> > > > rethrow to the main operator thread is done in handleIdleTime. This
> > > > function is not guaranteed to be invoked in every window. With
> > > > THREAD_LOCAL locality, I checked that handleIdleTime did not get
> > > > invoked, so the exception did not get rethrown.
> > > >
> > > >   Exceptions thrown from a thread other than the main operator thread
> > > > are not caught by the Application Master. We can probably add a note to
> > > > the troubleshooting guide recommending a rethrow in the main thread.
> > > >
> > > >   I verified that if the downstream operator throws an exception in the
> > > > main thread, it is caught appropriately by the application master even
> > > > in the thread-local case.
> > > >
> > > > Thanks,
> > > > Isha
> > > >
> > > > On Thu, Feb 25, 2016 at 9:57 PM, Chaitanya Chebolu <[email protected]> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > >   I created a sample application for the THREAD_LOCAL issue. The
> > > > > application is here <https://github.com/chaithu14/AppThreadLocal>.
> > > > >   The application has the following DAG:
> > > > >
> > > > >                 RandomEventGenerator -> OutputOperator.
> > > > >
> > > > > Both the operators are THREAD_LOCAL.
> > > > >
> > > > >   The OutputOperator throws an exception at every committed window, so
> > > > > the AppMaster is supposed to kill the container at every committed
> > > > > window. This is the expected behavior.
> > > > >   But this is not happening with the current Apex.
> > > > >
> > > > >   One more observation: if the upstream operator throws an exception
> > > > > at every committed window, the AppMaster kills the container
> > > > > continuously. But this does not happen with the downstream operator.
> > > > >
> > > > >   I created a JIRA for this issue: APEXCORE-357
> > > > >
> > > > > Regards,
> > > > > Chaitanya
> > > > >
> > > > > On Thu, Feb 25, 2016 at 12:36 PM, Chaitanya Chebolu <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > >   I am facing issues with THREAD_LOCAL. I have two operators that are
> > > > > > thread local, and the downstream one throws exceptions, but the
> > > > > > AppMaster is not catching those exceptions. I was unable to figure
> > > > > > out why the application is not working.
> > > > > >   If the two operators are deployed in different containers, the
> > > > > > container is killed continuously by the AppMaster. This is the
> > > > > > expected behavior.
> > > > > >
> > > > > >    For example, let's say the DAG is op1 -> op2, where op1 and op2
> > > > > > are two operators that are thread local. When the downstream
> > > > > > operator op2 throws an exception, the AppMaster does not catch it. I
> > > > > > will create a JIRA for this issue. Could someone please help with
> > > > > > this?
> > > > > >
> > > > > > Regards,
> > > > > > Chaitanya
> > > > > >
> > > > >
> > > >
> > >
> >
>
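
The advice in the thread above (rethrow worker-thread exceptions in the main
operator thread, at a point that runs in every window) can be sketched roughly
as follows. This is a plain-Java illustration only, not Apex code: the class
WorkerFailureRelay and its methods startWorker and rethrowIfFailed are
hypothetical names, and the assumption is that an operator would call
rethrowIfFailed from a callback such as endWindow rather than relying on
handleIdleTime alone.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Apex: records a failure that happens on a
// helper thread so that the main thread can rethrow it later.
public class WorkerFailureRelay {
    // First failure seen on the worker thread; null while the worker is healthy.
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    // Runs the task on a helper thread and records any exception it throws
    // instead of letting it die silently with that thread.
    public Thread startWorker(Runnable task) {
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        }, "worker");
        worker.start();
        return worker;
    }

    // Intended to be called on the main thread at a point that is reached in
    // every window (assumption: something like endWindow in an operator).
    public void rethrowIfFailed() {
        Throwable t = failure.get();
        if (t != null) {
            throw new RuntimeException("worker thread failed", t);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WorkerFailureRelay relay = new WorkerFailureRelay();
        Thread worker = relay.startWorker(() -> {
            throw new IllegalStateException("simulated failure");
        });
        worker.join();
        try {
            relay.rethrowIfFailed();
        } catch (RuntimeException e) {
            System.out.println("rethrown in main thread: " + e.getCause().getMessage());
        }
    }
}
```

Only the first failure is kept, which is usually the one worth reporting; a
real operator might also want to stop the worker once a failure is recorded.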

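Sandeep's point in the thread (emit tuples only from the main thread, never
from the Reconciler thread) could look something like the sketch below. It is
plain Java under stated assumptions, not the dtIngest or Apex implementation:
MainThreadEmitBuffer, offerFromWorker and drainOnMainThread are hypothetical
names, and the Consumer stands in for whatever emit call the operator's output
port provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical helper, not part of Apex: helper threads only enqueue results,
// and the main thread is the only one that ever emits them.
public class MainThreadEmitBuffer<T> {
    // Thread-safe queue shared between the helper thread and the main thread.
    private final Queue<T> pending = new ConcurrentLinkedQueue<>();

    // Safe to call from a helper thread: buffers the tuple, never emits it.
    public void offerFromWorker(T tuple) {
        pending.add(tuple);
    }

    // Intended to be called on the main thread inside a window (assumption:
    // before the operator's endWindow work finishes). Emits everything that
    // is currently buffered, in arrival order, and returns the tuple count.
    public int drainOnMainThread(Consumer<T> emit) {
        int count = 0;
        T tuple;
        while ((tuple = pending.poll()) != null) {
            emit.accept(tuple);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        MainThreadEmitBuffer<String> buffer = new MainThreadEmitBuffer<>();
        Thread worker = new Thread(() -> {
            buffer.offerFromWorker("a");
            buffer.offerFromWorker("b");
        });
        worker.start();
        worker.join();
        List<String> emitted = new ArrayList<>();
        int count = buffer.drainOnMainThread(emitted::add);
        System.out.println(count + " tuples emitted on the main thread: " + emitted);
    }
}
```

Because every emit happens on the main thread, no tuple can slip out between
endWindow and the checkpoint, which is the timing problem described above.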