Thanks Isha for the details. The application is running fine with the latest version of Apex.

On Fri, Mar 4, 2016 at 11:19 AM, Isha Arkatkar <[email protected]> wrote:

> Hi Chaitanya,
>
> The bug you mentioned is actually fixed in the latest version. The fix
> for Jira APEXCORE-130
> <https://issues.apache.org/jira/browse/APEXCORE-130> handles this issue
> as well.
> Please try once with the latest changes from master.
>
> This is the commit id with fix: 139a9cac6397948bb63a53ea80188f2ffd6e5da2
>
> Thanks!
> Isha
>
> On Thu, Mar 3, 2016 at 5:26 AM, Chaitanya Chebolu <[email protected]>
> wrote:
>
> > Thanks Isha for analyzing the issue.
> >
> > I am adding your analysis to the JIRA.
> >
> > I observed one more issue in THREAD_LOCAL.
> >
> > Let the DAG be as follows:
> > A -> B -> C
> >
> > Where A, B, C are operators, and B and C are THREAD_LOCAL.
> >
> > If the downstream operator (i.e. Operator C) throws an exception from
> > the main thread, then the application master catches the exception and
> > kills the container. A new container is allocated for operators B and
> > C. B is re-deployed into the newly allocated container and its status
> > is ACTIVE, but C is not re-deployed.
> >
> > After re-deployment of Operator B, the DAG is as follows:
> > A -> B.
> >
> > I looked into the Stram logs and observed the following message:
> > "INFO com.datatorrent.stram.StreamingContainerManager: Affected operators
> > [PTOperator[id=2,name=B]]".
> >
> > I think this is the issue. Here, Operator C is not in the affected
> > operators.
> >
> > I created an application for this issue. Sample Application is here
> > <https://github.com/chaithu14/AppThreadLocal/tree/theadBranch>.
> >
> > @Isha: Have you observed the same behavior?
> >
> > I am creating a JIRA for this issue.
> >
> > Regards,
> > Chaitanya
> >
> > On Wed, Mar 2, 2016 at 9:34 AM, Sandeep Deshmukh <[email protected]>
> > wrote:
> >
> > > Great finding Isha.
> > >
> > > In general, it is always advisable to do things in the main thread.
> > > We had some timing issues in dtIngest as we were emitting tuples in
> > > the Reconciler thread. Once we moved all emit statements to the main
> > > thread, there were no issues observed.
> > >
> > > Issue: When tuples are emitted in the Reconciler thread, some of them
> > > were emitted post endWindow but before the checkpointing is done.
> > > These tuples for the downstream operator are not guaranteed to reach
> > > the same window. Thus checkpointing of the two operators is not in
> > > sync, and that could result in a few tuples being replayed wrongly
> > > from the Reconciler based operator.
> > >
> > > Regards,
> > > Sandeep
> > >
> > > On Wed, Mar 2, 2016 at 8:57 AM, Isha Arkatkar <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I checked the application
> > > > https://github.com/chaithu14/AppThreadLocal
> > > >
> > > > In this example, the exception from the downstream operator is
> > > > thrown in a different thread in the AbstractReconciler operator,
> > > > and the rethrow to the main operator thread is done in
> > > > handleIdleTime. This function is not guaranteed to be invoked in
> > > > every window. With Thread_local locality I checked that
> > > > handleIdleTime did not get invoked. So, the exception did not get
> > > > rethrown.
> > > >
> > > > Exceptions thrown from a thread other than the main operator
> > > > thread are not caught by the Application Master. Something we can
> > > > probably add to the troubleshooting guide: add a rethrow in the
> > > > main thread.
> > > >
> > > > I verified that if the downstream operator throws an exception in
> > > > the main thread, it is caught appropriately by the application
> > > > master even in the thread local case.
> > > >
> > > > Thanks,
> > > > Isha
> > > >
> > > > On Thu, Feb 25, 2016 at 9:57 PM, Chaitanya Chebolu <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Created a sample application for the THREAD_LOCAL issue.
> > > > > Application is here
> > > > > <https://github.com/chaithu14/AppThreadLocal>.
> > > > > Application has the following DAG:
> > > > >
> > > > > RandomEventGenerator -> OuputOperator.
> > > > >
> > > > > Both the operators are THREAD_LOCAL.
> > > > >
> > > > > In OutputOperator, I am throwing exceptions at every committed
> > > > > window. So, the AppMaster is supposed to kill the container at
> > > > > every committed window. This is expected behavior.
> > > > > But, this is not happening with the current Apex.
> > > > >
> > > > > One more observation: if the upstream operator throws an
> > > > > exception at every committed window, then the AppMaster kills
> > > > > the container continuously. But, this is not happening with the
> > > > > downstream operator.
> > > > >
> > > > > Created JIRA for this issue: APEXCORE-357
> > > > >
> > > > > Regards,
> > > > > Chaitanya
> > > > >
> > > > > On Thu, Feb 25, 2016 at 12:36 PM, Chaitanya Chebolu <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am facing issues in Thread_Local. There are two operators
> > > > > > which are thread local, and the downstream one throws
> > > > > > exceptions. But, the AppMaster is not catching those
> > > > > > exceptions. I was unable to figure out why the application is
> > > > > > not working.
> > > > > > If both the operators are deployed on different containers,
> > > > > > then the container is killed continuously by the AppMaster.
> > > > > > This is expected behavior.
> > > > > >
> > > > > > For example, let's say the DAG is op1 -> op2, where op1 and
> > > > > > op2 are two operators that are thread local. When the
> > > > > > downstream operator op2 throws an exception, the AppMaster
> > > > > > does not catch it. I will create a JIRA for this issue. Could
> > > > > > someone please help with this?
> > > > > >
> > > > > > Regards,
> > > > > > Chaitanya
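
The advice in this thread (keep emits in the main operator thread, and rethrow helper-thread failures there) can be sketched roughly as below. This is only an illustration in plain Java, not Apex code: the class ReconcilerSketch, its methods processOnHelperThread and endWindow, and the emitted list standing in for an output port are all hypothetical names. It assumes the platform calls an end-of-window callback on the main operator thread in every window, unlike handleIdleTime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an operator that does part of its work on a helper thread. */
class ReconcilerSketch {
    // Results produced by the helper thread, waiting to be emitted by the main thread.
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    // First failure seen on the helper thread, if any.
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    // Stands in for an output port (assumption); only the main thread touches it.
    final List<String> emitted = new ArrayList<>();

    /** Meant to run on the helper thread: it never emits and never loses an exception. */
    void processOnHelperThread(String tuple) {
        try {
            if (tuple.isEmpty()) {
                throw new IllegalStateException("bad tuple");
            }
            pending.add(tuple.toUpperCase());
        } catch (Throwable t) {
            // Remember only the first failure; the main thread will rethrow it.
            failure.compareAndSet(null, t);
        }
    }

    /** Meant to be called by the main operator thread once per window. */
    void endWindow() {
        Throwable t = failure.getAndSet(null);
        if (t != null) {
            // Rethrow on the main thread so the failure is visible to the caller.
            throw new RuntimeException("helper thread failed", t);
        }
        // Emit inside the window, from the main thread only.
        String tuple;
        while ((tuple = pending.poll()) != null) {
            emitted.add(tuple);
        }
    }
}
```

Draining the queue in the per-window callback keeps every emit inside a window boundary, which is the timing problem Sandeep describes; checking the stored failure in the same callback avoids relying on a hook that may not run in every window.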
