How much memory is allocated to the Falcon server? It could be that the Falcon server is running out of memory. Can you check whether the Falcon server is doing full GCs? If so, can you try removing org.apache.falcon.metadata.MetadataMappingService from startup.properties, then restart the Falcon server and try again?
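One possible way to check for full GCs is to sample the Falcon server JVM with the JDK's `jstat -gcutil <pid> <interval> <count>`. This is an assumption and not something proposed in the thread. If full GCs are the problem, the FGC column (cumulative full GC count) keeps growing between samples while O (old generation occupancy, in percent) stays near 100. The helper functions below (`parse_gcutil`, `full_gcs_between_samples`, `old_gen_percent`) are hypothetical and only parse that text output. They look columns up by header name because newer JDKs print extra columns.

```python
from typing import Dict, List


def parse_gcutil(text: str) -> List[Dict[str, str]]:
    """Parse `jstat -gcutil` output into one dict per sample row.

    Columns are looked up by header name, so extra columns printed by
    newer JDKs (e.g. CGC/CGCT) do not shift the values we read.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    # Skip repeated header lines (jstat reprints them when run with -h<n>).
    return [dict(zip(header, row)) for row in rows[1:] if row != header]


def full_gcs_between_samples(text: str) -> int:
    """Number of full GCs between the first and last sample (FGC delta)."""
    samples = parse_gcutil(text)
    return int(samples[-1]["FGC"]) - int(samples[0]["FGC"])


def old_gen_percent(text: str) -> float:
    """Old-generation occupancy (column O, in percent) in the last sample."""
    return float(parse_gcutil(text)[-1]["O"])
```

For example, you could capture `jstat -gcutil <falcon-pid> 5000 12 > gc.txt` (12 samples, 5000 ms apart) and pass the file contents to these functions. A positive `full_gcs_between_samples` result combined with an `old_gen_percent` close to 100 would point to the memory pressure suspected above. Use `jps` or `ps` on the Falcon host to find the PID.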
On Mon, Feb 22, 2016 at 1:09 PM, Margus Roo wrote:
> Found these rows in the log:
>
> 2016-02-22 08:54:48,273 INFO - [126455586@qtp-525968792-61 - 763a5818-27e2-4ada-8d45-50b06afffa8e:margusja:GET//entities/list/feed,process] ~ {Action:list, Dimensions:{}, Status: SUCCEEDED, Time-taken:579935193 ns} (METRIC:38)
> 2016-02-22 08:54:48,274 DEBUG - [126455586@qtp-525968792-61 - 763a5818-27e2-4ada-8d45-50b06afffa8e:] ~ Audit: margusja/10.65.104.39 performed request http://hadoopnn2.estpak.ee:15000/api/entities/list/feed,process?fields=clusters,tags,status&offset=0&numResults=10 (88.196.164.43) at time 2016-02-22T06:54Z (FalconAuditFilter:86)
> 2016-02-22 08:55:10,388 INFO - [ActiveMQ ShutdownHook:] ~ ActiveMQ Message Broker (localhost, ID:hadoopnn2.estpak.ee-48159-1455867360485-0:1) is shutting down (BrokerService:560)
> 2016-02-22 08:55:10,389 INFO - [ActiveMQ ShutdownHook:] ~ Connector vm://localhost Stopped (TransportConnector:288)
> 2016-02-22 08:55:10,652 INFO - [ActiveMQ Connection Executor: tcp://hadoopnn2.estpak.ee/88.196.164.43:61616:] ~ Error in onException for topicSubscriber of topic: FALCON.ENTITY.TOPIC (JMSMessageConsumer:144)
> javax.jms.JMSException: java.io.EOFException
>     at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:49)
>     at org.apache.activemq.ActiveMQConnection.onAsyncException(ActiveMQConnection.java:1833)
>     at org.apache.activemq.ActiveMQConnection.onException(ActiveMQConnection.java:1850)
>     at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
>     at org.apache.activemq.transport.ResponseCorrelator.onException(ResponseCorrelator.java:126)
>     at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
>     at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
>     at org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:160)
>     at org.apache.activemq.transport.InactivityMonitor.onException(InactivityMonitor.java:266)
>     at org.apache.activemq.transport.TransportSupport.onException(TransportSupport.java:96)
>     at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:206)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>     at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
>     at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
>     at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
>     ... 1 more
>
> And before that there are lots of Kerberos-related problems:
> 2016-02-22 08:54:48,272 WARN - [126455586@qtp-525968792-61 - 763a5818-27e2-4ada-8d45-50b06afffa8e:margusja:GET//entities/list/feed,process] ~ Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo over hadoopnn2.estpak.ee/88.196.164.43:8020. Not retrying because failovers (15) exceeded maximum allowed (15) (RetryInvocationHandler:121)
> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "hadoopnn2.estpak.ee/88.196.164.43"; destination host is: "hadoopnn2.estpak.ee":8020;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>
> But those Kerberos problems go away after a Falcon restart.
>
> Anyway, as I understand it, this is not the right list. Can you provide the subscription e-mail address for the user@ list?
>
> Margus (margusja) Roo
> http://margus.roo.ee
> skype: margusja
> +372 51 48 780
>
> On 22/02/16 09:31, Pallavi Rao wrote:
>
>> It might not have to do with a particular process. Processes might go into UNKNOWN status when Falcon is unable to communicate with Oozie, for example. What will help in this case is falcon.application.log (the Falcon server log).
>>
>> Regards,
>> Pallavi
>>
>> On Mon, Feb 22, 2016 at 12:49 PM, Margus Roo wrote:
>>
>>> It is difficult, because I already have more than ten processes running and I do not know the exact moment when they go into UNKNOWN status. I just hoped that it had happened before and someone on this list had ideas.
>>> So you think it is related to the processes?
>>> Then I can start only one process and see whether it goes to UNKNOWN.
>>>
>>> I tried to subscribe to the user@ list, but without success. On the Falcon site I cannot find the subscription e-mail address for the user list. If you can provide it, I can ask for help on the user list.
>>>
>>> On 22/02/16 09:14, Sandeep Samudrala wrote:
>>>
>>>> Hi Margus,
>>>> Please send such queries to the users mailing list. Can you attach your process definition and also check application.log? Please attach any stack traces you find.
>>>>
>>>> Thanks,
>>>> -Sandeep
>>>> On Feb 22, 2016 12:28 PM, "Margus Roo" wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am using Falcon 0.6.1.2.3, packaged by Hortonworks in HDP-2.3.
>>>>>
>>>>> I noticed that after some days all my running processes go into UNKNOWN status. After restarting Falcon they are back in RUNNING status, and after some days it happens again.
