Hi, yes, if < Flume NG. I have no reliable numbers here, but it looks much better. That was my impression too, so I was following your monolog :) I've been using E2E without flows and it worked nearly perfectly. With flows I used DFO.
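For comparison, the only thing that differs between the two modes in the configs quoted below is the chain name passed to exec config; a minimal sketch reusing your tailDir source and collector sink from this thread (everything else stays the same):

# agent side, acked end-to-end delivery (the mode that was giving you trouble)
exec config aa-agent-http-fe-1 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain

# agent side, disk-failover delivery (the mode that has been stable for you)
exec config aa-agent-http-fe-1 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoDFOChain

# collector side is identical in both cases
exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-access-")'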
- Alex

On Mon, Dec 12, 2011 at 12:31 PM, Ossi <los...@gmail.com> wrote:
> Hi,
>
> I'll continue this monolog of mine.
>
> It seems that autoDFOChain is more reliable than autoE2EChain, could this be correct?
> Over the past days we have added 1 node/agent with 4 flows using autoDFOChain.
> We also changed 1 node/agent to use autoDFOChain.
>
> Both of those have been working fine for 3-4 days.
>
> The last node/agent still using autoE2EChain has been having problems with 1 flow.
>
> Today I tried to reconfigure that agent with autoE2EChain without success.
> As soon as I switched it to use autoDFOChain, it started to work.
>
> Oddly, there are no errors of any kind visible in either the master or the collector.
>
> This is how I did the reconfiguration:
>
> exec unconfig ff-agent-http-error-fe-1
> exec unconfig ff-agent-http-error-fe-2
> exec unconfig ff-collector-http-error-fe
>
> exec unmap server3 ff-agent-http-error-fe-1
> exec unmap server3 ff-agent-http-error-fe-2
> exec unmap server6 ff-collector-http-error-fe
>
> exec decommission ff-agent-http-error-fe-1
> exec decommission ff-agent-http-error-fe-2
> exec decommission ff-collector-http-error-fe
>
> exec purge ff-agent-http-error-fe-1
> exec purge ff-agent-http-error-fe-2
> exec purge ff-collector-http-error-fe
>
> exec refreshAll
>
> exec map server3 ff-agent-http-error-fe-1
> exec map server3 ff-agent-http-error-fe-2
> exec map server6 ff-collector-http-error-fe
>
> exec config ff-agent-http-error-fe-1 ff-flow-http-error-fe 'tailDir("/logs/ff/httpd-fe-1/", "ff_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoDFOChain
> exec config ff-agent-http-error-fe-2 ff-flow-http-error-fe 'tailDir("/logs/ff/httpd-fe-2/", "ff_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoDFOChain
> exec config ff-collector-http-error-fe ff-flow-http-error-fe autoCollectorSource 'collectorSink("hdfs://namenode:8020/flume/ff/httpd-fe/%Y-%m-%d/", "%{host}-error-")'
>
> waitForNodesActive 0 ff-agent-http-error-fe-1 ff-agent-http-error-fe-2 ff-collector-http-error-fe
>
> exec refreshAll
>
>
> Regards,
> Ossi
>
>
>
> On Fri, Dec 2, 2011 at 3:22 PM, Ossi <los...@gmail.com> wrote:
>>
>> And one more thing: the collector's Jetty is unresponsive again.
>> It gives the front page with content:
>> Flume Administration
>>
>> Flume's Agent
>>
>> But it doesn't redirect, nor does it serve flumeagent.jsp.
>>
>> br,
>> Ossi
>>
>>
>> On Fri, Dec 2, 2011 at 3:12 PM, Ossi <los...@gmail.com> wrote:
>>>
>>> hi!
>>>
>>> Unfortunately this happened again:
>>>
>>> The collector stopped writing one flow to HDFS. Other flows seem to work fine from the same host.
>>>
>>> Here is the last entry of it at HDFS:
>>> -rw-r--r-- 3 flume supergroup 260 2011-12-01 16:27 /flume/aa/httpd-fe/2011-12-01/server2-ssl-access-20111201-172714555+0100.4251339811775625.00000225
>>>
>>> And logs from the collector:
>>> 2011-12-01 17:27:15,523 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35858...
>>> ....
>>> 2011-12-01 17:27:44,798 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs://hadoop:8020/flume/aa/httpd-fe/2011-12-01/server2-ssl-access-20111201-172714555+0100.4251339811775625.00000225
>>> 2011-12-01 17:27:44,798 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs://hq-priv-01:8020/flume/aa/httpd-fe/2011-12-01/server2-ssl-access-20111201-172714555+0100.4251339811775625.00000225.tmp
>>> 2011-12-01 17:27:44,798 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to hdfs
>>>
>>> From the agent there are related errors (at INFO level, why?):
>>> 2011-12-01 17:27:19,407 INFO com.cloudera.flume.handlers.debug.StubbornAppendSink: append failed on event 'server2 [INFO Thu Dec 01 17:27:13 CET 2011] { AckChecksum : (long)1085259347 (string) '^@^@^@^@@��S' (double)5.3618936E-315 } { AckTag : 20111201-172709320+0100.2013898777141269.00000115 } { AckType : msg } { tailSrcFile : ssl-aa_access_log-2011-12-01 } 1.2.3.4 [01/Dec/2011:17:27:13 +0100] www.foo.bar \"GET / HTTP/1.1\" 400 226 Age:- \"-\" \"-\"' with error: Append failed java.net.SocketException: Connection reset
>>> 2011-12-01 17:27:19,408 INFO com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port 35858 closed
>>> 2011-12-01 17:27:19,409 INFO com.cloudera.flume.handlers.debug.InsistentOpenDecorator: open attempt 0 failed, backoff (1000ms): Failed to open thrift event sink to server6:35858 : java.net.ConnectException: Connection refused
>>>
>>> So, to me it looks like the flow is (and still is) trying to use the thrift server on port 35858 at the collector server (server6), which was closed for some reason.
>>>
>>> Any ideas why this has happened?
>>> To me this looks like a bug, unless it is a known issue.
>>>
>>> br,
>>>
>>> Ossi
>>>
>>>
>>> On Wed, Nov 30, 2011 at 9:25 AM, Ossi <los...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm new on the list and I do hope that some of you can help me. :)
>>>>
>>>> We are testing Flume with a fully distributed configuration and isolated flows.
>>>> Setup:
>>>> 1 master server (server5)
>>>> 1 collector (server6)
>>>> 2 agents (server2 and server3)
>>>>
>>>> Both agent servers have 8 logical nodes collecting Apache httpd logs.
>>>> There are 2 Apache instances running and we want to collect
>>>> http and https, with access and error logs collected separately.
>>>>
>>>> Suddenly Flume ceased to write some files to HDFS from one of the servers, but not all of them.
>>>> First it stopped with aa_error_log... (it wrote that only for a few moments) and later, after running fine for several hours, it stopped writing aa_access_logs.
>>>>
>>>> There aren't any error messages in the master, collector or agent logs. And from the agent's point of view it seemed that it had been delivering those files all the time (not sure how to read those logs).
>>>> It seems like the collector just suddenly stopped delivering those files to HDFS.
>>>>
>>>> It seems that the collector was somehow in bad shape, since its Jetty didn't function too well either:
>>>> it opened http://localhost:35862/, but got stalled while trying to get the flumeagent.jsp file.
>>>>
>>>> After a restart of the collector (on the next day) it continued to write files to HDFS, but it missed all the files from the past 8 hours. Also, the web interface worked fine again.
>>>>
>>>> Unfortunately we don't have any logs available, since we lost them due to bug https://issues.cloudera.org/browse/FLUME-631.
>>>>
>>>> So, does anybody have any idea what could have caused this, or do we need to wait and see if it happens again?
>>>>
>>>>
>>>> Log collection was configured like this (for both aa and bb) using "flume shell -c server5 -s flume-aa.txt":
>>>>
>>>> cat flume-aa.txt
>>>> exec map server3 aa-agent-http-fe-1
>>>> exec map server3 aa-agent-http-fe-2
>>>> exec map server3 aa-agent-https-fe-1
>>>> exec map server3 aa-agent-https-fe-2
>>>> exec map server3 aa-agent-http-error-fe-1
>>>> exec map server3 aa-agent-http-error-fe-2
>>>> exec map server3 aa-agent-https-error-fe-1
>>>> exec map server3 aa-agent-https-error-fe-2
>>>>
>>>> exec map server6 aa-collector-http-fe
>>>> exec map server6 aa-collector-https-fe
>>>> exec map server6 aa-collector-http-error-fe
>>>> exec map server6 aa-collector-https-error-fe
>>>>
>>>>
>>>> # HTTP
>>>> exec config aa-agent-http-fe-1 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>> exec config aa-agent-http-fe-2 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-2/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>>
>>>> exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource 'collectorSink("hdfs://hfds-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-access-")'
>>>>
>>>> # HTTPS
>>>> exec config aa-agent-https-fe-1 aa-flow-https-fe 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>> exec config aa-agent-https-fe-2 aa-flow-https-fe 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>>
>>>> exec config aa-collector-https-fe aa-flow-https-fe autoCollectorSource 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-ssl-access-")'
>>>>
>>>> # HTTP ERROR
>>>> exec config aa-agent-http-error-fe-1 aa-flow-http-error-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>> exec config aa-agent-http-error-fe-2 aa-flow-http-error-fe 'tailDir("/logs/aa/httpd-fe-2/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>> exec config aa-collector-http-error-fe aa-flow-http-error-fe autoCollectorSource 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-error-")'
>>>>
>>>> # HTTPS ERROR
>>>> exec config aa-agent-https-error-fe-1 aa-flow-https-error-fe 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>> exec config aa-agent-https-error-fe-2 aa-flow-https-error-fe 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
>>>> exec config aa-collector-https-error-fe aa-flow-https-error-fe autoCollectorSource 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-ssl-error-")'
>>>>
>>>> waitForNodesActive 0 aa-agent-http-fe-1 aa-agent-http-fe-2 aa-agent-https-fe-1 aa-agent-https-fe-2 aa-agent-http-error-fe-1 aa-agent-http-error-fe-2 aa-agent-https-error-fe-1 aa-agent-https-error-fe-2 aa-collector-http-fe aa-collector-https-fe aa-collector-http-error-fe aa-collector-https-error-fe
>>>>
>>>> exec refreshAll
>>>>
>>>>
>>>
>>
>
--
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you really need to.
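As for verifying that a reconfiguration like the scripts quoted above actually took effect (since nothing shows up in the master or collector logs), the flume shell can report node state directly; a small sketch, assuming the getnodestatus and getconfigs commands of the 0.9.x shell:

flume shell -c server5 -s flume-aa.txt

# afterwards, connect interactively to the master and inspect the logical nodes
flume shell -c server5
getnodestatus    # should list every logical node as ACTIVE once the configs are picked up
getconfigs       # shows the source/sink each logical node is currently running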