Hi,

yes, if you're on anything older than Flume NG. For Flume NG I have no
reliable results yet, but it looks much better.
That was my impression too, which is why I was following your monolog :)
I've been using E2E without flows and it worked nearly perfectly. With
flows I used DFO.
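
For reference, the chain choice is just the last argument to "exec config",
so switching a node is a one-line change. A minimal sketch, with placeholder
node, flow and directory names:

exec config my-agent my-flow 'tailDir("/logs/my-app/", "access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoDFOChain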

- Alex


On Mon, Dec 12, 2011 at 12:31 PM, Ossi <los...@gmail.com> wrote:
> Hi,
>
> I'll continue this monolog of mine.
>
> It seems that autoDFOChain is more reliable than autoE2EChain; could this
> be correct?
> Over the past few days we have added 1 node/agent with 4 flows using
> autoDFOChain.
> We also changed 1 node/agent to use autoDFOChain.
>
> Both of those have been working fine for 3-4 days.
>
> The last node/agent still using autoE2EChain has been having problems with 1
> flow.
>
> Today I tried to reconfigure that agent with autoE2EChain without success.
> As soon as I switched it to use autoDFOChain, it started to work.
>
> Oddly, there are no errors of any kind visible in either the master or the
> collector.
>
> This is how I did the reconfiguration:
>
> exec unconfig ff-agent-http-error-fe-1
> exec unconfig ff-agent-http-error-fe-2
> exec unconfig ff-collector-http-error-fe
>
> exec unmap server3 ff-agent-http-error-fe-1
> exec unmap server3 ff-agent-http-error-fe-2
> exec unmap server6 ff-collector-http-error-fe
>
> exec decommission ff-agent-http-error-fe-1
> exec decommission ff-agent-http-error-fe-2
> exec decommission ff-collector-http-error-fe
>
> exec purge ff-agent-http-error-fe-1
> exec purge ff-agent-http-error-fe-2
> exec purge ff-collector-http-error-fe
>
> exec refreshAll
>
> exec map server3 ff-agent-http-error-fe-1
> exec map server3 ff-agent-http-error-fe-2
> exec map server6 ff-collector-http-error-fe
>
> exec config ff-agent-http-error-fe-1 ff-flow-http-error-fe
> 'tailDir("/logs/ff/httpd-fe-1/", "ff_error_log-\\d{4}-\\d{2}-\\d{2}$",
> true)' autoDFOChain
> exec config ff-agent-http-error-fe-2 ff-flow-http-error-fe
> 'tailDir("/logs/ff/httpd-fe-2/", "ff_error_log-\\d{4}-\\d{2}-\\d{2}$",
> true)' autoDFOChain
> exec config ff-collector-http-error-fe ff-flow-http-error-fe
> autoCollectorSource
> 'collectorSink("hdfs://namenode:8020/flume/ff/httpd-fe/%Y-%m-%d/",
> "%{host}-error-")'
>
> waitForNodesActive 0 ff-agent-http-error-fe-1 ff-agent-http-error-fe-2
> ff-collector-http-error-fe
>
> exec refreshAll
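>
> (Side note: I feed these scripts to the master with the flume shell, the
> same way as the setup in my first mail, i.e. "flume shell -c server5 -s
> <script-file>"; the script file name is just a placeholder here.)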
>
>
> Regards,
> Ossi
>
>
>
> On Fri, Dec 2, 2011 at 3:22 PM, Ossi <los...@gmail.com> wrote:
>>
>> And one more thing: the collector's Jetty is unresponsive again.
>> It serves the front page with the content:
>> Flume Administration
>>
>>     Flume's Agent
>>
>> But it doesn't redirect nor serve flumeagent.jsp.
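>>
>> (A quick check, assuming curl is available and the collector web port
>> 35862 mentioned in my first mail:
>>
>> curl -v http://server6:35862/flumeagent.jsp
>>
>> That shows whether the JSP itself hangs or only the redirect does.)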
>>
>> br,
>> Ossi
>>
>>
>> On Fri, Dec 2, 2011 at 3:12 PM, Ossi <los...@gmail.com> wrote:
>>>
>>> hi!
>>>
>>> Unfortunately this happened again:
>>>
>>> The collector stopped writing one flow to HDFS. Other flows seem to work
>>> fine from the same host.
>>>
>>> Here is the last entry of it in HDFS:
>>> -rw-r--r--   3 flume supergroup        260 2011-12-01 16:27
>>> /flume/aa/httpd-fe/2011-12-01/server2-ssl-access-20111201-172714555+0100.4251339811775625.00000225
>>>
>>> And the logs from the collector:
>>> 2011-12-01 17:27:15,523 INFO
>>> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
>>> 35858...
>>> ....
>>> 2011-12-01 17:27:44,798 INFO
>>> com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing
>>> hdfs://hadoop:8020/flume/aa/httpd-fe/2011-12-01/server2-ssl-access-20111201-172714555+0100.4251339811775625.00000225
>>> 2011-12-01 17:27:44,798 INFO
>>> com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file:
>>> hdfs://hq-priv-01:8020/flume/aa/httpd-fe/2011-12-01/server2-ssl-access-20111201-172714555+0100.4251339811775625.00000225.tmp
>>> 2011-12-01 17:27:44,798 INFO
>>> com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to
>>> hdfs
>>>
>>> From the agent there are related errors (at INFO level, why?):
>>> 2011-12-01 17:27:19,407 INFO
>>> com.cloudera.flume.handlers.debug.StubbornAppendSink: append failed on event
>>> 'server2 [INFO Thu Dec 01 17:27:13 CET 2011] { AckChecksum :
>>> (long)1085259347  (string) '^@^@^@^@@��S' (double)5.3618936E-315 } {
>>> AckTag : 20111201-172709320+0100.2013898777141269.00000115 } { AckType : msg
>>> } { tailSrcFile : ssl-aa_access_log-2011-12-01 } 1.2.3.4
>>> [01/Dec/2011:17:27:13 +0100] www.foo.bar \"GET / HTTP/1.1\" 400 226 Age:-
>>> \"-\" \"-\"' with error: Append failed java.net.SocketException: Connection
>>> reset
>>> 2011-12-01 17:27:19,408 INFO
>>> com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port
>>> 35858 closed
>>> 2011-12-01 17:27:19,409 INFO
>>> com.cloudera.flume.handlers.debug.InsistentOpenDecorator: open attempt 0
>>> failed, backoff (1000ms): Failed to open thrift event sink to server6:35858
>>> : java.net.ConnectException: Connection refused
>>>
>>> So, to me it looks like the flow is (and still is) trying to use the
>>> Thrift server on port 35858 at the collector server (server6), which was
>>> closed for some reason.
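>>>
>>> (A minimal check, assuming standard tools on the collector host: running
>>> "netstat -ltn | grep 35858" on server6 should show whether anything is
>>> still listening on that Thrift port, and the node status on the master's
>>> web UI should show the collector's state.)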
>>>
>>> Any ideas why this happened?
>>> To me this looks like a bug, unless it is a known issue.
>>>
>>> br,
>>>
>>> Ossi
>>>
>>>
>>> On Wed, Nov 30, 2011 at 9:25 AM, Ossi <los...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm new on the list and I do hope that some of you can help me. :)
>>>>
>>>> We are testing Flume with a fully distributed configuration and isolated
>>>> flows.
>>>> Setup:
>>>> 1 master server (server5)
>>>> 1 collector (server6)
>>>> 2 agents (server2 and server3)
>>>>
>>>> Both agent servers have 8 logical nodes collecting Apache httpd logs.
>>>> There are 2 Apache instances running, and we want to collect
>>>> HTTP and HTTPS, both access and error logs, separately.
>>>>
>>>> Suddenly Flume ceased writing some files to HDFS from the other server,
>>>> but not all of them.
>>>> First it stopped with aa_error_log... (it wrote that for only a few
>>>> moments) and later, after running fine for several hours, it stopped
>>>> writing the aa_access_logs.
>>>>
>>>> There aren't any error messages in the master, collector or agent logs.
>>>> And from the agent's point of view it seemed that it had been delivering
>>>> those files all the time (not sure how to read those logs).
>>>> It seems like the collector just suddenly stopped delivering those files
>>>> to HDFS.
>>>>
>>>> It seems that the collector was somehow in bad shape, since its Jetty
>>>> didn't function too well either:
>>>> it opened http://localhost:35862/, but stalled while trying to get the
>>>> flumeagent.jsp file.
>>>>
>>>> After a restart of the collector (on the next day) it continued to write
>>>> files to HDFS, but it missed all the files from the past 8 hours. The web
>>>> interface also worked fine again.
>>>>
>>>> Unfortunately we don't have any logs available, since we lost them due
>>>> to bug https://issues.cloudera.org/browse/FLUME-631.
>>>>
>>>> So, does anybody have any idea what could have caused this, or do we
>>>> need to wait and see if it happens again?
>>>>
>>>>
>>>> Log collection was configured like this (for both aa and bb) using
>>>> "flume shell -c server5 -s flume-aa.txt":
>>>>
>>>> cat flume-aa.txt
>>>> exec map server3 aa-agent-http-fe-1
>>>> exec map server3 aa-agent-http-fe-2
>>>> exec map server3 aa-agent-https-fe-1
>>>> exec map server3 aa-agent-https-fe-2
>>>> exec map server3 aa-agent-http-error-fe-1
>>>> exec map server3 aa-agent-http-error-fe-2
>>>> exec map server3 aa-agent-https-error-fe-1
>>>> exec map server3 aa-agent-https-error-fe-2
>>>>
>>>> exec map server6 aa-collector-http-fe
>>>> exec map server6 aa-collector-https-fe
>>>> exec map server6 aa-collector-http-error-fe
>>>> exec map server6 aa-collector-https-error-fe
>>>>
>>>>
>>>> # HTTP
>>>> exec config aa-agent-http-fe-1 aa-flow-http-fe
>>>> 'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>> exec config aa-agent-http-fe-2 aa-flow-http-fe
>>>> 'tailDir("/logs/aa/httpd-fe-2/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>>
>>>> exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource
>>>> 'collectorSink("hdfs://hfds-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
>>>> "%{host}-access-")'
>>>>
>>>> # HTTPS
>>>> exec config aa-agent-https-fe-1 aa-flow-https-fe
>>>> 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>> exec config aa-agent-https-fe-2 aa-flow-https-fe
>>>> 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>>
>>>> exec config aa-collector-https-fe aa-flow-https-fe autoCollectorSource
>>>> 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
>>>> "%{host}-ssl-access-")'
>>>>
>>>> # HTTP ERROR
>>>> exec config aa-agent-http-error-fe-1 aa-flow-http-error-fe
>>>> 'tailDir("/logs/aa/httpd-fe-1/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>> exec config aa-agent-http-error-fe-2 aa-flow-http-error-fe
>>>> 'tailDir("/logs/aa/httpd-fe-2/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>> exec config aa-collector-http-error-fe aa-flow-http-error-fe
>>>> autoCollectorSource
>>>> 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
>>>> "%{host}-error-")'
>>>>
>>>> # HTTPS ERROR
>>>> exec config aa-agent-https-error-fe-1 aa-flow-https-error-fe
>>>> 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>> exec config aa-agent-https-error-fe-2 aa-flow-https-error-fe
>>>> 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
>>>> true)' autoE2EChain
>>>> exec config aa-collector-https-error-fe aa-flow-https-error-fe
>>>> autoCollectorSource
>>>> 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
>>>> "%{host}-ssl-error-")'
>>>>
>>>> waitForNodesActive 0 aa-agent-http-fe-1 aa-agent-http-fe-2
>>>> aa-agent-https-fe-1 aa-agent-https-fe-2 aa-agent-http-error-fe-1
>>>> aa-agent-http-error-fe-2 aa-agent-https-error-fe-1 
>>>> aa-agent-https-error-fe-2
>>>> aa-collector-http-fe aa-collector-https-fe aa-collector-http-error-fe
>>>> aa-collector-https-error-fe
>>>>
>>>> exec refreshAll
>>>>
>>>>
>>>
>>
>



-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you
really need to.
