Collector closes thrift server without visible reason
-----------------------------------------------------

                 Key: FLUME-873
                 URL: https://issues.apache.org/jira/browse/FLUME-873
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v0.9.4
         Environment: RHEL 5
            Reporter: Ossi L
            Priority: Critical


We have:
2 agent nodes
1 collector
1 master
8 flows

At first all of them work fine, and a thrift server is started for each flow:
grep "Starting blocking thread pool server on port" flume-flume-node-server6.log.2011-11-29
2011-11-29 13:21:24,562 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35855...
2011-11-29 13:21:54,572 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35857...
2011-11-29 13:22:24,581 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35853...
2011-11-29 13:22:54,589 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35858...
2011-11-29 13:23:24,597 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35860...
2011-11-29 13:23:54,607 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35859...
2011-11-29 13:24:24,615 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35854...
2011-11-29 13:24:54,625 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35856...

At some point after startup, two of them stopped without any visible reason:
flume-flume-node-server6.log.2011-12-01:2011-12-01 17:27:15,523 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35858...
flume-flume-node-server6.log.2011-12-05:2011-12-05 02:50:56,748 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35857...

This stopped the data flow from two of the sources.
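
To check for further silent closes without grepping by hand, the two ThriftEventSource messages quoted above can be paired up per port. The following is only a rough sketch: the function name closed_ports is made up for illustration, it assumes each log entry sits on a single line as it does in the real log file, and it assumes the lines are fed in chronological order.

```python
import re

# Patterns for the two ThriftEventSource messages quoted in this report.
STARTED = re.compile(r"Starting blocking thread pool server on port (\d+)")
CLOSED = re.compile(r"Closed server on port (\d+)")


def closed_ports(lines):
    """Return the sorted ports that were started and later closed with no restart."""
    open_ports = set()
    closed = set()
    for line in lines:
        match = STARTED.search(line)
        if match:
            port = int(match.group(1))
            open_ports.add(port)
            closed.discard(port)  # a restart clears an earlier close
            continue
        match = CLOSED.search(line)
        if match:
            port = int(match.group(1))
            if port in open_ports:
                open_ports.discard(port)
                closed.add(port)
    return sorted(closed)


# A few of the lines from this report, joined onto single lines.
prefix = "INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: "
sample = [
    "2011-11-29 13:21:54,572 " + prefix + "Starting blocking thread pool server on port 35857...",
    "2011-11-29 13:22:24,581 " + prefix + "Starting blocking thread pool server on port 35853...",
    "2011-11-29 13:22:54,589 " + prefix + "Starting blocking thread pool server on port 35858...",
    "2011-12-01 17:27:15,523 " + prefix + "Closed server on port 35858...",
    "2011-12-05 02:50:56,748 " + prefix + "Closed server on port 35857...",
]
print(closed_ports(sample))  # [35857, 35858]
```

In practice the input would be the concatenated flume-flume-node-server6.log.* files, oldest first.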

Log collection was configured like this (for both aa and bb) using "flume
shell -c server5 -s flume-aa.txt":

cat flume-aa.txt
exec map server3 aa-agent-http-fe-1
exec map server3 aa-agent-http-fe-2
exec map server3 aa-agent-https-fe-1
exec map server3 aa-agent-https-fe-2
exec map server3 aa-agent-http-error-fe-1
exec map server3 aa-agent-http-error-fe-2
exec map server3 aa-agent-https-error-fe-1
exec map server3 aa-agent-https-error-fe-2

exec map server6 aa-collector-http-fe
exec map server6 aa-collector-https-fe
exec map server6 aa-collector-http-error-fe
exec map server6 aa-collector-https-error-fe


# HTTP
exec config aa-agent-http-fe-1 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-http-fe-2 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-2/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain

exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource 'collectorSink("hdfs://hfds-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-access-")'

# HTTPS
exec config aa-agent-https-fe-1 aa-flow-https-fe 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-https-fe-2 aa-flow-https-fe 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain

exec config aa-collector-https-fe aa-flow-https-fe autoCollectorSource 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-ssl-access-")'

# HTTP ERROR
exec config aa-agent-http-error-fe-1 aa-flow-http-error-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-http-error-fe-2 aa-flow-http-error-fe 'tailDir("/logs/aa/httpd-fe-2/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-collector-http-error-fe aa-flow-http-error-fe autoCollectorSource 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-error-")'

# HTTPS ERROR
exec config aa-agent-https-error-fe-1 aa-flow-https-error-fe 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-https-error-fe-2 aa-flow-https-error-fe 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-collector-https-error-fe aa-flow-https-error-fe autoCollectorSource 'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/", "%{host}-ssl-error-")'

waitForNodesActive 0 aa-agent-http-fe-1 aa-agent-http-fe-2 aa-agent-https-fe-1 aa-agent-https-fe-2 aa-agent-http-error-fe-1 aa-agent-http-error-fe-2 aa-agent-https-error-fe-1 aa-agent-https-error-fe-2 aa-collector-http-fe aa-collector-https-fe aa-collector-http-error-fe aa-collector-https-error-fe

exec refreshAll
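
Since the script above repeats the same pattern for every flow (and again for bb), it could be generated from a small table, which makes a typo in a single line less likely. This is only a sketch of that idea: build_script, FLOWS and the parameter names are hypothetical helpers, not part of Flume, and the output just mirrors the commands shown in flume-aa.txt.

```python
# Date suffix of the tailed file names, kept with doubled backslashes as in the script.
DATE_SUFFIX = r"-\\d{4}-\\d{2}-\\d{2}$"

# (flow name, log file prefix, HDFS file tag) for the four flows in this report.
FLOWS = [
    ("http", "aa_access_log", "access"),
    ("https", "ssl-aa_access_log", "ssl-access"),
    ("http-error", "aa_error_log", "error"),
    ("https-error", "ssl-aa_error_log", "ssl-error"),
]


def build_script(app, agent_host, collector_host, hdfs_url, flows, agents=(1, 2)):
    """Build the text of a flume shell script for one application (e.g. "aa")."""
    maps, configs, agent_nodes, collector_nodes = [], [], [], []
    for flow, log_name, tag in flows:
        flow_id = f"{app}-flow-{flow}-fe"
        for i in agents:
            node = f"{app}-agent-{flow}-fe-{i}"
            source = ('tailDir("/logs/' + app + "/httpd-fe-" + str(i) + '/", "'
                      + log_name + DATE_SUFFIX + '", true)')
            maps.append(f"exec map {agent_host} {node}")
            configs.append(f"exec config {node} {flow_id} '{source}' autoE2EChain")
            agent_nodes.append(node)
        collector = f"{app}-collector-{flow}-fe"
        sink = ('collectorSink("' + hdfs_url + "/flume/" + app
                + '/httpd-fe/%Y-%m-%d/", "%{host}-' + tag + '-")')
        configs.append(f"exec config {collector} {flow_id} autoCollectorSource '{sink}'")
        collector_nodes.append(collector)
    # Collector mappings go after all agent mappings, as in flume-aa.txt.
    maps += [f"exec map {collector_host} {name}" for name in collector_nodes]
    wait = "waitForNodesActive 0 " + " ".join(agent_nodes + collector_nodes)
    return "\n".join(maps + configs + [wait, "exec refreshAll"])


script = build_script("aa", "server3", "server6", "hdfs://hdfs-server:8020", FLOWS)
print(script)
```

The result could be saved as flume-aa.txt and run with "flume shell -c server5 -s flume-aa.txt" as before.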

More details are in this email thread:
http://mail-archives.apache.org/mod_mbox/incubator-flume-user/201111.mbox/%3ccaowicogsr_pz3fy4tnrgowwlwmjatjz5lkmltkrabc74ce5...@mail.gmail.com%3E
