Chukwa agent unable to stream all data sources on the jobtracker node
---------------------------------------------------------------------

                 Key: CHUKWA-323
                 URL: https://issues.apache.org/jira/browse/CHUKWA-323
             Project: Hadoop Chukwa
          Issue Type: Bug
          Components: data collection
    Affects Versions: 0.2.0
         Environment: Redhat EL 5.1, Java 6
            Reporter: Eric Yang
            Priority: Blocker
             Fix For: 0.2.0


HDFS namenode and MapReduce-related metrics appear to have stopped sending data
since 06/21/2009 00:00:00.
The agent log contains exceptions like these:

2009-06-21 21:28:01,165 WARN Thread-10 FileTailingAdaptor - failure reading /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0351_user_Chukwa-Demux_20090620_09_56
java.io.FileNotFoundException: /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0351_user_Chukwa-Demux_20090620_09_56 (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailingAdaptor.tailFile(FileTailingAdaptor.java:239)
        at org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailer.run(FileTailer.java:90)
2009-06-21 21:28:01,165 WARN Thread-10 FileTailingAdaptor - Adaptor|58fb855b5c26d36cc1e69e264ce3402c| file: /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0352_user_PigLatin%3AHadoop_jvm_metrics.pig, has rotated and no detection - reset counters to 0L

It looks like the number of file offset tracking pointers exceeded the JVM's
limit on concurrently open files. This triggers a feedback loop in which
FileTailingAdaptor assumes the log file has rotated when it has not.
FileTailingAdaptor was simply unable to track the offset because it could not
open the file.
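
To illustrate the distinction described above, here is a minimal sketch of how
an open failure caused by descriptor exhaustion could be told apart from a file
that is really gone. The class TailOpenFailure, its Cause enum, and the
classify method are hypothetical names and are not part of Chukwa. Matching on
the "Too many open files" message text is an assumption based on the exception
shown in the log above; the wording is OS-dependent.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper, not Chukwa code: classifies why opening a tailed file failed.
public class TailOpenFailure {
    public enum Cause { FD_EXHAUSTED, FILE_MISSING, OTHER }

    public static Cause classify(FileNotFoundException e, File file) {
        String msg = e.getMessage();
        // Assumption: the OS error text is embedded in the exception message,
        // as in the log above. This is not a portable contract.
        if (msg != null && msg.contains("Too many open files")) {
            return Cause.FD_EXHAUSTED;
        }
        // The file is really absent, so rotation or deletion is plausible.
        if (!file.exists()) {
            return Cause.FILE_MISSING;
        }
        return Cause.OTHER;
    }
}
```

With a check along these lines, a tailer could skip the rotation handling and
keep its offset when the cause is FD_EXHAUSTED.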

[r...@gsbl80211 log]# /usr/sbin/lsof -p 29960|wc -l
1084

The number of concurrently open files is 1084, which exceeds the default limit
of 1024 open files.
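
The same comparison can be made from inside the JVM. The sketch below assumes a
Sun/Oracle-style JVM on a Unix platform, where the operating system MXBean
implements com.sun.management.UnixOperatingSystemMXBean. The class FdUsage and
its methods are hypothetical names for illustration. Note that lsof -p also
lists entries that are not file descriptors (for example memory-mapped files),
so its line count may differ from the value reported here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, not Chukwa code: reports descriptor usage of the current JVM.
public class FdUsage {
    // Returns {open, max}, or null when the JVM does not expose Unix descriptor counters.
    public static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    // True when open descriptors have reached the given fraction of the limit.
    public static boolean nearLimit(long open, long max, double threshold) {
        return max > 0 && open >= max * threshold;
    }
}
```

For example, nearLimit(open, max, 0.9) could be checked before opening another
file, so that a warning is logged before opens start to fail.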


