[
https://issues.apache.org/jira/browse/CHUKWA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722493#action_12722493
]
Eric Yang commented on CHUKWA-323:
----------------------------------
According to lsof, the open file descriptors are on files being streamed by
UTF8NewLineEscaped. I think this rules out FileAdaptor.
Example of the lsof output:
java 29960 user 1017r REG 8,2 88823 60953960
/usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0570_user_Chukwa-Demux_20090620_15_33
java 29960 user 1018r REG 8,2 77395 60954322
/usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0628_user_Chukwa-Demux_20090620_16_52
java 29960 user 1019r REG 8,2 69980 60952947
/usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0408_user_Chukwa-Demux_20090620_11_24
java 29960 user 1020r REG 8,2 88824 60964540
/usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_2590_user_Chukwa-Demux_20090622_04_03
java 29960 user 1021r REG 8,2 73955 60953910
/usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0562_user_HourlyChukwa-Rolling
java 29960 user 1022r REG 8,2 70987 60954047
/usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0584_user_Chukwa-Demux_20090620_15_54
java 29960 user 1023r REG 8,2 98856 60954860
/usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0714_user_Chukwa-Demux_20090620_18_31
> Chukwa agent unable to stream all data source on the jobtracker node
> --------------------------------------------------------------------
>
> Key: CHUKWA-323
> URL: https://issues.apache.org/jira/browse/CHUKWA-323
> Project: Hadoop Chukwa
> Issue Type: Bug
> Components: data collection
> Affects Versions: 0.2.0
> Environment: Redhat EL 5.1, Java 6
> Reporter: Eric Yang
> Priority: Blocker
> Fix For: 0.2.0
>
>
> HDFS namenode and mapreduce related metrics seem to stop sending data since
> 06/21/2009 00:00:00.
> Agent log contains exceptions like these:
> 2009-06-21 21:28:01,165 WARN Thread-10 FileTailingAdaptor - failure reading
> /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0351_user_Chukwa-Demux_20090620_09_56
> java.io.FileNotFoundException:
> /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0351_user_Chukwa-Demux_20090620_09_56
> (Too many open files)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> at
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailingAdaptor.tailFile(FileTailingAdaptor.java:239)
> at
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailer.run(FileTailer.java:90)
> 2009-06-21 21:28:01,165 WARN Thread-10 FileTailingAdaptor -
> Adaptor|58fb855b5c26d36cc1e69e264ce3402c| file:
> /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0352_user_PigLatin%3AHadoop_jvm_metrics.pig,
> has rotated and no detection - reset counters to 0L
> It looks like the number of file-offset tracking pointers exceeded the JVM's
> limit on concurrently open files. This triggers a feedback loop in which
> FileTailingAdaptor assumes the log file has rotated, when in fact it had not;
> FileTailingAdaptor was simply unable to track the offset.
> [r...@gsbl80211 log]# /usr/sbin/lsof -p 29960|wc -l
> 1084
> The concurrent number of open files is 1084, which exceeds the default limit
> of 1024 concurrent open files.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.