Hi Eric, After using this patch , the chuwa in my environment has run normally for almost 10 days . Maybe this exception is the cause of chukwa agent crashing. And the " INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0 " didn't appear. And in my enviroment ,we do need sync the machine's time , this should be this cause of the exception.
On Sat, Jul 30, 2011 at 12:00 AM, Eric Yang <[email protected]> wrote: > Hi Ivy, > > In Http Sender, there is a throttle to sleep 5 seconds, if the data sending > frequency is high. If the lastPosted time is more recent than the current > time, then the value becomes negative. This could happen when ntp makes > time correction. When sleeping on negative millisecond, then it throws > exception, and prevents agent to send data to collector. I filed a JIRA > (CHUKWA-595) for this, thanks for catching this. :) > > regards, > Eric > > On Jul 28, 2011, at 11:59 PM, Ying Tang wrote: > > > Hi Eric, > > I meet with an exception. > > > > Exception in thread "HTTP post thread" > java.lang.IllegalArgumentException: timeout value is negative > > at java.lang.Thread.sleep(Native Method) > > at > org.apache.hadoop.chukwa.datacollection.connector.http.HttpConnector.run(HttpConnector.java:187) > > at java.lang.Thread.run(Thread.java:636) > > > > And the log " INFO Timer-1 HttpConnector - # http chunks ACK'ed since > last report: 0 " appears again. > > On Thu, Jul 28, 2011 at 3:23 AM, Eric Yang <[email protected]> wrote: > > We need two numbers to track rotated logs properly without sending > duplicates. A number is the current file offset, and another number is the > incremental sequence number. The checkpoint file only preserves the > incremental sequence number. The current file offset is not tracked. If > agent crashes, agent will stream from offset 0 of the current file. Hence, > there maybe duplicate entries, when system recovers from agent crash. I > tested this, after file rotated, it is sending the new data from the current > log file, but the file current offset is tracked in memory only. It is not > visible from port 9093. > > > > This issue is not directly related to the issue that you experienced > which the file tailing adaptor. From the code, it is still not clear why > agent crashed. However, I would recommend to use Chukwa trunk with > SocketAdaptor with Log4JSocketAppender. It is more reliable than UTF8 > adaptor. > > > > See: http://wiki.apache.org/hadoop/Chukwa_Quick_Start > > > > for installation instructions. > > > > regards, > > Eric > > > > On Jul 26, 2011, at 10:54 PM, Ying Tang wrote: > > > > > And after i restart the chukwa agent , the telnet adaptor last number > is still 10487067. > > > > > > On Wed, Jul 27, 2011 at 1:52 PM, Ying Tang <[email protected]> > wrote: > > > Does this mean the adaptor still transfer the previous log ? The > current log is missing? > > > > > > > > > On Wed, Jul 27, 2011 at 1:08 PM, Eric Yang <[email protected]> wrote: > > > This looks like a bug, the last number should be in sync with the > > > current file's size, but the UTF adaptor is still tailing the previous > > > file (which rotated at 10487067) > > > It means there is a bug in handling the file rotation, but the adaptor > > > did not pick up the change. > > > > > > Please open a jira. Thanks > > > > > > regards, > > > Eric > > > > > > On Tue, Jul 26, 2011 at 8:05 PM, Ying Tang <[email protected]> > wrote: > > > > The log didn't rotate very rapidly. > > > > > > > > Now i can't rebuild the scenario . But when the chukwa agent log > looks ok, > > > > > > > > 2011-07-27 10:57:38,967 INFO Timer-0 ChukwaAgent - writing > checkpoint > > > > 1307083 > > > > 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - > collected 1 > > > > chunks for post_745 > > > > 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - > >>>>>> HTTP > > > > post_745 to http://chukwacollector1.xingcloud.com:9095/ length = > 1837 > > > > 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - > >>>>>> HTTP > > > > Got success back from > http://chukwacollector1.xingcloud.com:9095/chukwa; > > > > response length 43 > > > > 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - > post_745 > > > > sent 0 chunks, got back 1 acks > > > > > > > > The list in telnet agent 9093 is: > > > > adaptor_2963225a90653a309cf779d4a1d815a3) > > > > > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 > > > > Gamelog 0 /var/log/gamelog 10487067 > > > > After several minites , the list is still > > > > adaptor_2963225a90653a309cf779d4a1d815a3) > > > > > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 > > > > Gamelog 0 /var/log/gamelog 10487067 > > > > > > > > Is the 10487067 the offset number ?The number didn't changed , and > the log > > > > file's size is from 0 to 10M .And now the log file's size is 1150872. > > > > > > > > On Wed, Jul 27, 2011 at 12:26 AM, Eric Yang <[email protected]> > wrote: > > > >> > > > >> CharFileTailingAdaptorUTF should handle log rotation gracefully. Is > the > > > >> log rotating rapidly? > > > >> Run those command on chukwa agent: > > > >> telnet localhost 9093 > > > >> list > > > >> This should show a list of tailing files, and check the offset > number of > > > >> the tailing log file. The most right number should be smaller than > the size > > > >> of your log file. If it is bigger and not changing, it is most > likely there > > > >> is a bug that we haven't seen before. It might be useful to turn on > debug > > > >> on chukwa agent and see if this can be reproduced to nail down the > root > > > >> cause. Thanks > > > >> regards, > > > >> Eric > > > >> On Jul 26, 2011, at 6:13 AM, Ying Tang wrote: > > > >> > > > >> Is there the possibility that > > > >> when the log file reaches the log4g config file size ,the log4j will > > > >> rename this log file and create a new file with this name as the log > file . > > > >> At the time ,the chukwa adaptor doesn't tail the log properly , and > this > > > >> cause the chuwa agent can't collector the log any more. > > > >> > > > >> On Tue, Jul 26, 2011 at 2:07 PM, Ying Tang <[email protected]> > wrote: > > > >>> > > > >>> The log file is log4j log file ,and the size is 10M ,the > maxbackupindex > > > >>> is 1. > > > >>> > > > >>> > > > >>> On Tue, Jul 26, 2011 at 1:42 PM, Eric Yang <[email protected]> > wrote: > > > >>>> > > > >>>> Can you run "ls -l" to show the size and dateof the log files that > you > > > >>>> are streaming? > > > >>>> > > > >>>> regards, > > > >>>> Eric > > > >>>> > > > >>>> On Mon, Jul 25, 2011 at 7:36 PM, Ying Tang <[email protected] > > > > > >>>> wrote: > > > >>>> > The chukwa version is 0.4.0 and the adaptor is > > > >>>> > > > > >>>> > > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 > > > >>>> > > > > >>>> > On Mon, Jul 25, 2011 at 11:50 PM, Eric Yang <[email protected]> > wrote: > > > >>>> >> > > > >>>> >> Hi Ivy, > > > >>>> >> > > > >>>> >> When data is send from agent to collector, collector send > > > >>>> >> acknowledgment > > > >>>> >> of receiving of the chunks. At 00:03:28, there are 5 chunks > > > >>>> >> acknowledged. > > > >>>> >> This means communication between collector and agent are > working at > > > >>>> >> that > > > >>>> >> point in time. However, there is no activity after 00:04:28. > This > > > >>>> >> looks > > > >>>> >> like adaptor did not handle the log rotation properly at close > to > > > >>>> >> midnight. > > > >>>> >> Which version of Chukwa are you using and which adaptor are > you > > > >>>> >> using? > > > >>>> >> > > > >>>> >> regards, > > > >>>> >> Eric > > > >>>> >> > > > >>>> >> On Jul 25, 2011, at 12:40 AM, Ying Tang wrote: > > > >>>> >> > > > >>>> >> > Hi all, > > > >>>> >> > > > > >>>> >> > In my cluster , i have two chukwa agent and one collector . > > > >>>> >> > At a time , both chukwa agents's log : > > > >>>> >> > 2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http > chunks > > > >>>> >> > ACK'ed since last report: 5 > > > >>>> >> > 2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http > chunks > > > >>>> >> > ACK'ed since last report: 0 > > > >>>> >> > 2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http > chunks > > > >>>> >> > ACK'ed since last report: 0 > > > >>>> >> > 2011-07-18 00:06:28,714 INFO Timer-1 HttpConnector - # http > chunks > > > >>>> >> > ACK'ed since last report: 0 > > > >>>> >> > 2011-07-18 00:07:29,340 INFO Timer-1 HttpConnector - # http > chunks > > > >>>> >> > ACK'ed since last report: 0 > > > >>>> >> > > > > >>>> >> > And the collector > > > >>>> >> > 2011-07-17 11:02:32,155 INFO Timer-3 SeqFileWriter - > > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 > > > >>>> >> > 2011-07-17 11:02:43,074 INFO Timer-1 root - > > > >>>> >> > stats:ServletCollector,numberHTTPConnection:0,numberchunks:0 > > > >>>> >> > 2011-07-17 11:03:02,162 INFO Timer-3 SeqFileWriter - > > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 > > > >>>> >> > 2011-07-17 11:03:32,168 INFO Timer-3 SeqFileWriter - > > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 > > > >>>> >> > 2011-07-17 11:03:43,085 INFO Timer-1 root - > > > >>>> >> > stats:ServletCollector,numberHTTPConnection:0,numberchunks:0 > > > >>>> >> > 2011-07-17 11:04:02,174 INFO Timer-3 SeqFileWriter - > > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 > > > >>>> >> > 2011-07-17 11:04:32,180 INFO Timer-3 SeqFileWriter - > > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 > > > >>>> >> > 2011-07-17 11:04:43,096 INFO Timer-1 root - > > > >>>> >> > stats:ServletCollector,numberHTTPConnection:0,numberchunks:0 > > > >>>> >> > 2011-07-17 11:05:02,185 INFO Timer-3 SeqFileWriter - > > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 > > > >>>> >> > > > > >>>> >> > (the collector and agent has different timezone) > > > >>>> >> > And the collector didn't collect any log. > > > >>>> >> > > > > >>>> >> > > > > >>>> >> > What dons the "http chunks ACK'ed since last report: 0" > means? > > > >>>> >> > And from this log "http chunks ACK'ed since last report: 0" > appears > > > >>>> >> > to > > > >>>> >> > agent crash, the chukwa port still on , but after several > days, > > > >>>> >> > both agents > > > >>>> >> > crashed without exceptions. > > > >>>> >> > > > > >>>> >> > > > > >>>> >> > -- > > > >>>> >> > Best regards, > > > >>>> >> > > > > >>>> >> > Ivy Tang > > > >>>> >> > > > > >>>> >> > > > > >>>> >> > > > > >>>> >> > > > >>>> > > > > >>>> > > > > >>>> > > > > >>>> > -- > > > >>>> > Best regards, > > > >>>> > Ivy Tang > > > >>>> > > > > >>>> > > > > >>>> > > > > >>> > > > >>> > > > >>> > > > >>> -- > > > >>> Best regards, > > > >>> Ivy Tang > > > >>> > > > >>> > > > >> > > > >> > > > >> > > > >> -- > > > >> Best regards, > > > >> Ivy Tang > > > >> > > > >> > > > >> > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > Ivy Tang > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Best regards, > > > > > > Ivy Tang > > > > > > > > > > > > > > > > > > > > > -- > > > Best regards, > > > > > > Ivy Tang > > > > > > > > > > > > > > > > > > > -- > > Best regards, > > > > Ivy Tang > > > > > > > > -- Best regards, Ivy Tang
