Thanks Ivy. Could you put a +1 on: https://issues.apache.org/jira/browse/CHUKWA-595 Thanks
regards, Eric On Mon, Aug 8, 2011 at 8:24 PM, Ying Tang <[email protected]> wrote: > Hi Eric, > After using this patch , the chuwa in my environment has run normally > for almost 10 days . Maybe this exception is the cause of chukwa agent > crashing. And the " INFO Timer-1 HttpConnector - # http chunks ACK'ed since > last report: 0 " didn't appear. > And in my enviroment ,we do need sync the machine's time , this should be > this cause of the exception. > > On Sat, Jul 30, 2011 at 12:00 AM, Eric Yang <[email protected]> wrote: >> >> Hi Ivy, >> >> In Http Sender, there is a throttle to sleep 5 seconds, if the data >> sending frequency is high. If the lastPosted time is more recent than the >> current time, then the value becomes negative. This could happen when ntp >> makes time correction. When sleeping on negative millisecond, then it >> throws exception, and prevents agent to send data to collector. I filed a >> JIRA (CHUKWA-595) for this, thanks for catching this. :) >> >> regards, >> Eric >> >> On Jul 28, 2011, at 11:59 PM, Ying Tang wrote: >> >> > Hi Eric, >> > I meet with an exception. >> > >> > Exception in thread "HTTP post thread" >> > java.lang.IllegalArgumentException: timeout value is negative >> > at java.lang.Thread.sleep(Native Method) >> > at >> > org.apache.hadoop.chukwa.datacollection.connector.http.HttpConnector.run(HttpConnector.java:187) >> > at java.lang.Thread.run(Thread.java:636) >> > >> > And the log " INFO Timer-1 HttpConnector - # http chunks ACK'ed since >> > last report: 0 " appears again. >> > On Thu, Jul 28, 2011 at 3:23 AM, Eric Yang <[email protected]> wrote: >> > We need two numbers to track rotated logs properly without sending >> > duplicates. A number is the current file offset, and another number is the >> > incremental sequence number. The checkpoint file only preserves the >> > incremental sequence number. The current file offset is not tracked. If >> > agent crashes, agent will stream from offset 0 of the current file. Hence, >> > there maybe duplicate entries, when system recovers from agent crash. I >> > tested this, after file rotated, it is sending the new data from the >> > current >> > log file, but the file current offset is tracked in memory only. It is not >> > visible from port 9093. >> > >> > This issue is not directly related to the issue that you experienced >> > which the file tailing adaptor. From the code, it is still not clear why >> > agent crashed. However, I would recommend to use Chukwa trunk with >> > SocketAdaptor with Log4JSocketAppender. It is more reliable than UTF8 >> > adaptor. >> > >> > See: http://wiki.apache.org/hadoop/Chukwa_Quick_Start >> > >> > for installation instructions. >> > >> > regards, >> > Eric >> > >> > On Jul 26, 2011, at 10:54 PM, Ying Tang wrote: >> > >> > > And after i restart the chukwa agent , the telnet adaptor last number >> > > is still 10487067. >> > > >> > > On Wed, Jul 27, 2011 at 1:52 PM, Ying Tang <[email protected]> >> > > wrote: >> > > Does this mean the adaptor still transfer the previous log ? The >> > > current log is missing? >> > > >> > > >> > > On Wed, Jul 27, 2011 at 1:08 PM, Eric Yang <[email protected]> wrote: >> > > This looks like a bug, the last number should be in sync with the >> > > current file's size, but the UTF adaptor is still tailing the previous >> > > file (which rotated at 10487067) >> > > It means there is a bug in handling the file rotation, but the adaptor >> > > did not pick up the change. >> > > >> > > Please open a jira. Thanks >> > > >> > > regards, >> > > Eric >> > > >> > > On Tue, Jul 26, 2011 at 8:05 PM, Ying Tang <[email protected]> >> > > wrote: >> > > > The log didn't rotate very rapidly. >> > > > >> > > > Now i can't rebuild the scenario . But when the chukwa agent log >> > > > looks ok, >> > > > >> > > > 2011-07-27 10:57:38,967 INFO Timer-0 ChukwaAgent - writing >> > > > checkpoint >> > > > 1307083 >> > > > 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - >> > > > collected 1 >> > > > chunks for post_745 >> > > > 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - >> > > > >>>>>> HTTP >> > > > post_745 to http://chukwacollector1.xingcloud.com:9095/ length = >> > > > 1837 >> > > > 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - >> > > > >>>>>> HTTP >> > > > Got success back from >> > > > http://chukwacollector1.xingcloud.com:9095/chukwa; >> > > > response length 43 >> > > > 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - >> > > > post_745 >> > > > sent 0 chunks, got back 1 acks >> > > > >> > > > The list in telnet agent 9093 is: >> > > > adaptor_2963225a90653a309cf779d4a1d815a3) >> > > > >> > > > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 >> > > > Gamelog 0 /var/log/gamelog 10487067 >> > > > After several minites , the list is still >> > > > adaptor_2963225a90653a309cf779d4a1d815a3) >> > > > >> > > > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 >> > > > Gamelog 0 /var/log/gamelog 10487067 >> > > > >> > > > Is the 10487067 the offset number ?The number didn't changed , and >> > > > the log >> > > > file's size is from 0 to 10M .And now the log file's size is >> > > > 1150872. >> > > > >> > > > On Wed, Jul 27, 2011 at 12:26 AM, Eric Yang <[email protected]> >> > > > wrote: >> > > >> >> > > >> CharFileTailingAdaptorUTF should handle log rotation gracefully. >> > > >> Is the >> > > >> log rotating rapidly? >> > > >> Run those command on chukwa agent: >> > > >> telnet localhost 9093 >> > > >> list >> > > >> This should show a list of tailing files, and check the offset >> > > >> number of >> > > >> the tailing log file. The most right number should be smaller than >> > > >> the size >> > > >> of your log file. If it is bigger and not changing, it is most >> > > >> likely there >> > > >> is a bug that we haven't seen before. It might be useful to turn >> > > >> on debug >> > > >> on chukwa agent and see if this can be reproduced to nail down the >> > > >> root >> > > >> cause. Thanks >> > > >> regards, >> > > >> Eric >> > > >> On Jul 26, 2011, at 6:13 AM, Ying Tang wrote: >> > > >> >> > > >> Is there the possibility that >> > > >> when the log file reaches the log4g config file size ,the log4j >> > > >> will >> > > >> rename this log file and create a new file with this name as the >> > > >> log file . >> > > >> At the time ,the chukwa adaptor doesn't tail the log properly , and >> > > >> this >> > > >> cause the chuwa agent can't collector the log any more. >> > > >> >> > > >> On Tue, Jul 26, 2011 at 2:07 PM, Ying Tang <[email protected]> >> > > >> wrote: >> > > >>> >> > > >>> The log file is log4j log file ,and the size is 10M ,the >> > > >>> maxbackupindex >> > > >>> is 1. >> > > >>> >> > > >>> >> > > >>> On Tue, Jul 26, 2011 at 1:42 PM, Eric Yang <[email protected]> >> > > >>> wrote: >> > > >>>> >> > > >>>> Can you run "ls -l" to show the size and dateof the log files >> > > >>>> that you >> > > >>>> are streaming? >> > > >>>> >> > > >>>> regards, >> > > >>>> Eric >> > > >>>> >> > > >>>> On Mon, Jul 25, 2011 at 7:36 PM, Ying Tang >> > > >>>> <[email protected]> >> > > >>>> wrote: >> > > >>>> > The chukwa version is 0.4.0 and the adaptor is >> > > >>>> > >> > > >>>> > >> > > >>>> > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 >> > > >>>> > >> > > >>>> > On Mon, Jul 25, 2011 at 11:50 PM, Eric Yang <[email protected]> >> > > >>>> > wrote: >> > > >>>> >> >> > > >>>> >> Hi Ivy, >> > > >>>> >> >> > > >>>> >> When data is send from agent to collector, collector send >> > > >>>> >> acknowledgment >> > > >>>> >> of receiving of the chunks. At 00:03:28, there are 5 chunks >> > > >>>> >> acknowledged. >> > > >>>> >> This means communication between collector and agent are >> > > >>>> >> working at >> > > >>>> >> that >> > > >>>> >> point in time. However, there is no activity after 00:04:28. >> > > >>>> >> This >> > > >>>> >> looks >> > > >>>> >> like adaptor did not handle the log rotation properly at close >> > > >>>> >> to >> > > >>>> >> midnight. >> > > >>>> >> Which version of Chukwa are you using and which adaptor are >> > > >>>> >> you >> > > >>>> >> using? >> > > >>>> >> >> > > >>>> >> regards, >> > > >>>> >> Eric >> > > >>>> >> >> > > >>>> >> On Jul 25, 2011, at 12:40 AM, Ying Tang wrote: >> > > >>>> >> >> > > >>>> >> > Hi all, >> > > >>>> >> > >> > > >>>> >> > In my cluster , i have two chukwa agent and one collector . >> > > >>>> >> > At a time , both chukwa agents's log : >> > > >>>> >> > 2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http >> > > >>>> >> > chunks >> > > >>>> >> > ACK'ed since last report: 5 >> > > >>>> >> > 2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http >> > > >>>> >> > chunks >> > > >>>> >> > ACK'ed since last report: 0 >> > > >>>> >> > 2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http >> > > >>>> >> > chunks >> > > >>>> >> > ACK'ed since last report: 0 >> > > >>>> >> > 2011-07-18 00:06:28,714 INFO Timer-1 HttpConnector - # http >> > > >>>> >> > chunks >> > > >>>> >> > ACK'ed since last report: 0 >> > > >>>> >> > 2011-07-18 00:07:29,340 INFO Timer-1 HttpConnector - # http >> > > >>>> >> > chunks >> > > >>>> >> > ACK'ed since last report: 0 >> > > >>>> >> > >> > > >>>> >> > And the collector >> > > >>>> >> > 2011-07-17 11:02:32,155 INFO Timer-3 SeqFileWriter - >> > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 >> > > >>>> >> > 2011-07-17 11:02:43,074 INFO Timer-1 root - >> > > >>>> >> > stats:ServletCollector,numberHTTPConnection:0,numberchunks:0 >> > > >>>> >> > 2011-07-17 11:03:02,162 INFO Timer-3 SeqFileWriter - >> > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 >> > > >>>> >> > 2011-07-17 11:03:32,168 INFO Timer-3 SeqFileWriter - >> > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 >> > > >>>> >> > 2011-07-17 11:03:43,085 INFO Timer-1 root - >> > > >>>> >> > stats:ServletCollector,numberHTTPConnection:0,numberchunks:0 >> > > >>>> >> > 2011-07-17 11:04:02,174 INFO Timer-3 SeqFileWriter - >> > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 >> > > >>>> >> > 2011-07-17 11:04:32,180 INFO Timer-3 SeqFileWriter - >> > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 >> > > >>>> >> > 2011-07-17 11:04:43,096 INFO Timer-1 root - >> > > >>>> >> > stats:ServletCollector,numberHTTPConnection:0,numberchunks:0 >> > > >>>> >> > 2011-07-17 11:05:02,185 INFO Timer-3 SeqFileWriter - >> > > >>>> >> > stat:datacollection.writer.hdfs dataSize=0 dataRate=0 >> > > >>>> >> > >> > > >>>> >> > (the collector and agent has different timezone) >> > > >>>> >> > And the collector didn't collect any log. >> > > >>>> >> > >> > > >>>> >> > >> > > >>>> >> > What dons the "http chunks ACK'ed since last report: 0" >> > > >>>> >> > means? >> > > >>>> >> > And from this log "http chunks ACK'ed since last report: 0" >> > > >>>> >> > appears >> > > >>>> >> > to >> > > >>>> >> > agent crash, the chukwa port still on , but after several >> > > >>>> >> > days, >> > > >>>> >> > both agents >> > > >>>> >> > crashed without exceptions. >> > > >>>> >> > >> > > >>>> >> > >> > > >>>> >> > -- >> > > >>>> >> > Best regards, >> > > >>>> >> > >> > > >>>> >> > Ivy Tang >> > > >>>> >> > >> > > >>>> >> > >> > > >>>> >> > >> > > >>>> >> >> > > >>>> > >> > > >>>> > >> > > >>>> > >> > > >>>> > -- >> > > >>>> > Best regards, >> > > >>>> > Ivy Tang >> > > >>>> > >> > > >>>> > >> > > >>>> > >> > > >>> >> > > >>> >> > > >>> >> > > >>> -- >> > > >>> Best regards, >> > > >>> Ivy Tang >> > > >>> >> > > >>> >> > > >> >> > > >> >> > > >> >> > > >> -- >> > > >> Best regards, >> > > >> Ivy Tang >> > > >> >> > > >> >> > > >> >> > > > >> > > > >> > > > >> > > > -- >> > > > Best regards, >> > > > Ivy Tang >> > > > >> > > > >> > > > >> > > >> > > >> > > >> > > -- >> > > Best regards, >> > > >> > > Ivy Tang >> > > >> > > >> > > >> > > >> > > >> > > >> > > -- >> > > Best regards, >> > > >> > > Ivy Tang >> > > >> > > >> > > >> > >> > >> > >> > >> > -- >> > Best regards, >> > >> > Ivy Tang >> > >> > >> > >> > > > > -- > Best regards, > Ivy Tang > > >
