As I said before, I don't think Chukwa should handle those situations, since I think this is a "log rotation" problem. Personally, I have never seen such a problem (the log4j RFA, for instance, has a kind of "flexible" size, and every rotated file I have seen ended with a \n).
On the other hand, there is a special situation I think Chukwa should take care of. The default value of the "chukwaAgent.fileTailingAdaptor.maxReadSize" setting is 128kB, which means that if a line/record is bigger than that size, the agent won't send it. We'll get a warning in Chukwa's log, but the record will be lost (see the LWFTAdaptor.slurp() method). In such a case, would it be possible to temporarily increase MAX_READ_SIZE so that we are able to send that one record on the wire?

Regards,

Sourygna

On Sun, Apr 21, 2013 at 7:05 PM, Eric Yang <[email protected]> wrote:

> Do we need to consider rotation based on size? For example, the last line of
> the log file that reaches 300MB. There is no line break in the first file,
> but the entry continues in the next rotated log and then has a line feed
> delimiter. If we are splitting lines based on \n, then we can reconstruct
> the full line between two files. I am not sure if this case needs to be
> supported?
>
> regards,
> Eric
>
> On Fri, Apr 19, 2013 at 12:01 PM, Luangsay Sourygna <[email protected]> wrote:
>
> > Well, the log4j socket adaptor may be great if you control the software that
> > generates logs.
> > That is not usually my case: customers don't really like having to install
> > Chukwa agents on their production servers, so I don't want to think about
> > telling them to change the log system of their software.
> >
> > As for partial lines when log files rotate, I don't think this is something
> > Chukwa should manage (what is more: how could Chukwa be aware there is a
> > problem?).
> > In my view, this would be an error of the "logrotate" system. As far as I
> > know, the RFA and DRFA log4j appenders handle rotation quite well.
> >
> > Regards,
> >
> > Sourygna
> >
> > On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <[email protected]> wrote:
> >
> > > I think the best solution is to use the Log4j socket appender and the
> > > Chukwa log4j socket adaptor to get the full entry of the log without
> > > worrying about line feeds. However, this solution only works with programs
> > > written in Java, and does not keep a copy of the existing log file on disk.
> > >
> > > I think your proposal is a good idea to solve tailing text files, so that
> > > only line-delimited entries are sent. How do we handle a partial line when
> > > the log file has rotated?
> > >
> > > regards,
> > > Eric
> > >
> > > On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > FileTailingAdaptor is great to tail log files and send them to Hadoop.
> > > > However, the last line of the chunk is usually cut, which leads to some
> > > > errors.
> > > >
> > > > I know that we can use CharFileTailingAdaptorUTF8 to solve this problem.
> > > > Nonetheless, this adaptor calls the MapProcessor.process() method for
> > > > every line in each chunk, thus slowing the Demux phase a lot.
> > > >
> > > > I suggest creating a new adaptor that would combine the benefits of the
> > > > two adaptors: the (Demux) speed of FileTailingAdaptor and
> > > > the preservation of lines from CharFileTailingAdaptorUTF8.
> > > >
> > > > The implementation of extractRecords() would be:
> > > > - a "for loop" on the buffer, starting from the end of the buffer and
> > > >   going backward
> > > > - if we find a separator, save the offset and exit the loop
> > > > - the rest of the method would be similar to CharFileTailingAdaptorUTF8.
> > > >
> > > > Could you guys please tell me what you think about it?
> > > > How do you currently manage the "lines cut" with Chukwa?
> > > >
> > > > Regards,
> > > >
> > > > Sourygna

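For reference, the backward scan proposed in the original message (loop from the end of the buffer, stop at the first separator found) might look roughly like the sketch below. This is only an illustration under assumptions: the class LineBoundary and the methods lastSeparatorOffset() and completeLines() are hypothetical names, not part of Chukwa, and the sketch does not reproduce the real extractRecords() signature. It assumes a single-byte separator such as \n, which is safe to search for in UTF-8 because every byte of a multi-byte character is 0x80 or above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of Chukwa: finds where the last complete line ends. */
final class LineBoundary {

    /**
     * Scans buf[0..len) backward and returns the index of the last
     * separator byte, or -1 if the buffer holds no separator at all.
     */
    static int lastSeparatorOffset(byte[] buf, int len, byte separator) {
        for (int i = len - 1; i >= 0; i--) {
            if (buf[i] == separator) {
                return i; // first hit from the end is the last separator
            }
        }
        return -1;
    }

    /**
     * Returns the prefix of buf that ends with the last separator (inclusive),
     * or an empty array when no complete line is available yet.
     */
    static byte[] completeLines(byte[] buf, int len, byte separator) {
        int last = lastSeparatorOffset(buf, len, separator);
        return last < 0 ? new byte[0] : Arrays.copyOfRange(buf, 0, last + 1);
    }

    public static void main(String[] args) {
        byte[] chunk = "line one\nline two\npartial li".getBytes(StandardCharsets.UTF_8);
        byte[] ready = completeLines(chunk, chunk.length, (byte) '\n');
        // Only the two complete lines are kept; "partial li" waits for the next read.
        System.out.print(new String(ready, StandardCharsets.UTF_8));
    }
}
```

In the common case the scan stops after a few bytes, so the cost per chunk stays small. A return value of -1 means the whole buffer is one unfinished record, which is exactly the oversized-record situation described at the top of this thread: the caller would have to read more data (or a larger buffer) before anything can be sent.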