Sure, we can statically increase maxReadSize in the configuration.
But the fact is that we should handle two different situations:
- when a file is growing rapidly and we want a quick response for the
  other files: this means we don't want maxReadSize to be too big (I guess
  this was the initial idea behind this parameter).
- when a line in a file is much bigger than the other lines and its size
  can exceed the initial maxReadSize value: this means we would like a
  very high maxReadSize.

Since maxReadSize can't be both small and large at the same time, I
propose a "dynamic" value for this parameter.
Normally, this parameter should be small (128 kB for instance). When a
very big line appears (that is, when bufferRead == MAX_READ_SIZE and
bytesUsed == 0), we should temporarily increase its value. Then, once the
big line has been sent, we go back to the initial value.

Does that make sense?

Regards,

Sourygna

On Mon, Apr 22, 2013 at 6:25 AM, Eric Yang <[email protected]> wrote:

> maxReadSize can be increased in the configuration. If using a larger
> maxReadSize is preferred, we can update the default to a larger size.
>
> regards,
> Eric
>
> On Sun, Apr 21, 2013 at 3:07 PM, Luangsay Sourygna <[email protected]> wrote:
>
> > As I said before, I don't think Chukwa should handle those situations,
> > since I think this is a "log rotation" problem.
> > Personally, I have never seen such a problem (log4j RFA for instance
> > has a kind of "flexible" size and every rotated file ended with a \n).
> >
> > On the other hand, there is a special situation I think Chukwa should
> > take care of.
> > The default value for the configuration
> > "chukwaAgent.fileTailingAdaptor.maxReadSize" is 128kB, which means
> > that if a line/record is bigger than that size, the record won't be
> > sent by the agent.
> > We'll get a warning in Chukwa's log, but the record will be lost (see
> > the LWFTAdaptor.slurp() method).
> > In such a case, would it be possible to temporarily increase
> > MAX_READ_SIZE so that we are able to send one record on the wire?
> >
> > Regards,
> >
> > Sourygna
> >
> > On Sun, Apr 21, 2013 at 7:05 PM, Eric Yang <[email protected]> wrote:
> >
> > > Do we need to consider rotation based on size? For example, the last
> > > line of a log file that reaches 300MB. There is no line break in the
> > > first file, but the entry continues in the next rotated log and then
> > > has a line feed delimiter. If we are splitting lines based on \n,
> > > then we can reconstruct the full line between the two files. I am
> > > not sure if this case needs to be supported?
> > >
> > > regards,
> > > Eric
> > >
> > > On Fri, Apr 19, 2013 at 12:01 PM, Luangsay Sourygna <[email protected]> wrote:
> > >
> > > > Well, the log4j socket adaptor may be great if you control the
> > > > software that generates the logs.
> > > > That is not usually my case: customers don't really like having to
> > > > install Chukwa agents on their production servers, so I don't want
> > > > to think about telling them to change the log system of their
> > > > software.
> > > >
> > > > As for partial lines when log files rotate, I don't think this is
> > > > something Chukwa should manage (what is more: how could Chukwa be
> > > > aware there is a problem?).
> > > > In my view, this would be an error of the "logrotate" system. As
> > > > far as I know, the RFA and DRFA log4j appenders handle the
> > > > rotation quite well.
> > > >
> > > > Regards,
> > > >
> > > > Sourygna
> > > >
> > > > On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <[email protected]> wrote:
> > > >
> > > > > I think the best solution is to use the Log4j socket appender
> > > > > and the Chukwa log4j socket adaptor to get the full log entry
> > > > > without worrying about line feeds. However, this solution only
> > > > > works with programs written in Java, and does not keep a copy of
> > > > > the existing log file on disk.
> > > > >
> > > > > I think your proposal is a good idea for tailing text files so
> > > > > that only line-delimited entries are sent. How do we handle a
> > > > > partial line when the log file has rotated?
> > > > >
> > > > > regards,
> > > > > Eric
> > > > >
> > > > > On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > FileTailingAdaptor is great for tailing log files and sending
> > > > > > them to Hadoop.
> > > > > > However, the last line of a chunk is usually cut, which leads
> > > > > > to some errors.
> > > > > >
> > > > > > I know that we can use CharFileTailingAdaptorUTF8 to solve
> > > > > > this problem.
> > > > > > Nonetheless, this adaptor calls the MapProcessor.process()
> > > > > > method for every line in each chunk, which slows down the
> > > > > > Demux phase a lot.
> > > > > >
> > > > > > I suggest creating a new adaptor that would combine the
> > > > > > benefits of the two adaptors: the (Demux) speed of
> > > > > > FileTailingAdaptor and the preservation of lines from
> > > > > > CharFileTailingAdaptorUTF8.
> > > > > >
> > > > > > The implementation of extractRecords() would be:
> > > > > > - a "for loop" on the buffer, starting from the end of the
> > > > > >   buffer and going backward
> > > > > > - if we find a separator, save the offset and exit the loop
> > > > > > - the rest of the method would be similar to
> > > > > >   CharFileTailingAdaptorUTF8.
> > > > > >
> > > > > > Could you guys please tell me what you think about it?
> > > > > > How do you currently manage the "cut lines" with Chukwa?
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Sourygna

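For illustration only, here is a minimal Java sketch that combines the two ideas discussed in this thread: the backward scan for the last separator proposed for extractRecords(), and the "dynamic" maxReadSize that grows when bufferRead == maxReadSize and bytesUsed == 0. The class LineSafeReader, its methods, the doubling policy and the ceiling are all assumptions invented for this sketch; none of them comes from the Chukwa code base, and the tailed file is simulated by an in-memory byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Chukwa. */
final class LineSafeReader {
    private final int baseReadSize; // normal limit, e.g. 128 kB
    private final int ceiling;      // assumed hard cap to protect agent memory
    private int maxReadSize;        // current, possibly enlarged, limit

    LineSafeReader(int baseReadSize, int ceiling) {
        if (baseReadSize <= 0 || ceiling < baseReadSize) {
            throw new IllegalArgumentException("need 0 < baseReadSize <= ceiling");
        }
        this.baseReadSize = baseReadSize;
        this.ceiling = ceiling;
        this.maxReadSize = baseReadSize;
    }

    int currentMaxReadSize() {
        return maxReadSize;
    }

    /** Scans backward; returns the index of the last separator in buf[from, to), or -1. */
    static int lastSeparator(byte[] buf, int from, int to, byte separator) {
        for (int i = to - 1; i >= from; i--) {
            if (buf[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Simulates one read of at most maxReadSize bytes starting at offset.
     * Everything up to and including the last '\n' is emitted as one chunk.
     * Returns the number of bytes consumed (0 if no complete line was found).
     */
    int readOnce(byte[] data, int offset, List<String> chunks) {
        int bufferRead = Math.min(maxReadSize, data.length - offset);
        int last = lastSeparator(data, offset, offset + bufferRead, (byte) '\n');
        int bytesUsed = (last < 0) ? 0 : last + 1 - offset;
        if (bytesUsed > 0) {
            chunks.add(new String(data, offset, bytesUsed, StandardCharsets.UTF_8));
            maxReadSize = baseReadSize; // big line sent: back to the initial value
        } else if (bufferRead == maxReadSize) {
            // Buffer full but no complete line: temporarily double the limit.
            // Once the ceiling is reached the limit stops growing and the
            // oversized record still cannot be sent.
            maxReadSize = (int) Math.min(2L * maxReadSize, ceiling);
        }
        return bytesUsed;
    }
}
```

The caller would advance its file offset by the returned value and call readOnce() again on the next pass. Doubling keeps the number of retries small for very long lines, but a fixed increment or a jump straight to the ceiling would work as well; which policy suits the agent best is an open question.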