MAX_READ_SIZE is a policy. As long as it is configurable to use either an adaptive maximum size or a fixed limit, I think the new change will be better for some use cases.
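To make the adaptive policy concrete, here is a minimal sketch of how such a limit could behave. It is not Chukwa code: the class name AdaptiveReadSize, the doubling step, and the hard ceiling are all assumptions made for illustration. Only the growth condition (a full buffer was read and no bytes were used) comes from this thread.

```java
/** Hypothetical helper, not part of Chukwa: a read limit that grows only while a record does not fit. */
final class AdaptiveReadSize {
    private final int baseSize; // normal limit, e.g. the configured maxReadSize
    private final int ceiling;  // assumed hard cap so one huge line cannot exhaust memory
    private int current;

    AdaptiveReadSize(int baseSize, int ceiling) {
        if (baseSize <= 0 || ceiling < baseSize) {
            throw new IllegalArgumentException("need 0 < baseSize <= ceiling");
        }
        this.baseSize = baseSize;
        this.ceiling = ceiling;
        this.current = baseSize;
    }

    /** Limit to apply to the next read. */
    int current() {
        return current;
    }

    /**
     * Update the limit after a read pass.
     *
     * @param bytesRead bytes read from the file in this pass
     * @param bytesUsed bytes that ended up in complete records
     */
    void update(int bytesRead, int bytesUsed) {
        if (bytesRead == current && bytesUsed == 0) {
            // The buffer was filled and still held no complete record: grow (doubling is an assumption).
            current = (int) Math.min((long) current * 2, (long) ceiling);
        } else if (bytesUsed > 0) {
            // A record went out: fall back to the small limit so other files stay responsive.
            current = baseSize;
        }
    }
}
```

With a fixed-limit policy, the same class degenerates to ceiling == baseSize, so both behaviours could sit behind one configuration switch.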
regards,
Eric

On Tue, Apr 23, 2013 at 9:49 PM, Luangsay Sourygna <[email protected]> wrote:

> Sure, we can statically increase maxReadSize in the configuration. But the
> fact is that we should handle two different situations:
> - when a file is growing rapidly and we want a quick response for the
>   other files: this means we don't want maxReadSize to be too big (I guess
>   this was the initial idea for this parameter).
> - when a line in a file is much bigger than the other lines and its size
>   can exceed the initial maxReadSize value: this means we would like a
>   very high maxReadSize.
>
> Since maxReadSize can't be small and high at the same time, I propose a
> "dynamic" value for this parameter.
> Usually, this parameter should be small (128 kB for instance), and when a
> very big line appears (when we have bufferRead == MAX_READ_SIZE AND
> bytesUsed == 0), we should temporarily increase its value. Then, once the
> big line is sent, go back to the initial value.
>
> Makes sense?
>
> Regards,
>
> Sourygna
>
> On Mon, Apr 22, 2013 at 6:25 AM, Eric Yang <[email protected]> wrote:
>
> > maxReadSize can be increased in the configuration. If a larger
> > maxReadSize is preferred, we can update the default to a larger size.
> >
> > regards,
> > Eric
> >
> > On Sun, Apr 21, 2013 at 3:07 PM, Luangsay Sourygna <[email protected]> wrote:
> >
> > > As I said before, I don't think Chukwa should handle those situations,
> > > since I think this is a "log rotation" problem.
> > > Personally, I have never seen such a problem (log4j RFA, for instance,
> > > has a kind of "flexible" size, and every rotated file ended with a \n).
> > >
> > > On the other hand, there is a special situation I think Chukwa should
> > > take care of.
> > > The default value for the configuration property
> > > "chukwaAgent.fileTailingAdaptor.maxReadSize" is 128 kB, which means
> > > that if a line/record is bigger than that size, the record won't be
> > > sent by the agent.
> > > We'll get a warning in Chukwa's log, but the record will be lost
> > > (see the LWFTAdaptor.slurp() method).
> > > In such a case, would it be possible to temporarily increase
> > > MAX_READ_SIZE so that we are able to send one record on the wire?
> > >
> > > Regards,
> > >
> > > Sourygna
> > >
> > > On Sun, Apr 21, 2013 at 7:05 PM, Eric Yang <[email protected]> wrote:
> > >
> > > > Do we need to consider rotation based on size? For example, take
> > > > the last line of a log file that reaches 300 MB. There is no line
> > > > break in the first file, but the entry continues into the next
> > > > rotated log and then has a line feed delimiter. If we are splitting
> > > > lines based on \n, then we can reconstruct the full line across the
> > > > two files. I am not sure whether this case needs to be supported.
> > > >
> > > > regards,
> > > > Eric
> > > >
> > > > On Fri, Apr 19, 2013 at 12:01 PM, Luangsay Sourygna <[email protected]> wrote:
> > > >
> > > > > Well, the log4j socket adaptor may be great if you control the
> > > > > software that generates the logs.
> > > > > That is not usually my case: customers don't really like having
> > > > > to install a Chukwa agent on their production servers, so I don't
> > > > > want to think about telling them to change the log system of
> > > > > their software.
> > > > >
> > > > > As for partial lines when log files rotate, I don't think this is
> > > > > something Chukwa should manage (what is more: how could Chukwa be
> > > > > aware there is a problem?).
> > > > > In my view, this would be an error of the "logrotate" system. As
> > > > > far as I know, the RFA and DRFA log4j appenders handle rotation
> > > > > quite well.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Sourygna
> > > > >
> > > > > On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <[email protected]> wrote:
> > > > >
> > > > > > I think the best solution is to use the Log4j socket appender
> > > > > > and the Chukwa log4j socket adaptor to get the full log entry
> > > > > > without worrying about line feeds. However, this solution only
> > > > > > works with programs written in Java, and it does not keep a
> > > > > > copy of the existing log file on disk.
> > > > > >
> > > > > > I think your proposal is a good way to handle tailing text
> > > > > > files so that only line-delimited entries are sent. How do we
> > > > > > handle a partial line when the log file has rotated?
> > > > > >
> > > > > > regards,
> > > > > > Eric
> > > > > >
> > > > > > On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <[email protected]> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > FileTailingAdaptor is great for tailing log files and sending
> > > > > > > them to Hadoop.
> > > > > > > However, the last line of a chunk is usually cut, which leads
> > > > > > > to some errors.
> > > > > > >
> > > > > > > I know that we can use CharFileTailingAdaptorUTF8 to solve
> > > > > > > this problem.
> > > > > > > Nonetheless, this adaptor calls the MapProcessor.process()
> > > > > > > method for every line in each chunk, which slows down the
> > > > > > > Demux phase a lot.
> > > > > > >
> > > > > > > I suggest creating a new adaptor that would combine the
> > > > > > > benefits of the two adaptors: the (Demux) speed of
> > > > > > > FileTailingAdaptor and the preservation of lines from
> > > > > > > CharFileTailingAdaptorUTF8.
> > > > > > >
> > > > > > > The implementation of extractRecords() would be:
> > > > > > > - a "for loop" over the buffer, starting from the end of the
> > > > > > >   buffer and going backward
> > > > > > > - if we find a separator, save the offset and exit the loop
> > > > > > > - the rest of the method would be similar to
> > > > > > >   CharFileTailingAdaptorUTF8.
> > > > > > >
> > > > > > > Could you please tell me what you think about it?
> > > > > > > How do you currently manage the "cut lines" with Chukwa?
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Sourygna
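
The backward scan described in the original proposal can be sketched as follows. This is only an illustration under stated assumptions: the class LastSeparatorSplit and its methods are hypothetical names, the separator is assumed to be a single '\n' byte, and the real extractRecords() signature in Chukwa is not reproduced here.

```java
/** Hypothetical helper, not part of Chukwa: finds where the last complete line ends in a buffer. */
final class LastSeparatorSplit {

    /** Scans backward from the end of buf[0..length) and returns the index of the last separator, or -1. */
    static int lastSeparatorIndex(byte[] buf, int length, byte separator) {
        for (int i = length - 1; i >= 0; i--) {
            if (buf[i] == separator) {
                return i; // save the offset and exit the loop
            }
        }
        return -1; // no separator at all: the buffer holds only a partial line
    }

    /**
     * Returns how many leading bytes of the buffer consist of complete lines.
     * Bytes after that point belong to a cut line and should be re-read on the next pass.
     */
    static int completeBytes(byte[] buf, int length) {
        return lastSeparatorIndex(buf, length, (byte) '\n') + 1;
    }
}
```

The bytes before completeBytes() would be sent as a single chunk, so the per-line cost is limited to one scan from the tail of the buffer, and the adaptor's file offset would advance only by that amount. A result of 0 on a full buffer is exactly the "big line" situation discussed later in the thread.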
