Re: Creating a new adaptor: FileTailingAdaptor that would not cut lines

Luangsay Sourygna Tue, 23 Apr 2013 21:49:58 -0700

Sure, we can statically increase maxReadSize in the configuration. But the
fact is that we have to handle two different situations:
- when a file is growing rapidly and we want a quick response for the other
files: this means we don't want maxReadSize to be too big (I guess this
was the initial idea behind this parameter).
- when a line in a file is much bigger than the other lines and its size
can exceed the initial maxReadSize value: this means we would like
a very high maxReadSize.


Since maxReadSize can't be small and high at the same time, I propose a
"dynamic" value for this parameter.
Usually, this parameter should be small (128 kB for instance), and when a
very big line appears (when we have bufferRead == MAX_READ_SIZE AND
bytesUsed == 0), we should temporarily increase its value. Then, once the
big line is sent, we go back to the initial value.
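
To make the idea concrete, here is a minimal sketch in Java. It is not
Chukwa code: the class name DynamicReadSize, the hardLimit cap and the
doubling policy are assumptions made for illustration. Only the trigger
condition (buffer completely filled AND bytesUsed == 0) comes from the
proposal above. The lastRecordEnd() helper is the backward scan for the
last separator suggested earlier in this thread.

```java
/** Hypothetical helper (not part of Chukwa): a read size that grows only
 *  while a single oversized line is pending, then shrinks back. */
public class DynamicReadSize {
    private final int baseSize;   // normal value, e.g. 128 kB
    private final int hardLimit;  // assumed cap so one record cannot exhaust memory
    private int currentSize;

    public DynamicReadSize(int baseSize, int hardLimit) {
        if (baseSize <= 0 || hardLimit < baseSize) {
            throw new IllegalArgumentException("need 0 < baseSize <= hardLimit");
        }
        this.baseSize = baseSize;
        this.hardLimit = hardLimit;
        this.currentSize = baseSize;
    }

    /** Size to use for the next read. */
    public int current() {
        return currentSize;
    }

    /**
     * Call after each read attempt.
     * @param bytesRead bytes actually read into the buffer
     * @param bytesUsed bytes consumed as complete records (0 if no separator found)
     * @return true if the caller should retry the read with the larger size
     */
    public boolean update(int bytesRead, int bytesUsed) {
        if (bytesRead == currentSize && bytesUsed == 0) {
            // Buffer is full and holds no separator: the line is longer than the buffer.
            if (currentSize >= hardLimit) {
                return false; // cannot grow further; caller falls back to warning
            }
            currentSize = (int) Math.min((long) currentSize * 2, hardLimit);
            return true;
        }
        if (bytesUsed > 0) {
            // At least one complete record was sent: go back to the initial value.
            currentSize = baseSize;
        }
        return false;
    }

    /** Offset just past the last '\n' in buf[0..len), or 0 if there is none. */
    public static int lastRecordEnd(byte[] buf, int len) {
        for (int i = len - 1; i >= 0; i--) {
            if (buf[i] == '\n') {
                return i + 1;
            }
        }
        return 0;
    }
}
```

With this, the other files still get a quick response in the normal case,
and the bigger buffer is only paid for while the oversized line is in
flight. The cap is my addition: without it, a file with no separator at all
would make the buffer grow without bound.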

Makes sense?

Regards,

Sourygna



On Mon, Apr 22, 2013 at 6:25 AM, Eric Yang <[email protected]> wrote:

> maxReadSize can be increased in the configuration.  If using larger
> maxReadSize is preferred, we can update the default to be larger size.
>
> regards,
> Eric
>
> On Sun, Apr 21, 2013 at 3:07 PM, Luangsay Sourygna <[email protected]
> >wrote:
>
> > As I said before, I don't think Chukwa should handle those situations
> > since I think this is a "log rotation" problem.
> > Personally, I have never seen such problem (log4j RFA for instance has a
> > kind of "flexible" size and every rotated file ended with a \n).
> >
> > On the other side, there is a special situation I think Chukwa should
> > take care of.
> > Default value for configuration
> > "chukwaAgent.fileTailingAdaptor.maxReadSize" is 128kB, which means that
> > if a line/record is bigger than that size, the record won't be sent by
> > the agent.
> > We'll get a warning in the Chukwa's log, but the record will be lost (see
> > LWFTAdaptor.slurp() method).
> > In such case, would it be possible to temporarily increase MAX_READ_SIZE
> > so that we are able to send one record on the wire?
> >
> > Regards,
> >
> > Sourygna
> >
> >
> >
> >
> > On Sun, Apr 21, 2013 at 7:05 PM, Eric Yang <[email protected]> wrote:
> >
> > > Do we need to consider rotation base on size?  For example the last
> > > line of the log file that reaches 300MB.  There is no line break in
> > > the first file, but the entry continue to the next rotated log then
> > > have a line feed delimiter.  If we are splitting line base on \n, then
> > > we can reconstruct the full line between two files. I am not sure if
> > > this case need to be supported?
> > >
> > > regards,
> > > Eric
> > >
> > >
> > > On Fri, Apr 19, 2013 at 12:01 PM, Luangsay Sourygna <
> [email protected]
> > > >wrote:
> > >
> > > > Well, log4j socket adaptor may be great if you control the software
> > > > that generates logs.
> > > > That is not usually my case: customers don't really like having to
> > > > install a Chukwa agents on their production servers so I don't want
> > > > to think about telling them to change the log system of their
> > > > software.
> > > >
> > > > As for partial line when log files rotate, I don't think this is
> > > > something Chukwa should manage (what is more: how could Chukwa be
> > > > aware there is a problem?).
> > > > To my view, this would be an error of the "logrotate" system. As far
> > > > as I know, RFA and DRFA log4j appenders handle quite well the
> > > > rotation.
> > > >
> > > > Regards,
> > > >
> > > > Sourygna
> > > >
> > > >
> > > > On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <[email protected]>
> wrote:
> > > >
> > > > > I think the best solution is to use Log4j socket appender and
> > > > > Chukwa log4j socket adaptor to get the full entry of the log
> > > > > without worry about line feed.  However, this solution only works
> > > > > with program that is written in Java, and does not keep a copy of
> > > > > existing log file on disk.
> > > > >
> > > > > I think your proposal is a good idea to solve tailing text file and
> > > > > only line delimited entry will be send.  How do we handle partial
> > > > > line and log file has rotated?
> > > > >
> > > > > regards,
> > > > > Eric
> > > > >
> > > > > On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <
> > > [email protected]
> > > > > >wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > FileTailingAdaptor is great to tail log files and send them to
> > > > > > Hadoop.
> > > > > > However, last line of the chunk is usually cut which leads to
> > > > > > some errors.
> > > > > >
> > > > > > I know that we can use CharFileTailingAdaptorUTF8 to solve such
> > > > > > problem.
> > > > > > Nonetheless, this adaptor calls the MapProcessor.process() method
> > > > > > for every line in each chunk, thus slowing a lot the Demux phase.
> > > > > >
> > > > > > I suggest creating a new adaptor that would mix the benefits of
> > > > > > the two adaptors: the (Demux) speed of FileTailingAdaptor and
> > > > > > the preservation of lines from CharFileTailingAdaptorUTF8.
> > > > > >
> > > > > > The implementation of the extractRecords() would be:
> > > > > > - "for loop" on the buffer, starting from the end of the buffer
> > > > > > and going backward
> > > > > > - if we find a separator, save the offset and exit the loop
> > > > > > - rest of method would be similar to CharFileTailingAdaptorUTF8.
> > > > > >
> > > > > > Could you guys please tell me what do you think about it?
> > > > > > How do you currently manage the "lines cut" with Chukwa?
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Sourygna
> > > > > >
> > > > >
> > > >
> > >
> >
>
