1.  Given that the instances will run on EC2 instance storage, where the disk IO 
that tail depends on can be slow, I would guess the Log4J appender offers better 
performance. Let me try that. 

2.  I did take a look at the appender. OK, I will adapt the appender with a 
simple string tokenizer to split the metadata from the message body. That will 
do the trick for now. 
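Here is a minimal sketch of the tokenizer approach from point 2. It assumes a hypothetical message convention, `key1=value1;key2=value2|message body`, that the application would have to follow when it logs. The class name `MetadataSplitter` and the `;`, `=`, and `|` delimiters are illustrative choices only and are not part of Flume or log4j. `java.util.StringTokenizer` handles the metadata pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

/**
 * Hypothetical convention (an assumption, not a Flume or log4j format):
 * messages look like "key1=value1;key2=value2|message body".
 * Everything before the first '|' is metadata; the rest is the body.
 */
final class MetadataSplitter {

    static final char BODY_SEPARATOR = '|';

    /** Parsed result: metadata pairs plus the message body. */
    static final class Parsed {
        final Map<String, String> metadata;
        final String body;

        Parsed(Map<String, String> metadata, String body) {
            this.metadata = metadata;
            this.body = body;
        }
    }

    /** Builds a message in the assumed "k=v;k=v|body" form (producer side). */
    static String format(Map<String, String> metadata, String body) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.append(BODY_SEPARATOR).append(body).toString();
    }

    /** Splits a rendered message back into metadata and body (appender side). */
    static Parsed split(String rendered) {
        Map<String, String> metadata = new LinkedHashMap<>();
        int sep = rendered.indexOf(BODY_SEPARATOR);
        if (sep < 0) {
            // No metadata prefix: treat the whole string as the body.
            return new Parsed(metadata, rendered);
        }
        StringTokenizer pairs = new StringTokenizer(rendered.substring(0, sep), ";");
        while (pairs.hasMoreTokens()) {
            String pair = pairs.nextToken();
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip malformed pairs with no key
                metadata.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return new Parsed(metadata, rendered.substring(sep + 1));
    }
}
```

For example, `split("tenant=acme;day=2011-12-01|user logged in")` yields the metadata `tenant` and `day` (usable as bucketing keys) and the body `user logged in`. The convention is deliberately naive: a key containing `=`, a value containing `;` or `|`, or a metadata-free body that happens to contain `|` would be misparsed, so a real version would need escaping or a less likely separator.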

Thanks. Will update based on what I see. 

Srini

-----Original Message-----
From: Eric Sammer <esam...@cloudera.com>
Date: Thu, 1 Dec 2011 18:58:13 
To: <flume-user@incubator.apache.org>
Subject: Re: Log4J appender

On Wed, Nov 30, 2011 at 8:36 PM, Srinivasan Subramanian 
<ssrini_va...@hotmail.com> wrote:

 

 Hi Eric

Thanks for that. I will certainly look at integrating the Log4J appender for Flume. 
A couple of additional questions:


1. From a performance standpoint, does the Log4J appender have any significant 
advantages over tailing the log file? 


The log4j appender should be more reliable and safer to use than tail, as it 
communicates directly via RPC with well-defined semantics. The tail source has 
some issues with race conditions around quickly truncated files and with failure 
recovery. With respect to performance, the two are probably close, but it's hard 
to say: tail requires disk IO, which can be slow, but the log4j appender uses Avro 
RPC, which isn't blazing fast either. 
 


2. It would be ideal if the Log4J appender also allowed me to attach some metadata 
that I need for output bucketing. Any ideas on how that could be achieved? 


I don't believe there's any way to inject metadata into the event generated by 
the appender. Someone did some work to make the log4j appender understand the 
MDC/NDC (mapped and nested diagnostic context) features, which I know very little 
about, but sadly I never had time to review and integrate the patch. You should 
just take a look at the appender source; it's really simple. 
 



Regards
Srini






----------------
Date: Wed, 30 Nov 2011 10:28:41 -0800
Subject: Re: Log4J appender
From: esam...@cloudera.com
To: flume-user@incubator.apache.org
 



Srini:

On Wed, Nov 30, 2011 at 12:23 AM, Srinivasan Subramanian 
<ssrini_va...@hotmail.com> wrote:
 
 

I was evaluating the log4j appender provided with Flume, but there is one 
aspect I don't understand:


The log4j appender makes a connection to the Flume agent and retries up to 
10 times (the default, which is configurable) if the connection is not made 
successfully. 


Questions:


1. When will the connection fail? If the agent is not running on the node? In 
that case, given that the default implementation waits 1 second before each 
retry for a total of 10 retries, would each logging call from the application 
be delayed by about 10 seconds? That would affect performance, right? 


Almost certainly, yes, assuming log4j is synchronous (I'm 99.9% sure it is). Of 
course, synchronous logging is the only way to guarantee event delivery in this 
context; if the application were to log the event and move on without waiting 
for a response, an event could get dropped and no one would be responsible for 
retrying the send. 
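To make the arithmetic in question 1 concrete, here is a hypothetical sketch (not the Flume appender's actual code) of a synchronous, bounded retry loop: one initial attempt, then up to `maxRetries` retries, each preceded by a `delayMillis` sleep on the caller's own thread. With the figures discussed above (10 retries, 1 second each), a call against an agent that is down spends about 10 × 1 s = 10 s sleeping, plus the time taken by the failed attempts themselves. `BlockingRetry` and `RetriesExhaustedException` are invented names for illustration.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not Flume's code) of a synchronous, bounded retry loop.
 * The calling thread blocks for the whole loop, so a logging call can stall
 * for roughly maxRetries * delayMillis when the agent is unreachable.
 */
final class BlockingRetry {

    /** Thrown when the initial attempt and every retry have failed. */
    static final class RetriesExhaustedException extends Exception {
        RetriesExhaustedException(int attempts) {
            super("Gave up after " + attempts + " attempts");
        }
    }

    /** Worst-case time spent sleeping between attempts. */
    static long worstCaseSleepMillis(int maxRetries, long delayMillis) {
        return maxRetries * delayMillis;
    }

    /**
     * Calls attempt until it returns true. Sleeps delayMillis before each
     * retry, for at most maxRetries retries after the first attempt.
     * Returns the total number of attempts made.
     */
    static int connect(BooleanSupplier attempt, int maxRetries, long delayMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (attempt.getAsBoolean()) {
            return 1; // connected on the first try, no delay
        }
        for (int retry = 1; retry <= maxRetries; retry++) {
            Thread.sleep(delayMillis); // the caller's thread blocks here
            if (attempt.getAsBoolean()) {
                return retry + 1;
            }
        }
        // Out of retries: surface the failure to the caller.
        throw new RetriesExhaustedException(maxRetries + 1);
    }
}
```

Because nothing here is asynchronous, the stall is the price of knowing, before the call returns, whether the event was handed off.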
 




2. What happens to the log message when the agent is not available?  Is it 
lost? 


If the log4j appender runs out of retries, I believe I wrote it to throw an 
exception. This would be the equivalent of using a standard file appender and 
running out of disk space. In other words, the log call failed and should be 
handled by the application. 
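On the application side, handling that failure might look like the following hypothetical sketch, which keeps an event instead of dropping it when delivery throws. `EventSender` stands in for whatever performs the delivery; it and `FallbackLogger` are invented names, not Flume or log4j APIs. The in-memory list is only for illustration; a real fallback would more likely spool to local disk so that events survive a restart and can be replayed once the agent is back.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical application-side handling (not part of Flume or log4j):
 * if delivery fails after its retries, keep the event locally
 * rather than silently losing it.
 */
final class FallbackLogger {

    /** Stand-in for whatever actually ships the event; may throw. */
    interface EventSender {
        void send(String event) throws Exception;
    }

    private final EventSender sender;
    private final List<String> spooled = new ArrayList<>();

    FallbackLogger(EventSender sender) {
        this.sender = sender;
    }

    /** Returns true if delivered, false if the event was spooled locally. */
    boolean log(String event) {
        try {
            sender.send(event);
            return true;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
            // Delivery failed: keep the event for a later replay.
            spooled.add(event);
            return false;
        }
    }

    /** Snapshot of events that could not be delivered. */
    List<String> spooled() {
        return new ArrayList<>(spooled);
    }
}
```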


Let me know if you have any other questions!





 
I am a little confused by the implementation, and any help explaining it 
would be appreciated.


Regards
Srini


                                         



-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com <http://www.cloudera.com> 
                                         



