Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by Jerome Boulon:
http://wiki.apache.org/hadoop/Sending_information_to_Chukwa

------------------------------------------------------------------------------

== Add a new dataSource (Source Input) ==

=== Using Log4J ===

- Chukwa comes with a Log4J Appender. Here the steps that you need to fallow in order to use it:
+ Chukwa comes with a Log4J Appender. Here are the steps that you need to follow in order to use it:
- 1. Create a log4j.properties file that contains the fallowing information:
+ 1. Create a log4j.properties file that contains the following information:

  log4j.rootLogger=INFO, chukwa
  log4j.appender.chukwa=org.apache.hadoop.chukwa.inputtools.log4j.ChukwaDailyRollingFileAppender

@@ -20, +20 @@

  log4j.appender.chukwa.layout=org.apache.log4j.PatternLayout
  log4j.appender.chukwa.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

- 1. Add those parameters to your java command line:
+ 1. Add these parameters to your java command line:
   * -DCHUKWA_HOME=${CHUKWA_HOME} -DRECORD_TYPE=<YourRecordType_Here> -Dlog4j.configuration=log4j.properties
   * -DRECORD_TYPE=<YourRecordType_Here> is the most important parameter.
- * You can only store one record type per file, so if you need to split your logs into different record types,just create one appender per data type (%T% see hadoop logs4j configuration file)
+ * You can only store one record type per file, so if you need to split your logs into different record types, just create one appender per data type (%T% see the hadoop log4j configuration file)
- 1. Start your program, now all you log statements should be written in ${CHUKWA_HOME}/logs/<YourRecordType_Here>.log
+ 1. Start your program. Now all your log statements should be written in ${CHUKWA_HOME}/logs/<YourRecordType_Here>.log

=== Static file like /var/log/messages ===

@@ -39, +39 @@

 1. Open a socket from your application to the ChukwaLocalAgent
 1. Write this line to the socket:
- * add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLine <RecordType> <StartOffset> <fileName> <StartOffset>
+ * Add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLine <RecordType> <StartOffset> <fileName> <StartOffset>
  * Where <RecordType> is the data type that will identify your data
  * Where <StartOffset> is the start offset
  * Where <fileName> is the local path on your machine
- 1. close the socket
+ 1. Close the socket

== Extract information from this new dataSource ==

@@ -58, +58 @@

Your log will be automatically available from the Web Log viewer under the <YourRecordTypeHere> directory

=== Using a specific Parser ===

- If you want to extract some specific information and more processing you need to write your own parser.
+ If you want to extract some specific information and perform more processing, you need to write your own parser.
Like any M/R program, you have to write at least the Map side for your parser. The reduce side is Identity by default.

==== MAP side of the parser ====

- Your can either write your own from strach or extends the AbstractProcessor class that hide all the low level action on the chunk.
+ You can write your own parser from scratch or extend the AbstractProcessor class, which hides all the low-level actions on the chunk.
- then you have to register your parser to the demux (link between the RecordType and the parser)
+ Then you have to register your parser with the demux (the link between the RecordType and the parser).

==== Parser registration ====

- * Edit ${CHUKWA_HOME}/conf/chukwa-demux-conf.xml and add the fallowing lines
+ * Edit ${CHUKWA_HOME}/conf/chukwa-demux-conf.xml and add the following lines:

 <property>
   <name><YourRecordType_Here></name>

@@ -147, +147 @@

==== Parser key field ====

Your data is going to be sorted by RecordType then by the key field.
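The adaptor-registration exchange described earlier (open a socket to the ChukwaLocalAgent, write one `add` line, close the socket) can be sketched in Java. The agent host and control port used here (`localhost:9093`) are assumptions for illustration; use whatever address your agent is actually configured to listen on.

```java
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class AdaptorRegistration {

    // Builds the "add" line in the format shown on this page:
    // add <AdaptorClass> <RecordType> <StartOffset> <fileName> <StartOffset>
    static String addCommand(String recordType, long startOffset, String fileName) {
        return "add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
                + "CharFileTailingAdaptorUTF8NewLine "
                + recordType + " " + startOffset + " " + fileName + " " + startOffset + "\n";
    }

    // Opens a socket to the local agent, writes the line, and closes the
    // socket. The host and port are illustrative assumptions, not values
    // taken from this page.
    static void register(String recordType, long startOffset, String fileName) throws Exception {
        try (Socket socket = new Socket("localhost", 9093);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(addCommand(recordType, startOffset, fileName));
            out.flush();
        }
    }
}
```

Building the command string in a separate method keeps the wire format testable without a running agent.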
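The registration property in the diff above is cut off by the hunk boundary after the `<name>` element. For orientation, a complete entry would map the RecordType name to the fully qualified class name of your parser; the `<value>` and `<description>` contents below are illustrative assumptions, not values from this page:

```xml
<property>
  <name><YourRecordType_Here></name>
  <!-- Assumed for this sketch: the fully qualified class name of your
       Map-side parser; the package path is an illustration only. -->
  <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.YourParser</value>
  <description>Parser class for <YourRecordType_Here></description>
</property>
```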
- The default implementation use the fallowing grouping for all records:
+ The default implementation uses the following grouping for all records:
  1. Time partition (Time up to the hour)
  1. Machine name (physical input source)
  1. Record timestamp
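The three-level grouping above can be read as: records bucket first by the hour in which they occurred, then by source machine, then by their exact timestamp. A hypothetical Java sketch of such a key follows; the field layout and string format are illustrations of the ordering, not Chukwa's actual key class.

```java
// Hypothetical sketch of the default grouping described above:
// time partition (truncated to the hour), then machine name, then the
// record's exact timestamp. Not Chukwa's real key implementation.
public class RecordKeySketch {

    static final long HOUR_MS = 60L * 60L * 1000L;

    // Truncates a timestamp (ms since epoch) down to its hour boundary,
    // giving the "Time partition" component of the key.
    static long timePartition(long timestampMs) {
        return timestampMs - (timestampMs % HOUR_MS);
    }

    // Builds a sortable key string: <hourBucket>/<machine>/<timestamp>,
    // so lexical ordering follows the three-level grouping.
    static String buildKey(long timestampMs, String machine) {
        return timePartition(timestampMs) + "/" + machine + "/" + timestampMs;
    }
}
```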
