Hi Eric,

> Chukwa has a special log4j appender which escapes return character. The
> multi-lines exception will be stored as a single chunk, and processed as a
> single chukwa record after Demux.

In this case, I suppose I would need to configure the monitored Hadoop
cluster to actually use the Chukwa log4j appender? Would I also need to
recompile the Hadoop of the monitored cluster to include the Chukwa code
then?

>
> You are on the right track. For your purpose, you may want to create your
> own custom datatype and having a matched chukwa log4j appender record type
> to process the data that you are looking for. To start, you may be
> interested in modifying HadoopLogProcessor, and enhance from there. Chukwa
> is currently streaming all hadoop logs in the same record type (HadoopLog),
> and this part could use some help to carve out the definitions.

Where are these record types defined, and how do they map to the
processors? Is it a direct <record type name>Processor mapping that's
automatically done by the Demux?

Thanks,
Jiaqi

>
> On 5/17/09 6:42 PM, "Jiaqi Tan" <[email protected]> wrote:
>
>> Hi Ariel,
>>
>> So with the CharFileTailingAdaptorUTF8NewLineEscaped, if I have a log
>> file entry with a multi-line entry, e.g. if there was a Java exception
>> logged, would each line be separated into a different chunk? If that's
>> the case, are there any adaptors that would coalesce multi-line log
>> entries into a single chunk?
>>
>> Also, does the data type get resolved by Demux to one of the classes
>> in org.apache.hadoop.chukwa.extraction.demux.processor.mapper? i.e. if
>> I wanted to implement my own custom datatype, I should create a Demux
>> processor and stick it in as one of the classes in that package?
>>
>> Thanks,
>> Jiaqi
>>
>> On Sun, May 17, 2009 at 6:19 PM, Ariel Rabkin <[email protected]> wrote:
>>> It's worth distinguishing two different things.
>>>
>>> The adaptor (as in CharFileTailingAdaptorUTF8) is responsible for
>>> deciding how to break the data into chunks, and how to tag the chunks.
>>> Probably CharFileTailingAdaptorUTF8NewLineEscaped is right for you.
>>> (We should really rename that to something shorter!)
>>>
>>> The type, like SysLog or NameNodeLog, is stored by the adaptor, and
>>> passed through as Chunk metadata. It's used to tell the Demux how to
>>> process that data. The demux-conf has the mapping from datatype to
>>> processor. For logs, you should be fine just picking a datatype. If
>>> you aren't using Demux to process the logs, you don't even need to
>>> write a processor.
>>>
>>> --Ari
>>>
>>> On Sun, May 17, 2009 at 6:15 PM, Jiaqi Tan <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> Which adaptor should I use if I want to process log entries from the
>>>> TaskTracker and DataNode logs? Should I just use one of the
>>>> FileTailer adaptors already available (CharFileTailingAdaptorUTF8), or
>>>> is there a custom type such as the one for SysLog or NameNodeLog when
>>>> using the CharFileTailingAdaptorUTF8NewLineEscaped adaptor?
>>>>
>>>> Is there any documentation available on what the "type" (e.g. SysLog
>>>> or NameNodeLog) means and how to use it/how it works?
>>>>
>>>> Thanks,
>>>> Jiaqi
>>>>
>>>
>>> --
>>> Ari Rabkin [email protected]
>>> UC Berkeley Computer Science Department
>>>
>
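
For reference, the idea at the top of this thread (an appender that escapes
the return character so that a multi-line exception stays in one chunk) can be
illustrated with a small standalone sketch. This is not Chukwa's appender
code. The class name NewlineEscapeSketch, its escape/unescape methods, and the
choice of a backslash followed by "n" as the escape sequence are all
assumptions made for illustration; the real appender may use a different
escape sequence.

```java
/** Hypothetical sketch of newline escaping; not taken from Chukwa. */
final class NewlineEscapeSketch {

    /** Escape backslashes first, then newlines, so the entry fits on one line. */
    static String escape(String entry) {
        return entry.replace("\\", "\\\\").replace("\n", "\\n");
    }

    /** Reverse escape(): decode each backslash pair back to one character. */
    static String unescape(String line) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(++i);
                // "\n" becomes a real newline; "\\" becomes one backslash.
                out.append(next == 'n' ? '\n' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String entry = "ERROR Task failed\n"
                + "java.io.IOException: disk full\n"
                + "\tat Foo.bar(Foo.java:42)";
        String oneLine = escape(entry);
        // A line-oriented tailer would now see a single line for the entry.
        System.out.println(oneLine.split("\n").length);
        // The original multi-line entry is recoverable downstream.
        System.out.println(unescape(oneLine).equals(entry));
    }
}
```

Escaping the backslash before the newline keeps the encoding reversible, so
a log message that already contains a literal backslash followed by "n" is
not mistaken for an escaped line break when the record is decoded.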
