Hello,

I need to write a MapReduce program that begins with 2 jobs:

1. Convert raw log data to SequenceFiles.
2. Read from the SequenceFiles and cherry-pick completed events (events that are not yet complete stay in SequenceFiles to be checked again later).

But I should be able to compact those 2 jobs into 1 job.
I just need to figure out how to write an InputFormat that uses 2 types of RecordReaders, depending on the input file type. Specifically, the inputs would be either raw log data (TextInputFormat) or partially processed log data (SequenceFileInputFormat).

I think I need to extend SequenceFileInputFormat to look for an identifying extension on the files. Then I would be able to return either a LineRecordReader or a SequenceFileRecordReader, and some logic in Map could process the line into a record.

Am I headed in the right direction? Or should I stick with running 2 jobs instead of trying to squash these steps into 1?

Thanks,
Stu Hood
Webmail.us
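To make the question concrete, here is a rough sketch of the dispatch I have in mind, written in plain Java with stand-in types rather than the real Hadoop classes. Everything in it is hypothetical: the `SimpleRecordReader` interface, the two reader classes, and the `.seq` extension are placeholders for illustration only. One wrinkle it tries to show is that the two readers would need to hand the mapper the same record type, so in this sketch the raw-log reader does the line-to-record conversion itself instead of leaving it to Map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types only: none of these are Hadoop classes.
class ReaderDispatchSketch {

    /** Hypothetical, simplified reader that turns one input's entries into records. */
    interface SimpleRecordReader {
        String kind();
        List<String> read(List<String> rawEntries);
    }

    /** Stands in for a line-oriented reader over raw log text. */
    static class RawLogReader implements SimpleRecordReader {
        public String kind() { return "text"; }

        public List<String> read(List<String> rawEntries) {
            List<String> records = new ArrayList<>();
            for (String line : rawEntries) {
                // Placeholder for real log parsing: trim the line and tag it.
                records.add("parsed:" + line.trim());
            }
            return records;
        }
    }

    /** Stands in for a reader over partially processed records. */
    static class PartialRecordReader implements SimpleRecordReader {
        public String kind() { return "sequence"; }

        public List<String> read(List<String> rawEntries) {
            // Entries are already records, so pass them through unchanged.
            return new ArrayList<>(rawEntries);
        }
    }

    // Assumed identifying extension for partially processed files.
    static final String SEQUENCE_EXTENSION = ".seq";

    /** Picks a reader from the file name, as the custom InputFormat would. */
    static SimpleRecordReader chooseReader(String fileName) {
        if (fileName.endsWith(SEQUENCE_EXTENSION)) {
            return new PartialRecordReader();
        }
        return new RawLogReader();
    }

    public static void main(String[] args) {
        // Hypothetical file names and contents, for demonstration only.
        String[] files = {"access.log", "events-0001.seq"};
        for (String file : files) {
            SimpleRecordReader reader = chooseReader(file);
            List<String> records = reader.read(Arrays.asList(" GET /index "));
            System.out.println(file + " -> " + reader.kind() + " " + records);
        }
    }
}
```

In a real job the `chooseReader` decision would presumably live in the custom InputFormat's method that creates the RecordReader for a split, keyed off the split's file path.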
