The Chukwa use case is probably not affected by the decision on MAPREDUCE-1126. The Chukwa key is composed of a Long (time partition), a String (primary key), and a Long (timestamp). The value is an Avro blob. I would like to try using Avro to serialize the comparator, but it makes no difference in the Chukwa use case, because I will likely have to write my own comparator for TFile anyway. I agree with what Chris Douglas and Tom White said: the Avro serializing comparator should be optional.
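A hand-written comparator for that composite key might look something like the following sketch. The class and field names here are illustrative only, not Chukwa's actual classes; it simply orders by time partition, then primary key, then timestamp, which is the kind of ordering a TFile-based layout would need regardless of how the comparator is serialized.

```java
import java.util.Comparator;

// Hypothetical sketch of the composite key described above:
// a Long time partition, a String primary key, and a Long timestamp.
public class ChukwaKeySketch {

    static final class ChukwaKey {
        final long timePartition;
        final String primaryKey;
        final long timestamp;

        ChukwaKey(long timePartition, String primaryKey, long timestamp) {
            this.timePartition = timePartition;
            this.primaryKey = primaryKey;
            this.timestamp = timestamp;
        }
    }

    // Order by time partition first, then primary key, then timestamp.
    static final Comparator<ChukwaKey> KEY_ORDER =
        Comparator.comparingLong((ChukwaKey k) -> k.timePartition)
                  .thenComparing(k -> k.primaryKey)
                  .thenComparingLong(k -> k.timestamp);

    public static void main(String[] args) {
        ChukwaKey a = new ChukwaKey(100L, "host1", 5L);
        ChukwaKey b = new ChukwaKey(100L, "host1", 9L);
        ChukwaKey c = new ChukwaKey(100L, "host2", 1L);
        System.out.println(KEY_ORDER.compare(a, b) < 0); // true
        System.out.println(KEY_ORDER.compare(b, c) < 0); // true
    }
}
```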
I like Tom's example:

    Schema keySchema = ...
    AvroGenericData.setMapOutputKeySchema(job, keySchema);

Hope this helps.

Regards,
Eric

On 3/16/10 2:56 PM, "Jeff Hammerbacher" <ham...@cloudera.com> wrote:

> Hey Eric,
>
> Could you chime in on MAPREDUCE-815 with your potential use case? We're
> currently blocked on other issues, but getting more use cases on the table
> will be helpful.
>
> Thanks,
> Jeff
>
> On Mon, Mar 15, 2010 at 7:41 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>
>> Hi Kirk,
>>
>> The Avro + TFile plan depends on
>> https://issues.apache.org/jira/browse/MAPREDUCE-815. The work can start
>> once the Avro Input/Output format patch is included in a release build of
>> Hadoop. Hence, I would project that completing this migration will take
>> at least six months from when Avro MapReduce is ready. It's a fairly big
>> chunk of work, and it would be great if people want to pitch in to build
>> the aggregator piece to control the workflow. See
>> https://issues.apache.org/jira/browse/CHUKWA-444 for reference.
>>
>> Regards,
>> Eric
>>
>> On 3/15/10 3:03 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>>
>>> Hi Eric,
>>>
>>> Any notion as to the ETA for completion of the migration?
>>>
>>> Thanks,
>>> Kirk
>>>
>>> Eric Yang wrote:
>>>>
>>>> Hi Kirk,
>>>>
>>>> I am working on a design that removes MySQL from Chukwa. I am making
>>>> this departure from MySQL because the MDL framework was built for
>>>> prototyping purposes. It will not scale in a production system where
>>>> Chukwa could be hosted on a large Hadoop cluster. HICC will serve data
>>>> directly from HDFS in the future.
>>>>
>>>> Meanwhile, dbAdmin.sh from Chukwa 0.3 is still compatible with the
>>>> trunk version of Chukwa. You can load ChukwaRecords using the
>>>> org.apache.hadoop.chukwa.dataloader.MetricDataLoader class or mdl.sh
>>>> from Chukwa 0.3.
>>>>
>>>> The MetricDataLoader class will be marked as deprecated, and it will
>>>> not be supported once we make the transition to Avro + TFile.
>>>>
>>>> Regards,
>>>> Eric
>>>>
>>>> On 3/15/10 11:56 AM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I recently switched to trunk as I was experiencing a lot of issues
>>>>> with 0.3.0. In 0.3.0, there was a dbAdmin.sh script that would run and
>>>>> try to stick data in MySQL from HDFS. However, that script is gone,
>>>>> and when I run the system as built from trunk, nothing is ever
>>>>> populated in the database. Where are the instructions for setting up
>>>>> the HDFS -> MySQL data migration for HICC?
>>>>>
>>>>> Thanks,
>>>>> Kirk
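For reference, the composite key Eric describes at the top of the thread (Long time partition, String primary key, Long timestamp) could be expressed as an Avro record schema along these lines. The record and field names are illustrative assumptions, not an actual Chukwa schema:

```json
{
  "type": "record",
  "name": "ChukwaKey",
  "fields": [
    {"name": "timePartition", "type": "long"},
    {"name": "primaryKey", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
```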