It doesn't look like you are splitting the record in the mapper phase for the
reducer type ActionLogAggregateWeights.  The current demux partitions by
reducer record type, so splitting the record in the reducer phase will not
work.  Take a look at the Top mapper class: it calls buildGenericRecord to
partition by reducer type.  The ActionLog mapper should mirror the data and
send it to both the ActionLog and ActionLogAggregateWeights reducer classes.
Hope this helps.
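
To illustrate the mirroring idea, here is a minimal sketch in plain Python.
It is not Chukwa code: map_action_log, partition and the record fields are
hypothetical names made up for the example, and only the two reduce type
names come from this thread.  It shows each record being tagged with its
reduce type in the map phase, so that partitioning can route both copies.

```python
from collections import defaultdict

# Reduce type names are taken from the thread; everything else is illustrative.
REDUCE_TYPES = ("ActionLog", "ActionLogAggregateWeights")


def map_action_log(record):
    """Mirror one parsed record to every reduce type during the map phase."""
    for reduce_type in REDUCE_TYPES:
        # The key carries the reduce type so the partitioner can route on it.
        yield (reduce_type, record["key"]), dict(record)


def partition(outputs):
    """Group map outputs by reduce type, as a stand-in for demux partitioning."""
    partitions = defaultdict(list)
    for (reduce_type, key), value in outputs:
        partitions[reduce_type].append((key, value))
    return partitions


# Hypothetical sample records, not data from the job above.
records = [{"key": "user1", "weight": 2}, {"key": "user2", "weight": 5}]
outputs = [pair for r in records for pair in map_action_log(r)]
partitions = partition(outputs)
```

Because the tagging happens before partitioning, each reduce type receives
its own full copy of the data.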

Note that reducer partitioning by RecordType is not correctly implemented in
the current demux.  Chukwa requires a single reducer per data type to run
correctly.  If a single record type generates a large amount of data, the
reducer for that record type becomes the bottleneck of demux.  Hence, Demux
is going to change when the Avro Input/Output format is ready.  I am not sure
whether that will affect your implementation, but it is something to keep in
mind.
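
A rough sketch of why this becomes a bottleneck, again in plain Python and
not Chukwa's actual partitioner: partition_by_type is a hypothetical
function, and the crc32 hash is only an assumption chosen to make the
example deterministic.  The record counts and the reducer count are the ones
from your job counters.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 9  # "Launched reduce tasks=9" in the quoted job counters


def partition_by_type(record_type, num_reducers=NUM_REDUCERS):
    """Route every record of a type to one reducer (illustrative only)."""
    return zlib.crc32(record_type.encode("utf-8")) % num_reducers


# Map output counts per record type, from the quoted DemuxMapOutput counters.
map_output = {"ActionLog": 1416018, "ActionLogAggregateWeights": 1416018}

# Accumulate how many records each reducer index would receive.
load = Counter()
for record_type, count in map_output.items():
    load[partition_by_type(record_type)] += count

busy_reducers = len(load)
```

However many reduce tasks are launched, the number of reducers that receive
data can never exceed the number of record types, so one large type ends up
on a single reducer.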

Regards,
Eric

On 3/8/10 7:05 AM, "Guillermo Pérez" <bi...@tuenti.com> wrote:

> I'm launching several chukwa records with different keys and reducers,
> so I can generate some aggregated records data directly while loading
> data.
> 
> The map/reduce works well, but the data that passes to the aggregator
> reducer is not stored to HDFS. Does anybody know why?
> 
> 2010-03-08 15:28:28,784 INFO main JobClient - Job complete: job_201002191418_1463
> 2010-03-08 15:28:28,800 INFO main JobClient - Counters: 29
> 2010-03-08 15:28:28,800 INFO main JobClient -   DemuxReduceOutput
> 2010-03-08 15:28:28,800 INFO main JobClient -     total records=1416018
> 2010-03-08 15:28:28,800 INFO main JobClient -     ActionLog records=1416018
> 2010-03-08 15:28:28,800 INFO main JobClient -   DemuxMapOutput
> 2010-03-08 15:28:28,800 INFO main JobClient -     ActionLogAggregateWeights records=1416018
> 2010-03-08 15:28:28,800 INFO main JobClient -     total records=2832036
> 2010-03-08 15:28:28,800 INFO main JobClient -     ActionLog records=1416018
> 2010-03-08 15:28:28,801 INFO main JobClient -   Job Counters
> 2010-03-08 15:28:28,801 INFO main JobClient -     Launched reduce tasks=9
> 2010-03-08 15:28:28,801 INFO main JobClient -     Rack-local map tasks=1
> 2010-03-08 15:28:28,801 INFO main JobClient -     Launched map tasks=2
> 2010-03-08 15:28:28,801 INFO main JobClient -     Data-local map tasks=1
> 2010-03-08 15:28:28,801 INFO main JobClient -   DemuxMapInput
> 2010-03-08 15:28:28,801 INFO main JobClient -     ActionLog chunks=610
> 2010-03-08 15:28:28,801 INFO main JobClient -     total chunks=610
> 2010-03-08 15:28:28,801 INFO main JobClient -   DemuxReduceInput
> 2010-03-08 15:28:28,801 INFO main JobClient -     total distinct keys=58400
> 2010-03-08 15:28:28,801 INFO main JobClient -     ActionLog total distinct keys=57600
> 2010-03-08 15:28:28,801 INFO main JobClient -     ActionLogAggregateWeights total distinct keys=800
> 2010-03-08 15:28:28,801 INFO main JobClient -   FileSystemCounters
> 2010-03-08 15:28:28,801 INFO main JobClient -     FILE_BYTES_READ=1001600403
> 2010-03-08 15:28:28,802 INFO main JobClient -     HDFS_BYTES_READ=85558794
> 2010-03-08 15:28:28,802 INFO main JobClient -     FILE_BYTES_WRITTEN=1501914817
> 2010-03-08 15:28:28,802 INFO main JobClient -     HDFS_BYTES_WRITTEN=325688807
> 2010-03-08 15:28:28,802 INFO main JobClient -   Map-Reduce Framework
> 2010-03-08 15:28:28,802 INFO main JobClient -     Reduce input groups=58400
> 2010-03-08 15:28:28,802 INFO main JobClient -     Combine output records=0
> 2010-03-08 15:28:28,802 INFO main JobClient -     Map input records=610
> 2010-03-08 15:28:28,802 INFO main JobClient -     Reduce shuffle bytes=342907105
> 2010-03-08 15:28:28,802 INFO main JobClient -     Reduce output records=1416018
> 2010-03-08 15:28:28,802 INFO main JobClient -     Spilled Records=8496108
> 2010-03-08 15:28:28,802 INFO main JobClient -     Map output bytes=493557805
> 2010-03-08 15:28:28,802 INFO main JobClient -     Map input bytes=85558588
> 2010-03-08 15:28:28,802 INFO main JobClient -     Combine input records=0
> 2010-03-08 15:28:28,802 INFO main JobClient -     Map output records=2832036
> 2010-03-08 15:28:28,802 INFO main JobClient -     Reduce input records=2832036
> 
> ActionLog is stored in the repository dir, but I can't find anything
> about ActionLogAggregateWeights...
> 
