Hello Todd,

My aim is to let the reduce move ahead with reduction as soon as it has
the data it requires, instead of waiting for all the maps to complete. If
the reducer knows how many records it needs and compares that with the
number of records it has received so far, it can move on once the two
become equal, without waiting for all the maps to finish.
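
Concretely, the check I have in mind would look something like this. It is
only a rough sketch; ShuffleProgress and its fields are hypothetical names
of mine, not anything in the current Hadoop source:

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical bookkeeping for the reduce side -- just a sketch,
    // nothing that exists in Hadoop today. recordsExpected would have to
    // come from per-map record counts advertised to the reducer up front;
    // recordCopied() would be called as each MapOutputCopier finishes a
    // fetch.
    class ShuffleProgress {
        private final long recordsExpected;
        private final AtomicLong recordsReceived = new AtomicLong();

        ShuffleProgress(long recordsExpected) {
            this.recordsExpected = recordsExpected;
        }

        // Returns true once every expected record has arrived, so the
        // reduce can begin even if some map tasks are still running.
        boolean recordCopied(long recordsInThisCopy) {
            return recordsReceived.addAndGet(recordsInThisCopy)
                >= recordsExpected;
        }
    }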

So if I can learn how many records each file copied by a MapOutputCopier
contains, I can do this comparison. At the moment, though, the lengths the
copier receives, held in these variables:

    long decompressedLength =
        Long.parseLong(connection.getHeaderField(RAW_MAP_OUTPUT_LENGTH));
    long compressedLength =
        Long.parseLong(connection.getHeaderField(MAP_OUTPUT_LENGTH));

are not even multiples of the record size. I would like to know how to
derive the number of records from these lengths.
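
If there is no cheap way to recover the count from the headers, the only
exact approach I can see is to scan the decompressed stream itself and
count records. The sketch below is my own illustration, not existing
Hadoop code; it assumes each record is serialized as
<key-length vint><value-length vint><key bytes><value bytes> with a
(-1, -1) end marker, which is my reading of the on-disk format and may
well be wrong:

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.WritableUtils;

    // Hypothetical helper, illustrative only -- not existing Hadoop code.
    // Counts records in a *decompressed* map-output stream, assuming the
    // <keyLen, valueLen, key, value> layout described above, with the data
    // ending in a (-1, -1) marker before any trailing checksum.
    class RecordCounter {
        static long countRecords(InputStream rawStream) throws IOException {
            DataInputStream in = new DataInputStream(rawStream);
            long count = 0;
            while (true) {
                int keyLen = WritableUtils.readVInt(in);
                int valueLen = WritableUtils.readVInt(in);
                if (keyLen == -1 && valueLen == -1) { // assumed end marker
                    break;
                }
                IOUtils.skipFully(in, keyLen + valueLen); // skip payload
                count++;
            }
            return count;
        }
    }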

I am doing all of this as an experiment in my graduate research, and if
everything works, I can put together a contrib patch for it.

Thanks,
Naresh Rapolu. 

Naresh Rapolu wrote:
> 
> Hello,
> 
> In ReduceTask.java, in the MapOutputCopier#getMapOutput() function,
> what do the following variables contain?
> 
> long decompressedLength =
> Long.parseLong(connection.getHeaderField(RAW_MAP_OUTPUT_LENGTH));
> long compressedLength =
> Long.parseLong(connection.getHeaderField(MAP_OUTPUT_LENGTH));
> 
> Can I get the number of map output records in this copied file from
> either of these variables?
> Neither seems to be a multiple of the Record<K,V> size. I understand
> there may be some header and checksum content included in these
> lengths, but can anyone tell me what to subtract to get the aggregate
> size of the map output records?
> 
> Thanks,
> Naresh Rapolu.
> 
