Hello Todd,

My aim is to have the reduce task move ahead with reduction as soon as it gets the data it needs, instead of waiting for all the maps to complete. If it knows how many records it needs and compares that with the number of records it has received so far, it can move on once the two are equal, without waiting for all the maps to finish.
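Concretely, the check I have in mind looks roughly like this (recordsExpected, recordsReceived, and signalReduceCanStart are hypothetical names of my own, not existing ReduceTask members):

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical additions to ReduceTask, sketched for discussion.
    private final AtomicLong recordsReceived = new AtomicLong(0);
    // Total records destined for this partition; this would have to be
    // communicated to the reduce task somehow, e.g. via the JobTracker.
    private volatile long recordsExpected = -1;

    // Called by a MapOutputCopier after each successful copy.
    void noteCopiedRecords(long recordsInSegment) {
      if (recordsExpected >= 0
          && recordsReceived.addAndGet(recordsInSegment) >= recordsExpected) {
        // Every record for this partition has arrived; the sort/reduce
        // phase could start without waiting for the remaining maps.
        signalReduceCanStart();
      }
    }

The missing piece is recordsInSegment: each copier would need to know how many records its copied segment contains, which is the question below.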
So if I can know the number of records received from each file the MapOutputCopier has copied, I can do this comparison. But the lengths received by the copier, present in these variables:

    long decompressedLength =
      Long.parseLong(connection.getHeaderField(RAW_MAP_OUTPUT_LENGTH));
    long compressedLength =
      Long.parseLong(connection.getHeaderField(MAP_OUTPUT_LENGTH));

are not even multiples of the record size. I want to know how to get the number of records from these lengths. I am doing all this as an experiment in my graduate research, and if everything works out, I can contribute a contrib patch for the same.

Thanks,
Naresh Rapolu.

Naresh Rapolu wrote:
>
> Hello,
>
> In ReduceTask.java, in the MapOutputCopier#getMapOutput() function,
> what do the following variables contain?
>
> long decompressedLength =
>   Long.parseLong(connection.getHeaderField(RAW_MAP_OUTPUT_LENGTH));
> long compressedLength =
>   Long.parseLong(connection.getHeaderField(MAP_OUTPUT_LENGTH));
>
> Can I get the number of map output records in this copied file using
> either of these variables?
> Neither seems to be a multiple of the Record<K,V> size. I understand
> there might be some header information and checksum content inside
> these lengths, but can anyone let me know what I should subtract to
> get the aggregate size of the map output records?
>
> Thanks,
> Naresh Rapolu.
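P.S. For anyone following along: as far as I can tell, the copied segment is in IFile format, where each record is serialized as two vints (key length and value length) followed by the raw key and value bytes, with an end-of-stream marker and CRC checksum data on top. Record sizes therefore vary, so neither length header can be a simple multiple of a fixed record size. One workaround I am considering is to have the server side ship the record count in an extra HTTP response header alongside the existing length headers. A rough sketch (the header name Map-Output-Num-Records is my own invention, not part of Hadoop):

    // TaskTracker side, in the servlet that serves map output, assuming
    // the segment's record count is available as numRecords:
    response.setHeader("Map-Output-Num-Records", Long.toString(numRecords));

    // MapOutputCopier#getMapOutput() side, next to the existing
    // length headers:
    long numRecords =
        Long.parseLong(connection.getHeaderField("Map-Output-Num-Records"));

The per-partition count would have to be recorded at spill/merge time, since the existing "Map output records" counter covers all partitions of a map rather than a single reduce's segment.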