Hi,

Since MultipleOutputs is not supported in version 0.20.203, when using a Partitioner class, key-value pairs belonging to partition 1 may end up in file part-r-00000 or part-r-00002. To handle this, I am currently *prefixing all the records* in a file with a "*partition number*". So, let's say 4 files get created on HDFS as follows:

part-r-00000: let's say it contains all records for partition 2
part-r-00001: let's say it contains all records for partition 1
part-r-00002: let's say it contains all records for partition 3
part-r-00003: let's say it contains all records for partition 0

Now, I am creating a new command to append all these files into a single file on the local file system in "*increasing order of partition number*". While doing this, I have to remove the partition number from every record. I can do this by reading all the files line by line, using substring to extract the required data, and writing it to the output file. However, this approach will take too much time, as this functionality is intended to run on very large files (GBs in size).

So, can you please suggest an alternative way to implement this functionality so that it completes in minimum time?

--
Regards,
Piyush Kansal
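
P.S. For reference, here is a minimal sketch of the line-by-line approach described above, so it is clear what I am trying to speed up. It works on part files already copied to the local file system. The class name PartitionMerger, the method names, and the record layout "<partition number><TAB><data>" are assumptions made for illustration only; the actual prefix format may differ.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper: merges local part files in increasing partition order
// and strips the partition-number prefix from every record.
final class PartitionMerger {

    // Assumption: every record is "<partition number>\t<data>".
    static final char DELIMITER = '\t';

    // Reads only the first line of a part file to learn its partition number.
    // Returns -1 for an empty file so the caller can skip it.
    static int partitionOf(Path partFile) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(partFile, StandardCharsets.UTF_8)) {
            String first = in.readLine();
            if (first == null) {
                return -1;
            }
            int cut = first.indexOf(DELIMITER);
            if (cut < 0) {
                throw new IOException("no partition prefix in " + partFile);
            }
            return Integer.parseInt(first.substring(0, cut));
        }
    }

    // Streams each part file once, in increasing partition order, writing
    // every record without its prefix to the single output file.
    static void merge(List<Path> partFiles, Path output) throws IOException {
        // TreeMap keeps the files sorted by partition number.
        TreeMap<Integer, Path> byPartition = new TreeMap<>();
        for (Path p : partFiles) {
            int partition = partitionOf(p);
            if (partition >= 0) {
                byPartition.put(partition, p);
            }
        }
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path p : byPartition.values()) {
                try (BufferedReader in = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // Write the tail of the line directly, skipping the prefix.
                        int start = line.indexOf(DELIMITER) + 1;
                        out.write(line, start, line.length() - start);
                        out.write('\n');
                    }
                }
            }
        }
    }
}
```

With the four files from the example, the output would contain the records of part-r-00003 (partition 0) first, then part-r-00001, part-r-00000, and part-r-00002. Every byte is still read and rewritten once, which is exactly the cost I would like to avoid.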