I think the following features would be very useful for us:

1. Support for multiple key types in the input. For example, SequenceFile A contains <Ka, Va> pairs and SequenceFile B contains <Kb, Vb> pairs. I would like to simply add both files as input and have the map function be map(Object, Object). That way I would not have to wrap Ka and Kb in ObjectWritable, and the program would be more readable.
2. Value comparator support. Hadoop currently supports a key comparator, which lets me specify the order of the keys in the reduce phase. Sometimes, however, I also need to specify the order of the value sequence in the reduce phase. For example, the values in the reduce phase consist of Shop and Goods objects, and I want the Shop object always to be the first value because the output needs the shop information. Currently I have to store the Goods information in a buffer until the Shop object has been found.

3. A more efficient ObjectWritable. Looking at ObjectWritable's implementation, the class type information is always written into the sequence file. In many cases, however, both the key and the value are quite small, and the class type information is much larger than the key and value themselves.

4. Compression support. A sequence file contains a lot of similar data; if it could be compressed before it is actually written to disk, a lot of time would be saved. For example, if the value type is ObjectWritable, there must be a lot of repeated class declaration information that could be compressed. In my experience, this would save about 20% of bandwidth and disk space.

I will raise some other feature requests here as well.

Best regards,
Feng Jiang
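The buffering workaround from item 2, and what a guaranteed value order would save, can be sketched in plain Java. The sketch deliberately avoids the Hadoop API: `Shop`, `Goods`, `ShopGoodsJoin` and its methods are hypothetical, the values arrive as an `Iterator<Object>` the way they would in a reducer, and sorting an in-memory list with `SHOP_FIRST` only simulates the ordering a value comparator would provide during the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical value types standing in for the Shop and Goods objects.
final class Shop {
    final String name;
    Shop(String name) { this.name = name; }
}

final class Goods {
    final String item;
    Goods(String item) { this.item = item; }
}

final class ShopGoodsJoin {
    // Current workaround: Goods may arrive before the Shop, so they are buffered.
    // In this sketch, Goods for a key that never sees a Shop are silently dropped.
    static List<String> joinBuffered(Iterator<Object> values) {
        List<Goods> pending = new ArrayList<>();
        List<String> out = new ArrayList<>();
        Shop shop = null;
        while (values.hasNext()) {
            Object v = values.next();
            if (v instanceof Shop) {
                shop = (Shop) v;
                // Flush every Goods that was buffered before the Shop appeared.
                for (Goods g : pending) out.add(shop.name + "\t" + g.item);
                pending.clear();
            } else if (shop != null) {
                out.add(shop.name + "\t" + ((Goods) v).item);
            } else {
                pending.add((Goods) v);
            }
        }
        return out;
    }

    // The order a value comparator would guarantee: the Shop before any Goods.
    static final Comparator<Object> SHOP_FIRST =
        Comparator.comparingInt(v -> v instanceof Shop ? 0 : 1);

    // With that guarantee, the reducer streams the Goods without any buffer.
    static List<String> joinStreaming(Iterator<Object> values) {
        List<String> out = new ArrayList<>();
        Shop shop = (Shop) values.next(); // assumes the Shop is the first value
        while (values.hasNext()) {
            out.add(shop.name + "\t" + ((Goods) values.next()).item);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>(Arrays.asList(
            new Goods("pen"), new Shop("shop1"), new Goods("ink")));
        System.out.println(joinBuffered(values.iterator()));
        values.sort(SHOP_FIRST); // simulate the requested value ordering
        System.out.println(joinStreaming(values.iterator()));
    }
}
```

With the Shop guaranteed to come first, the reducer holds a single Shop per key instead of a list of Goods whose size is bounded only by the number of values for that key.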
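To make the size argument in item 3 concrete, the following sketch compares two toy encodings of 1000 small int values using `java.io.DataOutputStream`. The first writes a class name before every record; the second writes the name once in a header and then a 1-byte tag per record. This is a simplified model of the overhead, not ObjectWritable's actual wire format; `ClassTagDemo` and the tag scheme are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassTagDemo {
    // A real Hadoop class name, used here only as a 32-character example string.
    static final String CLASS_NAME = "org.apache.hadoop.io.IntWritable";

    // Toy encoding 1: the class name precedes every record.
    static int namePerRecord(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeUTF(CLASS_NAME); // 2-byte length + 32 bytes, every record
                out.writeInt(v);          // 4-byte payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Toy encoding 2 (hypothetical): a header lists the class name once,
    // then each record carries a 1-byte index into that table.
    static int tagPerRecord(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(1);         // header: number of class names in the table
            out.writeUTF(CLASS_NAME); // header: the table itself (index 0)
            for (int v : values) {
                out.writeByte(0);     // index of CLASS_NAME in the table
                out.writeInt(v);      // 4-byte payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int[] values = new int[1000];
        System.out.println(namePerRecord(values)); // 38000
        System.out.println(tagPerRecord(values));  // 5035
    }
}
```

Per record, the first encoding spends 34 bytes (a 2-byte length plus 32 characters) on type information for a 4-byte payload; the second spends 1 byte, at the cost of a fixed 35-byte header.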
