Hi devs (especially BSP core committers and PMC members),

First of all, input re-partitioning is a very important and unavoidable part of Apache Hama. Since there are still people who say things like "everything can be settled by the Spilling Queue" or "it should also be able to handle large inputs without a large cluster", let me explain again.
Restricting the number of task processes to the number of input block files means that both of the following situations are problematic:

Case 1. A user wants to process a 1GB input with 1,000 tasks on a large cluster.
Case 2. A user wants to process a 10GB input with 3 tasks on a small cluster.

I believe this has higher priority than other issues, such as the VertexInputReader or the Spilling Queue. So please don't mix everything together when we discuss this in the future. To re-partition raw data and create partitions as desired, we currently have the PartitioningJobRunner. Before working on future projects, please test it under various scenarios: for example, whether it works well with compressed files, with the latest Hadoop (HDFS 2.0), or on a large cluster.

Second, there is a lack of active discussion on the roadmap, and a difference of opinion on releases. There is a limit to what we can do, and as I mentioned above, there are many high-priority issues. I don't understand why you need to develop the BSP core or create separate branches without working together on these basic issues. Of course, research tasks are fine; if you want to work on them in your free time, feel free to do so. But please don't block other development. I hope you understand my meaning.

Thanks.

--
Best Regards, Edward J. Yoon
@eddieyoon
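P.S. For anyone who hasn't looked at the partitioning code yet: the core idea of decoupling the number of partitions from the number of input blocks can be sketched in plain Java. This is only an illustration with hypothetical names (the real logic lives in PartitioningJobRunner); it just shows that a user-chosen task count, not the block count, should drive how records are grouped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy sketch (not Hama's actual API): re-partition input records into a
 * user-chosen number of partitions, regardless of how many block files
 * the raw input happened to arrive in.
 */
public class Repartition {

    /** Stable hash partitioner, in the style of Hadoop's HashPartitioner. */
    static int partitionOf(String key, int numPartitions) {
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Group records by partition; each partition would feed one BSP task. */
    static Map<Integer, List<String>> repartition(List<String> records,
                                                  int numPartitions) {
        Map<Integer, List<String>> parts = new HashMap<>();
        for (String r : records) {
            parts.computeIfAbsent(partitionOf(r, numPartitions),
                                  k -> new ArrayList<>()).add(r);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Imagine one input block but three desired tasks (Case 1 in
        // miniature): we still get up to three partitions.
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f");
        Map<Integer, List<String>> parts = repartition(input, 3);
        for (Map.Entry<Integer, List<String>> e : parts.entrySet()) {
            System.out.println("partition " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

The same shape works in the opposite direction (Case 2): many blocks hashed down into only three partitions for a small cluster.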
