We've had discussions on this same topic many times. "But please don't block other developments" - I want to understand where development is being blocked, especially for partitioning.
-Suraj

On Sun, May 12, 2013 at 6:54 AM, Edward J. Yoon <[email protected]> wrote:
> Hi dev (especially BSP core committers and PMCers),
>
> First of all, input re-partitioning is a very important and
> unavoidable part of Apache Hama. Since there are still people who say
> "as if everything can be settled with the Spilling Queue or something"
> or "it should also be able to handle large input without a large
> cluster", let me explain again.
>
> Restricting the number of task processors to the number of input block
> files means that both of the situations below are problematic:
>
> Case 1. The user wants to process a 1GB input with 1,000 tasks on a large cluster.
> Case 2. The user wants to process a 10GB input with 3 tasks on a small cluster.
>
> I believe this part has a higher priority than other issues, such as
> the VertexInputReader or the Spilling Queue. Hence, please don't mix
> everything together when we talk about this in the future. To
> re-partition raw data and create as many partitions as desired, we
> currently have a PartitioningJobRunner. So, before working on future
> projects, please test it against various scenarios, for example,
> whether it works well with compressed files, with the latest Hadoop
> (HDFS 2.0), or on a large cluster.
>
> Second, there is a lack of active discussion on the RoadMap, and a
> difference of opinion on releases. There's a limit to what we can do.
> Moreover, as I mentioned above, there are many high-priority issues. I
> don't understand why you need to develop the BSP core or create
> separate branches without working together on these basic issues.
>
> Of course, research tasks are fine. If you want to work on them in
> your free time, then feel free to do so. But please don't block other
> developments.
>
> I hope you understand my meaning.
> Thanks.
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
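For readers following the thread, here is a minimal, hypothetical sketch of the idea behind re-partitioning that the quoted mail describes: routing records by key hash into a caller-chosen number of partition files, so the number of tasks is no longer tied to the number of input blocks (Case 1 and Case 2 above). This is not the actual PartitioningJobRunner, which runs as a distributed job over HDFS; the class name, file paths, and single-machine I/O here are illustrative assumptions only.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SimpleRepartitioner {

  // Split one input file into numPartitions partition files, independent of
  // how many blocks the input occupies on disk.
  public static void repartition(Path input, Path outputDir, int numPartitions)
      throws IOException {
    Files.createDirectories(outputDir);

    // One writer per desired partition (part-00000, part-00001, ...).
    BufferedWriter[] writers = new BufferedWriter[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      Path part = outputDir.resolve(String.format("part-%05d", i));
      writers[i] = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
    }

    try (BufferedReader reader =
        Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Route each record by the hash of its key (here, the whole line):
        // the core idea of key-based re-partitioning.
        int partition = (line.hashCode() & Integer.MAX_VALUE) % numPartitions;
        writers[partition].write(line);
        writers[partition].newLine();
      }
    } finally {
      for (BufferedWriter w : writers) {
        w.close();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Case 1 from the mail: force 1,000 partitions regardless of block count.
    repartition(Paths.get("input.txt"), Paths.get("partitions"), 1000);
  }
}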
