The blocker is a disagreement among a small number of PMC members. I have never seen a productive discussion about input partitioning itself: whenever we discussed it, VertexInputReader, DiskVerticesInfo, and SpillingQueue were always mixed in. Hence, I still don't know whether you understood or not.

To be blunt, you have given no opinion on the plans for the 0.6.1 and 0.6.2 roadmap, you didn't vote on 0.6.1, and furthermore I feel that you want to create your own branch. Is this a tacit objection, a misunderstanding, or a gesture of defiance?

On Sun, May 12, 2013 at 10:47 PM, Suraj Menon <[email protected]> wrote:
> We've had discussions on the same many times.
>
> "But please don't block other developments" - I want to understand where
> the development is blocked especially for partitioning.
>
> -Suraj
>
>
> On Sun, May 12, 2013 at 6:54 AM, Edward J. Yoon <[email protected]>wrote:
>
>> Hi dev (especially BSP core committers and PMCers),
>>
>> First of all, the input re-partitioning is very important and
>> unavoidable part of Apache Hama. Since there are still people who say
>> "as if everything can be settled by Spilling Queue with something" or
>> "It should be also able to solve for the large input without large
>> cluster", let me explain again.
>>
>> Restricting the number of Task processors to the number of block files
>> of input, means that both below situations are problematic:
>>
>> Case 1. User want to process 1GB input with 1,000 tasks on large cluster.
>> Case 2. User want to process 10GB input with 3 tasks on small cluster.
>>
>> I believe this part has higher priority than other issues, such as
>> VertexInputReader, Spilling Queue. Hence, please don't mix everything
>> here, when we talking about this in the future. To re-partitioning raw
>> data and create partitions as desired, currently we have a
>> PartitioningJobRunner. So, before working on future projects, please
>> test with various scenarios, for example, whether it works well with
>> compressed files, latest Hadoop (HDFS 2.0), or on large cluster.
>>
>> Second, is a lack of active discussion on RoadMap, and a difference of
>> opinion on release. There's a limit as to what we can do. Moreover, as
>> I mentioned above, there're many high priority issues. I don't
>> understand why you need to develop BSP core or create separate
>> branches without working together on basis issues.
>>
>> Of course, research tasks are fine. If you want to work on them in
>> your free time, then feel free to do so. But please don't block other
>> developments.
>>
>> I hope you understand my meaning.
>> Thanks.
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
