Hi devs (especially BSP core committers and PMC members),

First of all, input re-partitioning is a very important and unavoidable part of Apache Hama. Since there are still people who say things like "everything can be settled by the Spilling Queue" or "it should also be able to handle large inputs without a large cluster", let me explain again.
Restricting the number of task processes to the number of input block files means that both of the following situations are problematic:

Case 1. A user wants to process a 1GB input with 1,000 tasks on a large cluster.
Case 2. A user wants to process a 10GB input with 3 tasks on a small cluster.

I believe this has higher priority than other issues, such as the VertexInputReader or the Spilling Queue. So please don't mix everything together when we discuss this in the future. To re-partition raw data and create partitions as desired, we currently have the PartitioningJobRunner. Before working on future projects, please test it under various scenarios: for example, whether it works well with compressed files, with the latest Hadoop (HDFS 2.0), or on a large cluster.

Second, there is a lack of active discussion on the roadmap, and a difference of opinion on releases. There is a limit to what we can do, and as I mentioned above, there are many high-priority issues. I don't understand why you need to develop the BSP core or create separate branches without working together on these basic issues. Of course, research tasks are fine; if you want to work on them in your free time, feel free to do so. But please don't block other development. I hope you understand my meaning.

Thanks.

--
Best Regards, Edward J. Yoon
@eddieyoon
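P.S. For anyone who hasn't looked at the partitioning code yet: the core idea of decoupling the number of partitions from the number of input blocks can be sketched in plain Java. This is only an illustration with hypothetical names (the real logic lives in PartitioningJobRunner); it just shows that a user-chosen task count, not the block count, should drive how records are grouped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy sketch (not Hama's actual API): re-partition input records into a
 * user-chosen number of partitions, regardless of how many block files
 * the raw input happened to arrive in.
 */
public class Repartition {

    /** Stable hash partitioner, in the style of Hadoop's HashPartitioner. */
    static int partitionOf(String key, int numPartitions) {
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Group records by partition; each partition would feed one BSP task. */
    static Map<Integer, List<String>> repartition(List<String> records,
                                                  int numPartitions) {
        Map<Integer, List<String>> parts = new HashMap<>();
        for (String r : records) {
            parts.computeIfAbsent(partitionOf(r, numPartitions),
                                  k -> new ArrayList<>()).add(r);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Imagine one input block but three desired tasks (Case 1 in
        // miniature): we still get up to three partitions.
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f");
        Map<Integer, List<String>> parts = repartition(input, 3);
        for (Map.Entry<Integer, List<String>> e : parts.entrySet()) {
            System.out.println("partition " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

The same shape works in the opposite direction (Case 2): many blocks hashed down into only three partitions for a small cluster.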
