We've discussed this many times.

"But please don't block other developments" - I want to understand where
development is being blocked, especially for partitioning.

-Suraj


On Sun, May 12, 2013 at 6:54 AM, Edward J. Yoon <[email protected]> wrote:

> Hi dev (especially BSP core committers and PMCers),
>
> First of all, input re-partitioning is a very important and
> unavoidable part of Apache Hama. Since there are still people who say
> "as if everything can be settled by Spilling Queue with something" or
> "It should be also able to solve for the large input without large
> cluster", let me explain again.
>
> Restricting the number of Task processors to the number of input
> block files means that both of the situations below are problematic:
>
> Case 1. A user wants to process 1GB of input with 1,000 tasks on a large cluster.
> Case 2. A user wants to process 10GB of input with 3 tasks on a small cluster.
>
> I believe this part has higher priority than other issues, such as
> VertexInputReader and Spilling Queue. Hence, please don't mix everything
> together when we talk about this in the future. To re-partition raw
> data and create partitions as desired, we currently have a
> PartitioningJobRunner. So, before working on future projects, please
> test it with various scenarios, for example, whether it works well with
> compressed files, the latest Hadoop (HDFS 2.0), or on a large cluster.
>
> Second, there is a lack of active discussion on the roadmap, and a
> difference of opinion on the release. There's a limit to what we can do.
> Moreover, as I mentioned above, there are many high-priority issues. I
> don't understand why you need to develop BSP core or create separate
> branches without working together on the basic issues.
>
> Of course, research tasks are fine. If you want to work on them in
> your free time, then feel free to do so. But please don't block other
> developments.
>
> I hope you understand my meaning.
> Thanks.
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
