Re: Multi-BSP Job and Streaming

Edward J. Yoon Wed, 14 May 2014 05:22:30 -0700

Graph algorithms show the clear difference between the MapReduce and
BSP models. Disk I/O overhead isn't the real problem; what differs is
the number of iterations needed (that's why Spark is building a
Pregel clone for graph-parallel processing).
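To make the iteration-count point concrete, here is a small sketch of connected components computed by min-label propagation in BSP supersteps. It is a single-process simulation written for illustration only. It is not Apache Hama's, Giraph's, or Spark's API, and the function name `bsp_connected_components` is hypothetical. The number of supersteps grows with the longest shortest path in the graph. In a plain MapReduce setting, each of those supersteps would typically be a separate job.

```python
from typing import Dict, List, Tuple


def bsp_connected_components(adj: Dict[int, List[int]]) -> Tuple[Dict[int, int], int]:
    """Hypothetical sketch: min-label propagation run as BSP supersteps.

    `adj` is an undirected adjacency list. Returns (label per vertex,
    number of supersteps executed). Illustration only, not a real BSP API.
    """
    # Every vertex starts labelled with its own ID.
    label = {v: v for v in adj}

    # Superstep 0: every vertex sends its label to all neighbours.
    inbox: Dict[int, List[int]] = {v: [] for v in adj}
    for v, nbrs in adj.items():
        for u in nbrs:
            inbox[u].append(label[v])
    supersteps = 1

    # Keep running supersteps while any messages are in flight.
    while any(inbox.values()):
        outbox: Dict[int, List[int]] = {v: [] for v in adj}
        for v, msgs in inbox.items():
            # A vertex only updates (and sends) when it sees a smaller label.
            if msgs and min(msgs) < label[v]:
                label[v] = min(msgs)
                for u in adj[v]:
                    outbox[u].append(label[v])
        # Barrier synchronisation: messages are delivered in the next superstep.
        inbox = outbox
        supersteps += 1

    return label, supersteps
```

For example, on the path 0-1-2-3 every vertex ends up with label 0 after 5 supersteps (including the final superstep in which no vertex changes). A longer path needs proportionally more.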

Instead, graph partitioning is what's needed, so streaming graph
analysis is VERY difficult. At a glance, this could be seen as an
integration of Storm, Kafka (or a K/V store), and Giraph. But
transferring vertices to the proper processor (or fetching vertices
from the K/V store) is quite a tricky issue (I think it's almost
impossible). Some incremental learning algorithms have the same
issue.
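To illustrate why routing vertices is tricky, here is a hypothetical sketch of simple modulo-hash partitioning, a common default for deciding which processor (peer) owns a vertex. The names `owner`, `partition`, `cut_edges`, and `moved_vertices` are invented for this example and do not come from Hama, Giraph, Storm, or Kafka. The sketch shows two costs. Edges whose endpoints land on different peers become network messages in every superstep. Changing the number of peers reassigns most vertices, so a stream of new vertices cannot easily be absorbed without reshuffling.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def owner(vertex_id: int, num_peers: int) -> int:
    """Hypothetical modulo-hash partitioner: peer that owns a vertex."""
    return vertex_id % num_peers


def partition(edges: Iterable[Edge], num_peers: int) -> Dict[int, List[Edge]]:
    """Route each (streamed) edge to the peer that owns its source vertex."""
    parts: Dict[int, List[Edge]] = {p: [] for p in range(num_peers)}
    for src, dst in edges:
        parts[owner(src, num_peers)].append((src, dst))
    return parts


def cut_edges(edges: Iterable[Edge], num_peers: int) -> int:
    """Count edges whose endpoints live on different peers (remote messages)."""
    return sum(1 for s, d in edges if owner(s, num_peers) != owner(d, num_peers))


def moved_vertices(vertices: Iterable[int], old_peers: int, new_peers: int) -> int:
    """Count vertices that change owner when the number of peers changes."""
    return sum(1 for v in vertices if owner(v, old_peers) != owner(v, new_peers))
```

For instance, on the path 0-1-2-3 split across 2 peers, all 3 edges are cut. Growing from 3 to 4 peers moves 9 of the vertices 0..11. Partitioners that reduce cut edges exist, but they make the routing decision for each newly arriving vertex harder.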

However, we can do this as one unbroken flow.

On Wed, May 14, 2014 at 7:44 PM, Tommaso Teofili
<[email protected]> wrote:
> Hi Edward,
>
> it looks interesting, however I would need more information to completely
> understand what the job and data flow would be.
>
> Regards,
> Tommaso
>
>
> 2014-05-14 3:46 GMT+02:00 Edward J. Yoon <[email protected]>:
>
>> Hi,
>>
>> I've just drawn a diagram of the multi-BSP job scenario. Does this make
>> sense to you?
>>
>>
>> https://docs.google.com/drawings/d/1WpBEBzRz9zXn-G8-DWDE7O2JlhxkxT2mjTloMGSJuKM/edit?usp=sharing
>>
>> The key differentiator is the direct connection between data processing
>> and advanced analytical computing applications.
>>
>> --
>> Best Regards, Edward J. Yoon
>> CEO at DataSayer Co., Ltd.
>>

-- 
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
