Re: runtimePartitioning in GraphJobRunner

Edward J. Yoon Mon, 10 Dec 2012 03:39:03 -0800

HDFS is shared infrastructure. Its block size can't be tuned just for Hama BSP computing.
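A rough Java sizing sketch of the block-size trade-off. It assumes, as a simplification the thread does not establish, that TaskRunner memory grows linearly with split size, using the figure Edward reports further down (25~30GB for one 256MB block, i.e. about 100x to 120x; the sketch uses the lower bound). The 10GB input size is hypothetical.

```java
// Rough sizing sketch. ASSUMPTION: per-task memory scales linearly with the
// size of the task's input split, using the ratio reported in this thread
// (25 GB of memory for a 256 MB block). Real usage depends on vertex/edge
// types and message volume.
class SplitSizing {
    // 25 GB / 256 MB = 25 * 1024 MB / 256 MB = 100
    static final double MEMORY_PER_INPUT_MB = 25.0 * 1024 / 256;

    // Number of tasks, one per block-sized split, rounding up.
    static long numTasks(long inputMB, long blockMB) {
        return (inputMB + blockMB - 1) / blockMB;
    }

    // Estimated memory per task under the linear-scaling assumption.
    static double taskMemoryMB(long blockMB) {
        return blockMB * MEMORY_PER_INPUT_MB;
    }

    public static void main(String[] args) {
        long inputMB = 10 * 1024; // hypothetical 10 GB graph
        for (long blockMB : new long[] {256, 128, 64}) {
            System.out.printf("block=%d MB tasks=%d memory/task=%.0f MB%n",
                    blockMB, numTasks(inputMB, blockMB), taskMemoryMB(blockMB));
        }
    }
}
```

Under that assumption, a smaller block lowers the per-task estimate but multiplies the number of tasks the cluster has to run.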

> Yes, so spilling on disk is the easiest solution to save memory. Not
> changing the partitioning.
> If you want to split again through the block boundaries to distribute the
> data through the cluster, then do it, but this is plainly wrong.
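A minimal Java sketch of the spill-to-disk idea: keep up to a fixed number of serialized vertices on the heap and append the rest to a local temporary file. SpillingVertexStore and its methods are hypothetical (not part of Hama), and vertices are plain strings here for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL vertex store (not a Hama API): at most maxInMemory vertices stay
// on the heap; the overflow is appended to a temporary spill file on local disk.
class SpillingVertexStore {
    private final int maxInMemory;
    private final Path spillFile;
    private final List<String> inMemory = new ArrayList<>();
    private int spilled = 0;

    SpillingVertexStore(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        Path f;
        try {
            f = Files.createTempFile("vertices", ".spill");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        f.toFile().deleteOnExit();
        this.spillFile = f;
    }

    // One serialized vertex per line; assumes the string contains no newline.
    void add(String vertex) {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(vertex);
            return;
        }
        try {
            Files.write(spillFile, List.of(vertex), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilled++;
    }

    int inMemoryCount() { return inMemory.size(); }

    int spilledCount() { return spilled; }

    // All vertices: heap-resident ones first, then spilled ones in insertion order.
    List<String> all() {
        List<String> result = new ArrayList<>(inMemory);
        try {
            result.addAll(Files.readAllLines(spillFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        SpillingVertexStore store = new SpillingVertexStore(2);
        for (String v : new String[] {"v1", "v2", "v3", "v4", "v5"}) {
            store.add(v);
        }
        System.out.println(store.inMemoryCount() + " in memory, "
                + store.spilledCount() + " spilled: " + store.all());
    }
}
```

The heap footprint stays bounded by maxInMemory regardless of split size; the price is disk I/O whenever spilled vertices are read back.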


Vertex load balancing basically uses the hash partitioner. You can't
avoid data transfers.
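A small Java sketch of why hash-based vertex placement implies data transfer: each vertex ID is mapped to a task by hashing, so most vertices read from a task's own input split belong to some other task. The formula floorMod(id.hashCode(), numTasks) is an assumption for illustration; Hama's HashPartitioner may compute the partition differently.

```java
import java.util.List;

// ASSUMPTION: partition = floorMod(hashCode, numTasks). floorMod keeps the
// result in [0, numTasks) even when hashCode() is negative.
class HashPartitionSketch {
    static int partition(String vertexId, int numTasks) {
        return Math.floorMod(vertexId.hashCode(), numTasks);
    }

    // Number of vertices in a task's local split that must be sent to another task.
    static int remoteCount(List<String> splitVertexIds, int localTask, int numTasks) {
        int remote = 0;
        for (String id : splitVertexIds) {
            if (partition(id, numTasks) != localTask) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        // Hypothetical split read by task 0 of a 4-task job.
        List<String> split = List.of("a", "b", "c", "d");
        for (String id : split) {
            System.out.println(id + " -> task " + partition(id, 4));
        }
        System.out.println("vertices to transfer: " + remoteCount(split, 0, 4));
    }
}
```

If the hash spreads IDs evenly over N tasks, roughly (N-1)/N of each split's vertices belong to another task, and that fraction approaches 1 as N grows.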

Again,

VertexInputReader and runtime partitioning make the code complex, as I
mentioned above.

> This reader is needed, so people can create vertices from their own
> file format.

I don't think so. Instead of VertexInputReader, we can have the input
format provide <K extends WritableComparable, V extends ArrayWritable> records.

Let's assume that there's a web table in Google's BigTable (or HBase).
Users can create their own WebTableInputFormatter to read records as
<Text url, TextArrayWritable anchors>. Am I wrong?
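A minimal Java sketch of the alternative proposed here: the input format emits <key, array of values> records, and the framework turns every record into a vertex generically, so users don't write a VertexInputReader. Plain Java types stand in for WritableComparable, Text and TextArrayWritable so the example runs without Hadoop; SimpleVertex and VertexFromRecord are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a graph vertex: an ID plus the IDs of its out-neighbours.
final class SimpleVertex<K extends Comparable<K>> {
    final K id;
    final List<K> outEdges;

    SimpleVertex(K id, List<K> outEdges) {
        this.id = id;
        this.outEdges = List.copyOf(outEdges);
    }

    @Override
    public String toString() {
        return id + " -> " + outEdges;
    }
}

// Generic record-to-vertex conversion: the record key becomes the vertex ID and
// each array element becomes an outgoing edge, mirroring
// <K extends WritableComparable, V extends ArrayWritable>.
class VertexFromRecord {
    static <K extends Comparable<K>> SimpleVertex<K> fromRecord(K key, K[] values) {
        return new SimpleVertex<>(key, Arrays.asList(values));
    }

    public static void main(String[] args) {
        // Hypothetical web-table row: <url, anchors>.
        SimpleVertex<String> v = fromRecord("http://a.example/",
                new String[] {"http://b.example/", "http://c.example/"});
        System.out.println(v);
    }
}
```

This keeps the framework's read path simple; the cost, which is Thomas's objection below, is that any source whose records are not already <vertex, neighbours> pairs must be converted by a custom input format first.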

On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
<[email protected]> wrote:
> Yes, because changing the block size to 32MB will use just 300MB of memory,
> so you can add more machines to fit the number of resulting tasks.
>
> If each node has little memory, there's no way to process in memory
>
>
> Yes, so spilling on disk is the easiest solution to save memory. Not
> changing the partitioning.
> If you want to split again through the block boundaries to distribute the
> data through the cluster, then do it, but this is plainly wrong.
>
> 2012/12/10 Edward J. Yoon <[email protected]>
>
>> >> A Hama cluster is scalable. It means that the computing capacity
>> >> should be increased by adding slaves. Right?
>> >
>> >
>> > I'm sorry, but I don't see how this relates to the vertex input reader.
>>
>> It's not related to the input reader. It's related to partitioning and load
>> balancing. As I reported to you before, to process the vertices in one
>> 256MB block, each TaskRunner required 25~30GB of memory.
>>
>> If each node has little memory, there's no way to process in memory
>> without changing the block size of HDFS.
>>
>> Do you think this is scalable?
>>
>> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
>> <[email protected]> wrote:
>> > Oh okay, so if you want to remove that, have a lot of fun. This reader is
>> > needed, so people can create vertices from their own file format.
>> > Going back to a SequenceFile input will not only break backward
>> > compatibility but also bring back the same issues we had before.
>> >
>> >> A Hama cluster is scalable. It means that the computing capacity
>> >> should be increased by adding slaves. Right?
>> >
>> >
>> > I'm sorry, but I don't see how this relates to the vertex input reader.
>> >
>> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >
>> >> A Hama cluster is scalable. It means that the computing capacity
>> >> should be increased by adding slaves. Right?
>> >>
>> >> As I mentioned before, disk-queue and storing vertices on local disk
>> >> are not urgent.
>> >>
>> >> In short, yeah, I want to remove VertexInputReader and runtime
>> >> partitioning from the Graph package.
>> >>
>> >> See also,
>> >>
>> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
>> >>
>> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
>> >> <[email protected]> wrote:
>> >> > uhm, I have no idea what you want to achieve. Do you want to go back to
>> >> > client-side partitioning?
>> >> >
>> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >
>> >> >> If nobody objects, I'll remove VertexInputReader from
>> >> >> GraphJobRunner, because it makes the code complex. Let's reconsider
>> >> >> VertexInputReader after fixing the HAMA-531 and HAMA-632
>> >> >> issues.
>> >> >>
>> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon <[email protected]>
>> >> >> wrote:
>> >> >> > Or, I'd like to get rid of VertexInputReader.
>> >> >> >
>> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon <[email protected]>
>> >> >> > wrote:
>> >> >> >> In fact, there's no choice but to use runtimePartitioning (because of
>> >> >> >> VertexInputReader). Right? If so, I would like to delete all
>> >> >> >> "if (runtimePartitioning) {" conditions.
>> >> >> >>
>> >> >> >> --
>> >> >> >> Best Regards, Edward J. Yoon
>> >> >> >> @eddieyoon
>> >> >> >
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
