Re: runtimePartitioning in GraphJobRunner

Edward J. Yoon Mon, 10 Dec 2012 04:35:57 -0800

Any other opinions?

On Mon, Dec 10, 2012 at 9:34 PM, Edward J. Yoon <[email protected]> wrote:
>> Just wanted to remind you why we introduced runtime partitioning.
>
> Sorry that I could not review your patch for HAMA-531 and many other things
> in the Hama 0.5 release. I was busy.
>
> On Mon, Dec 10, 2012 at 8:47 PM, Thomas Jungblut
> <[email protected]> wrote:
>> Just wanted to remind you why we introduced runtime partitioning.
>>
>> 2012/12/10 Edward J. Yoon <[email protected]>
>>
>>> HDFS is a shared resource. It can't be tuned only for Hama BSP computing.
>>>
>>> > Yes, so spilling on disk is the easiest solution to save memory. Not
>>> > changing the partitioning.
>>> > If you want to split again through the block boundaries to distribute the
>>> > data through the cluster, then do it, but this is plainly wrong.
>>>
>>> Vertex load balancing basically uses a hash partitioner. You can't
>>> avoid data transfers.
>>>
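For illustration, a minimal sketch of the hash partitioning mentioned above, in plain Java. It assumes vertex IDs are strings and that a vertex belongs to the peer at hash(id) mod numPeers. The class and method names are hypothetical, and this is not Hama's actual partitioner code. It only shows why vertices read from one input split generally have to be shipped to other peers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hama's actual partitioner.
class HashPartitionSketch {

  // Maps a vertex ID to a peer index in [0, numPeers).
  static int partition(String vertexId, int numPeers) {
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    return Math.floorMod(vertexId.hashCode(), numPeers);
  }

  // Counts how many vertices read by peer `localPeer` belong to a different
  // peer and would therefore have to be sent over the network.
  static int countTransfers(List<String> vertexIds, int localPeer, int numPeers) {
    int transfers = 0;
    for (String id : vertexIds) {
      if (partition(id, numPeers) != localPeer) {
        transfers++;
      }
    }
    return transfers;
  }

  public static void main(String[] args) {
    List<String> ids = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      ids.add("vertex-" + i);
    }
    int numPeers = 4;
    System.out.println("vertices to send from peer 0: "
        + countTransfers(ids, 0, numPeers) + " of " + ids.size());
  }
}
```

With a reasonably uniform hash and N peers, roughly (N-1)/N of the vertices a peer reads land on some other peer, which is the data transfer the message refers to.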
>>> Again...,
>>>
>>> VertexInputReader and runtime partitioning make the code complex, as I
>>> mentioned above.
>>>
>>> > This reader is needed, so people can create vertices from their own
>>> > fileformat.
>>>
>>> I don't think so. Instead of VertexInputReader, we can provide <K
>>> extends WritableComparable, V extends ArrayWritable>.
>>>
>>> Let's assume that there's a web table in Google's BigTable (HBase).
>>> A user can create their own WebTableInputFormatter to read records as
>>> <Text url, TextArrayWritable anchors>. Am I wrong?
>>>
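A rough sketch of the proposal above: instead of a VertexInputReader, the input format itself emits <vertex ID, edge array> records, and the graph runner builds vertices directly from them. To stay self-contained it uses plain Java stand-ins (Comparable for WritableComparable, an array for ArrayWritable). KeyedRecord, SimpleVertex, and toVertex are hypothetical names, not Hama or Hadoop APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; not Hama or Hadoop classes.
class KeyedInputSketch {

  // One input record: a key (vertex ID) and an array value (outgoing edges).
  // K plays the role of WritableComparable, V[] the role of ArrayWritable.
  static final class KeyedRecord<K extends Comparable<K>, V> {
    final K key;
    final V[] values;

    KeyedRecord(K key, V[] values) {
      this.key = key;
      this.values = values;
    }
  }

  // Minimal vertex holding an ID and its out-edges.
  static final class SimpleVertex<K extends Comparable<K>, V> {
    final K id;
    final List<V> edges;

    SimpleVertex(K id, List<V> edges) {
      this.id = id;
      this.edges = edges;
    }
  }

  // No user-supplied reader: the record already has the shape of a vertex.
  static <K extends Comparable<K>, V> SimpleVertex<K, V> toVertex(KeyedRecord<K, V> record) {
    return new SimpleVertex<>(record.key, Arrays.asList(record.values));
  }

  public static void main(String[] args) {
    // A web-table row read as <url, anchors>.
    KeyedRecord<String, String> row = new KeyedRecord<>(
        "http://example.org/",
        new String[] {"http://example.org/a", "http://example.org/b"});
    SimpleVertex<String, String> v = toVertex(row);
    System.out.println(v.id + " -> " + v.edges);
  }
}
```

The trade-off raised elsewhere in the thread still applies under this sketch: a user with a custom file format would have to write an input format that produces such records, rather than a reader.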
>>> On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
>>> <[email protected]> wrote:
>>> > Yes, because changing the blocksize to 32m will just use 300mb of memory,
>>> > so you can add more machines to fit the number of resulting tasks.
>>> >
>>> >> If each node have small memory, there's no way to process in memory
>>> >
>>> >
>>> > Yes, so spilling on disk is the easiest solution to save memory. Not
>>> > changing the partitioning.
>>> > If you want to split again through the block boundaries to distribute the
>>> > data through the cluster, then do it, but this is plainly wrong.
>>> >
>>> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >
>>> >> >> A Hama cluster is scalable. It means that the computing capacity
>>> >> >> should be increased by adding slaves. Right?
>>> >> >
>>> >> > I'm sorry, but I don't see how this relates to the vertex input
>>> >> > reader.
>>> >>
>>> >> It's not related to the input reader. It's related to partitioning and
>>> >> load balancing. As I reported to you before, to process the vertices
>>> >> within a 256MB block, each TaskRunner required 25~30GB of memory.
>>> >>
>>> >> If each node has little memory, there's no way to process in memory
>>> >> without changing the block size of HDFS.
>>> >>
>>> >> Do you think this is scalable?
>>> >>
>>> >> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
>>> >> <[email protected]> wrote:
>>> >> > Oh okay, so if you want to remove that, have a lot of fun. This reader
>>> >> > is needed, so people can create vertices from their own file format.
>>> >> > Going back to a SequenceFile input will not only break backward
>>> >> > compatibility but also bring back the same issues we had before.
>>> >> >
>>> >> >> A Hama cluster is scalable. It means that the computing capacity
>>> >> >> should be increased by adding slaves. Right?
>>> >> >
>>> >> > I'm sorry, but I don't see how this relates to the vertex input
>>> >> > reader.
>>> >> >
>>> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >
>>> >> >> A Hama cluster is scalable. It means that the computing capacity
>>> >> >> should be increased by adding slaves. Right?
>>> >> >>
>>> >> >> As I mentioned before, disk-queue and storing vertices on local disk
>>> >> >> are not urgent.
>>> >> >>
>>> >> >> In short, yeah, I want to remove VertexInputReader and runtime
>>> >> >> partitioning in the Graph package.
>>> >> >>
>>> >> >> See also,
>>> >> >>
>>> >> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
>>> >> >>
>>> >> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
>>> >> >> <[email protected]> wrote:
>>> >> >> > Uhm, I have no idea what you want to achieve. Do you want to go
>>> >> >> > back to client-side partitioning?
>>> >> >> >
>>> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >> >
>>> >> >> >> If there's no opinion, I'll remove VertexInputReader in
>>> >> >> >> GraphJobRunner, because it makes the code complex. Let's
>>> >> >> >> reconsider VertexInputReader after fixing the HAMA-531 and
>>> >> >> >> HAMA-632 issues.
>>> >> >> >>
>>> >> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon <[email protected]>
>>> >> >> >> wrote:
>>> >> >> >> > Or, I'd like to get rid of VertexInputReader.
>>> >> >> >> >
>>> >> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon <[email protected]>
>>> >> >> >> > wrote:
>>> >> >> >> >> In fact, there's no choice but to use runtimePartitioning
>>> >> >> >> >> (because of VertexInputReader). Right? If so, I would like to
>>> >> >> >> >> delete all "if (runtimePartitioning) {" conditions.
>>> >> >> >> >>
>>> >> >> >> >> --
>>> >> >> >> >> Best Regards, Edward J. Yoon
>>> >> >> >> >> @eddieyoon




-- 
Best Regards, Edward J. Yoon
@eddieyoon
