Re: runtimePartitioning in GraphJobRunner

Edward J. Yoon Mon, 10 Dec 2012 05:09:11 -0800

You know what? If the graph is not already stored somewhere in a
structured form, it has to be extracted from unstructured data. The
parseVertex API is only good for simple test/debug programs, because it
works on human-readable text.

In my case, generating test data is very annoying.
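
To illustrate the kind of human-readable input meant above, here is a minimal sketch of parsing one tab-separated adjacency line into a vertex ID and its neighbors. The class name TextVertexLine, the ParsedVertex holder, and the "id<TAB>neighbor<TAB>neighbor" line layout are assumptions for illustration only; this is not Hama's VertexInputReader or parseVertex signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a text-based vertex parser; not part of Hama.
public class TextVertexLine {

    /** Hypothetical holder for one parsed vertex. */
    public static final class ParsedVertex {
        public final String id;
        public final List<String> neighbors;

        ParsedVertex(String id, List<String> neighbors) {
            this.id = id;
            this.neighbors = neighbors;
        }
    }

    /**
     * Parses a line of the assumed form "id<TAB>n1<TAB>n2...".
     * The first field is the vertex ID, the rest are neighbor IDs.
     */
    public static ParsedVertex parse(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length == 0 || fields[0].isEmpty()) {
            throw new IllegalArgumentException("missing vertex id: " + line);
        }
        // Copy the tail of the array so the list is independent of 'fields'.
        List<String> neighbors =
            new ArrayList<>(Arrays.asList(fields).subList(1, fields.length));
        return new ParsedVertex(fields[0], neighbors);
    }

    public static void main(String[] args) {
        ParsedVertex v = parse("a\tb\tc");
        System.out.println(v.id + " -> " + v.neighbors); // prints "a -> [b, c]"
    }
}
```

Input like this is easy to write by hand for a small test graph, but real graphs usually have to be produced from some other source before they look like this.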

On Mon, Dec 10, 2012 at 9:51 PM, Thomas Jungblut
<[email protected]> wrote:
> That's nothing personal, just about how we solve the problems we face.
> We need just some trade-off between API compatibility and scalability
> improvement.
>
> 2012/12/10 Edward J. Yoon <[email protected]>
>
>> I don't dislike your intuitive input reader. Once the cleanup is done, we
>> can think about it again.
>>
>> On Mon, Dec 10, 2012 at 9:37 PM, Thomas Jungblut
>> <[email protected]> wrote:
>> > no problem, forgot what I've done there anyways.
>> >
>> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >
>> >> > Just wanted to remind you why we introduced runtime partitioning.
>> >>
>> >> Sorry that I could not review your patch for HAMA-531 and many other
>> >> things in the Hama 0.5 release. I was busy.
>> >>
>> >> On Mon, Dec 10, 2012 at 8:47 PM, Thomas Jungblut
>> >> <[email protected]> wrote:
>> >> > Just wanted to remind you why we introduced runtime partitioning.
>> >> >
>> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >
>> >> >> HDFS is a common resource. It can't be tuned just for Hama BSP computing.
>> >> >>
>> >> >> > Yes, so spilling on disk is the easiest solution to save memory. Not
>> >> >> > changing the partitioning.
>> >> >> > If you want to split again through the block boundaries to distribute
>> >> >> > the data through the cluster, then do it, but this is plainly wrong.
>> >> >>
>> >> >> Vertex load balancing basically uses a hash partitioner. You can't
>> >> >> avoid data transfers.
>> >> >>
>> >> >> Again...,
>> >> >>
>> >> >> VertexInputReader and runtime partitioning make the code complex, as I
>> >> >> mentioned above.
>> >> >>
>> >> >> > This reader is needed, so people can create vertices from their own
>> >> >> > file format.
>> >> >>
>> >> >> I don't think so. Instead of VertexInputReader, we can provide <K
>> >> >> extends WritableComparable, V extends ArrayWritable>.
>> >> >>
>> >> >> Let's assume that there's a web table in Google's BigTable (HBase).
>> >> >> A user can create their own WebTableInputFormatter to read records as
>> >> >> <Text url, TextArrayWritable anchors>. Am I wrong?
>> >> >>
>> >> >> On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
>> >> >> <[email protected]> wrote:
>> >> >> > Yes, because changing the blocksize to 32m will just use 300mb of memory,
>> >> >> > so you can add more machines to fit the number of resulting tasks.
>> >> >> >
>> >> >> > If each node has little memory, there's no way to process in memory
>> >> >> >
>> >> >> >
>> >> >> > Yes, so spilling on disk is the easiest solution to save memory. Not
>> >> >> > changing the partitioning.
>> >> >> > If you want to split again through the block boundaries to distribute
>> >> >> > the data through the cluster, then do it, but this is plainly wrong.
>> >> >> >
>> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >> >
>> >> >> >> >> A Hama cluster is scalable. It means that the computing capacity
>> >> >> >> >> should be increased by adding slaves. Right?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > I'm sorry, but I don't see how this relates to the vertex input reader.
>> >> >> >>
>> >> >> >> It's not related to the input reader. It's related to partitioning and
>> >> >> >> load balancing. As I reported to you before, to process the vertices
>> >> >> >> within a 256MB block, each TaskRunner required 25~30GB of memory.
>> >> >> >>
>> >> >> >> If each node has little memory, there's no way to process in memory
>> >> >> >> without changing the HDFS block size.
>> >> >> >>
>> >> >> >> Do you think this is scalable?
>> >> >> >>
>> >> >> >> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
>> >> >> >> <[email protected]> wrote:
>> >> >> >> > Oh okay, so if you want to remove that, have a lot of fun. This reader is
>> >> >> >> > needed, so people can create vertices from their own file format.
>> >> >> >> > Going back to a sequencefile input will not only break backward
>> >> >> >> > compatibility but also bring back the same issues we had before.
>> >> >> >> >
>> >> >> >> >> A Hama cluster is scalable. It means that the computing capacity
>> >> >> >> >> should be increased by adding slaves. Right?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > I'm sorry, but I don't see how this relates to the vertex input reader.
>> >> >> >> >
>> >> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >> >> >
>> >> >> >> >> A Hama cluster is scalable. It means that the computing capacity
>> >> >> >> >> should be increased by adding slaves. Right?
>> >> >> >> >>
>> >> >> >> >> As I mentioned before, disk-queue and storing vertices on local disk
>> >> >> >> >> are not urgent.
>> >> >> >> >>
>> >> >> >> >> In short, yeah, I want to remove VertexInputReader and runtime
>> >> >> >> >> partitioning in the Graph package.
>> >> >> >> >>
>> >> >> >> >> See also,
>> >> >> >> >>
>> >> >> >> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
>> >> >> >> >>
>> >> >> >> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
>> >> >> >> >> <[email protected]> wrote:
>> >> >> >> >> > uhm, I have no idea what you want to achieve, do you want to get back to
>> >> >> >> >> > client-side partitioning?
>> >> >> >> >> >
>> >> >> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >> >> >> >
>> >> >> >> >> >> If there's no opinion, I'll remove VertexInputReader in
>> >> >> >> >> >> GraphJobRunner, because it makes the code complex. Let's consider
>> >> >> >> >> >> the VertexInputReader again after fixing the HAMA-531 and HAMA-632
>> >> >> >> >> >> issues.
>> >> >> >> >> >>
>> >> >> >> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon <[email protected]> wrote:
>> >> >> >> >> >> > Or, I'd like to get rid of VertexInputReader.
>> >> >> >> >> >> >
>> >> >> >> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon <[email protected]> wrote:
>> >> >> >> >> >> >> In fact, there's no choice but to use runtimePartitioning (because of
>> >> >> >> >> >> >> VertexInputReader). Right? If so, I would like to delete all
>> >> >> >> >> >> >> "if (runtimePartitioning) {" conditions.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> --
>> >> >> >> >> >> >> Best Regards, Edward J. Yoon
>> >> >> >> >> >> >> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
