no problem, forgot what I've done there anyways.

2012/12/10 Edward J. Yoon <[email protected]>

> > Just wanted to remind you why we introduced runtime partitioning.
>
> Sorry that I could not review your patch of HAMA-531 and many things
> of Hama 0.5 release. I was busy.
>
> On Mon, Dec 10, 2012 at 8:47 PM, Thomas Jungblut
> <[email protected]> wrote:
> > Just wanted to remind you why we introduced runtime partitioning.
> >
> > 2012/12/10 Edward J. Yoon <[email protected]>
> >
> >> HDFS is common. It's not tunable for only Hama BSP computing.
> >>
> >> > Yes, so spilling on disk is the easiest solution to save memory. Not
> >> > changing the partitioning.
> >> > If you want to split again through the block boundaries to distribute the
> >> > data through the cluster, then do it, but this is plainly wrong.
> >>
> >> Vertex load balancing basically uses a hash partitioner. You can't
> >> avoid data transfers.
> >>
> >> Again...,
> >>
> >> VertexInputReader and runtime partitioning make code complex as I
> >> mentioned above.
> >>
> >> > This reader is needed, so people can create vertices from their own
> >> > file format.
> >>
> >> I don't think so. Instead of VertexInputReader, we can provide <K
> >> extends WritableComparable, V extends ArrayWritable>.
> >>
> >> Let's assume that there's a web table in Google's BigTable (HBase).
> >> User can create their own WebTableInputFormatter to read records as a
> >> <Text url, TextArrayWritable anchors>. Am I wrong?
> >>
> >> On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
> >> <[email protected]> wrote:
> >> > Yes, because changing the blocksize to 32m will just use 300mb of memory,
> >> > so you can add more machines to fit the number of resulting tasks.
> >> >
> >> > If each node has small memory, there's no way to process in memory
> >> >
> >> > Yes, so spilling on disk is the easiest solution to save memory. Not
> >> > changing the partitioning.
> >> > If you want to split again through the block boundaries to distribute the
> >> > data through the cluster, then do it, but this is plainly wrong.
> >> >
> >> > 2012/12/10 Edward J. Yoon <[email protected]>
> >> >
> >> >> > A Hama cluster is scalable. It means that the computing capacity
> >> >> >> should be increased by adding slaves. Right?
> >> >> >
> >> >> > I'm sorry, but I don't see how this relates to the vertex input reader.
> >> >>
> >> >> Not related to the input reader. It's related to partitioning and load
> >> >> balancing. As I reported to you before, to process vertices within a
> >> >> 256MB block, each TaskRunner required 25~30GB memory.
> >> >>
> >> >> If each node has small memory, there's no way to process in memory
> >> >> without changing block size of HDFS.
> >> >>
> >> >> Do you think this is scalable?
> >> >>
> >> >> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
> >> >> <[email protected]> wrote:
> >> >> > Oh okay, so if you want to remove that, have a lot of fun. This reader is
> >> >> > needed, so people can create vertices from their own file format.
> >> >> > Going back to a sequencefile input will not only break backward
> >> >> > compatibility but also cause the same issues we had before.
> >> >> >
> >> >> > A Hama cluster is scalable. It means that the computing capacity
> >> >> >> should be increased by adding slaves. Right?
> >> >> >
> >> >> > I'm sorry, but I don't see how this relates to the vertex input reader.
> >> >> >
> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
> >> >> >
> >> >> >> A Hama cluster is scalable. It means that the computing capacity
> >> >> >> should be increased by adding slaves. Right?
> >> >> >>
> >> >> >> As I mentioned before, disk-queue and storing vertices on local disk
> >> >> >> are not urgent.
> >> >> >>
> >> >> >> In short, yeah, I want to remove VertexInputReader and runtime
> >> >> >> partition in Graph package.
> >> >> >>
> >> >> >> See also,
> >> >> >>
> >> >> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
> >> >> >>
> >> >> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
> >> >> >> <[email protected]> wrote:
> >> >> >> > uhm, I have no idea what you want to achieve, do you want to get back to
> >> >> >> > client-side partitioning?
> >> >> >> >
> >> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
> >> >> >> >
> >> >> >> >> If there's no opinion, I'll remove VertexInputReader in
> >> >> >> >> GraphJobRunner, because it makes code complex. Let's consider again
> >> >> >> >> about the VertexInputReader, after fixing HAMA-531 and HAMA-632
> >> >> >> >> issues.
> >> >> >> >>
> >> >> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon <[email protected]>
> >> >> >> >> wrote:
> >> >> >> >> > Or, I'd like to get rid of VertexInputReader.
> >> >> >> >> >
> >> >> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon <[email protected]>
> >> >> >> >> > wrote:
> >> >> >> >> >> In fact, there's no choice but to use runtimePartitioning (because of
> >> >> >> >> >> VertexInputReader). Right? If so, I would like to delete all "if
> >> >> >> >> >> (runtimePartitioning) {" conditions.
> >> >> >> >> >>
> >> >> >> >> >> --
> >> >> >> >> >> Best Regards, Edward J. Yoon
> >> >> >> >> >> @eddieyoon
> >> >> >> >> >
> >> >> >> >> > --
> >> >> >> >> > Best Regards, Edward J. Yoon
> >> >> >> >> > @eddieyoon
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> Best Regards, Edward J. Yoon
> >> >> >> >> @eddieyoon
> >> >> >>
> >> >> >> --
> >> >> >> Best Regards, Edward J. Yoon
> >> >> >> @eddieyoon
> >> >>
> >> >> --
> >> >> Best Regards, Edward J. Yoon
> >> >> @eddieyoon
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
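The thread argues that vertex load balancing uses a hash partitioner, so data transfers cannot be avoided. A small standalone sketch can illustrate that point. It is plain Java, not Hama's Partitioner API. The class and method names (VertexHashPartitionSketch, partitionFor, countRemoteVertices) and the vertex IDs are hypothetical. The hash-modulo scheme is an assumption modeled on the common Hadoop-style hash partitioner. The sketch shows that a vertex's owning task depends only on the hash of its ID, not on which input split it was read from. As a result, a task generally has to send many of the vertices it reads to other tasks.

```java
import java.util.List;

// Hypothetical sketch (not Hama's API): assign vertex IDs to BSP tasks by
// hashing, in the style of a Hadoop-like hash partitioner.
final class VertexHashPartitionSketch {

    // Maps a vertex ID to a task index in [0, numTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() still yields a non-negative index.
    static int partitionFor(String vertexId, int numTasks) {
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    // Counts how many vertices read by task `readerTask` (from its own input
    // split) are owned by a different task and would have to be transferred.
    static int countRemoteVertices(List<String> splitVertexIds, int readerTask, int numTasks) {
        int remote = 0;
        for (String id : splitVertexIds) {
            if (partitionFor(id, numTasks) != readerTask) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Made-up vertex IDs that happen to live in the split read by task 0.
        List<String> split0 = List.of("http://a.example", "http://b.example",
                "http://c.example", "http://d.example");
        for (String id : split0) {
            System.out.println(id + " -> task " + partitionFor(id, numTasks));
        }
        System.out.println("vertices to transfer: "
                + countRemoteVertices(split0, 0, numTasks) + " of " + split0.size());
    }
}
```

Assume there are numTasks tasks and IDs that hash roughly uniformly. Then about (numTasks - 1)/numTasks of each split would be expected to belong to some other task.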
