You know what? If a graph is not already stored somewhere in a well-structured form, it has to be extracted from unstructured data. The parseVertex API is only good for simple test/debug programs, because it works on human-readable text.
In my case, generating test data is very annoying.

On Mon, Dec 10, 2012 at 9:51 PM, Thomas Jungblut
<[email protected]> wrote:
> That's nothing personal, just about how we solve the problems we face.
> We just need some trade-off between API compatibility and scalability
> improvement.
>
> 2012/12/10 Edward J. Yoon <[email protected]>
>
>> I don't dislike your intuitive input reader. Once cleaning is done, we
>> can think about it again.
>>
>> On Mon, Dec 10, 2012 at 9:37 PM, Thomas Jungblut
>> <[email protected]> wrote:
>> > no problem, forgot what I've done there anyways.
>> >
>> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >
>> >> > Just wanted to remind you why we introduced runtime partitioning.
>> >>
>> >> Sorry that I could not review your patch for HAMA-531 and many things
>> >> in the Hama 0.5 release. I was busy.
>> >>
>> >> On Mon, Dec 10, 2012 at 8:47 PM, Thomas Jungblut
>> >> <[email protected]> wrote:
>> >> > Just wanted to remind you why we introduced runtime partitioning.
>> >> >
>> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >
>> >> >> HDFS is common. It's not tunable for only Hama BSP computing.
>> >> >>
>> >> >> > Yes, so spilling on disk is the easiest solution to save memory.
>> >> >> > Not changing the partitioning.
>> >> >> > If you want to split again through the block boundaries to
>> >> >> > distribute the data through the cluster, then do it, but this
>> >> >> > is plainly wrong.
>> >> >>
>> >> >> Vertex load balancing basically uses a Hash partitioner. You can't
>> >> >> avoid data transfers.
>> >> >>
>> >> >> Again...,
>> >> >>
>> >> >> VertexInputReader and runtime partitioning make the code complex,
>> >> >> as I mentioned above.
>> >> >>
>> >> >> > This reader is needed, so people can create vertices from their
>> >> >> > own fileformat.
>> >> >>
>> >> >> I don't think so. Instead of VertexInputReader, we can provide <K
>> >> >> extends WritableComparable, V extends ArrayWritable>.
>> >> >>
>> >> >> Let's assume that there's a web table in Google's BigTable (HBase).
>> >> >> Users can create their own WebTableInputFormatter to read records
>> >> >> as a <Text url, TextArrayWritable anchors>. Am I wrong?
>> >> >>
>> >> >> On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
>> >> >> <[email protected]> wrote:
>> >> >> > Yes, because changing the blocksize to 32m will just use 300mb
>> >> >> > of memory, so you can add more machines to fit the number of
>> >> >> > resulting tasks.
>> >> >> >
>> >> >> >> If each node has small memory, there's no way to process in
>> >> >> >> memory
>> >> >> >
>> >> >> > Yes, so spilling on disk is the easiest solution to save memory.
>> >> >> > Not changing the partitioning.
>> >> >> > If you want to split again through the block boundaries to
>> >> >> > distribute the data through the cluster, then do it, but this
>> >> >> > is plainly wrong.
>> >> >> >
>> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >> >
>> >> >> >> >> A Hama cluster is scalable. It means that the computing
>> >> >> >> >> capacity should be increased by adding slaves. Right?
>> >> >> >> >
>> >> >> >> > I'm sorry, but I don't see how this relates to the vertex
>> >> >> >> > input reader.
>> >> >> >>
>> >> >> >> It's not related to the input reader. It's related to
>> >> >> >> partitioning and load balancing. As I reported to you before,
>> >> >> >> to process the vertices within a 256MB block, each TaskRunner
>> >> >> >> required 25~30GB of memory.
>> >> >> >>
>> >> >> >> If each node has small memory, there's no way to process in
>> >> >> >> memory without changing the block size of HDFS.
>> >> >> >>
>> >> >> >> Do you think this is scalable?
>> >> >> >>
>> >> >> >> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
>> >> >> >> <[email protected]> wrote:
>> >> >> >> > Oh okay, so if you want to remove that, have a lot of fun.
>> >> >> >> > This reader is needed, so people can create vertices from
>> >> >> >> > their own fileformat.
>> >> >> >> > Going back to a sequencefile input will not only break
>> >> >> >> > backward compatibility but also bring back the same issues
>> >> >> >> > we had before.
>> >> >> >> >
>> >> >> >> >> A Hama cluster is scalable. It means that the computing
>> >> >> >> >> capacity should be increased by adding slaves. Right?
>> >> >> >> >
>> >> >> >> > I'm sorry, but I don't see how this relates to the vertex
>> >> >> >> > input reader.
>> >> >> >> >
>> >> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >> >> >
>> >> >> >> >> A Hama cluster is scalable. It means that the computing
>> >> >> >> >> capacity should be increased by adding slaves. Right?
>> >> >> >> >>
>> >> >> >> >> As I mentioned before, disk-queue and storing vertices on
>> >> >> >> >> local disk are not urgent.
>> >> >> >> >>
>> >> >> >> >> In short, yeah, I want to remove VertexInputReader and
>> >> >> >> >> runtime partitioning in the Graph package.
>> >> >> >> >>
>> >> >> >> >> See also,
>> >> >> >> >>
>> >> >> >> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
>> >> >> >> >>
>> >> >> >> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
>> >> >> >> >> <[email protected]> wrote:
>> >> >> >> >> > uhm, I have no idea what you want to achieve, do you want
>> >> >> >> >> > to get back to client-side partitioning?
>> >> >> >> >> >
>> >> >> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>> >> >> >> >> >
>> >> >> >> >> >> If there's no opinion, I'll remove VertexInputReader in
>> >> >> >> >> >> GraphJobRunner, because it makes the code complex. Let's
>> >> >> >> >> >> reconsider the VertexInputReader after fixing the
>> >> >> >> >> >> HAMA-531 and HAMA-632 issues.
>> >> >> >> >> >>
>> >> >> >> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon
>> >> >> >> >> >> <[email protected]> wrote:
>> >> >> >> >> >> > Or, I'd like to get rid of VertexInputReader.
>> >> >> >> >> >> >
>> >> >> >> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon
>> >> >> >> >> >> > <[email protected]> wrote:
>> >> >> >> >> >> >> In fact, there's no choice but to use
>> >> >> >> >> >> >> runtimePartitioning (because of VertexInputReader).
>> >> >> >> >> >> >> Right? If so, I would like to delete all
>> >> >> >> >> >> >> "if (runtimePartitioning) {" conditions.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> --
>> >> >> >> >> >> >> Best Regards, Edward J. Yoon
>> >> >> >> >> >> >> @eddieyoon
>> >> >> >> >> >> >
>> >> >> >> >> >> > --
>> >> >> >> >> >> > Best Regards, Edward J. Yoon
>> >> >> >> >> >> > @eddieyoon
>> >> >> >> >> >>
>> >> >> >> >> >> --
>> >> >> >> >> >> Best Regards, Edward J. Yoon
>> >> >> >> >> >> @eddieyoon
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> Best Regards, Edward J. Yoon
>> >> >> >> >> @eddieyoon
>> >> >> >>
>> >> >> >> --
>> >> >> >> Best Regards, Edward J. Yoon
>> >> >> >> @eddieyoon
>> >> >>
>> >> >> --
>> >> >> Best Regards, Edward J. Yoon
>> >> >> @eddieyoon
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
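
For reference, a minimal sketch of the two ideas argued over in this thread: parsing a human-readable text line into a vertex, and assigning that vertex to a peer by hashing its ID. It is plain Java and does not use the actual Hama API. The class name TextVertexLine, the tab-separated line layout, and both method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hama: one vertex per text line,
// laid out as "vertexId<TAB>neighbor1<TAB>neighbor2...".
final class TextVertexLine {
  final String id;
  final List<String> neighbors;

  private TextVertexLine(String id, List<String> neighbors) {
    this.id = id;
    this.neighbors = Collections.unmodifiableList(neighbors);
  }

  // Parses one tab-separated line into a vertex ID and its neighbor IDs.
  static TextVertexLine parse(String line) {
    String[] fields = line.split("\t");
    if (fields.length == 0 || fields[0].isEmpty()) {
      throw new IllegalArgumentException("missing vertex id: " + line);
    }
    List<String> neighbors = new ArrayList<>();
    for (int i = 1; i < fields.length; i++) {
      neighbors.add(fields[i]);
    }
    return new TextVertexLine(fields[0], neighbors);
  }

  // Hash partitioning: the owning peer depends only on the vertex ID,
  // not on which input block the line happened to be read from.
  static int partition(String vertexId, int numPeers) {
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numPeers;
  }

  public static void main(String[] args) {
    TextVertexLine v = parse("a\tb\tc");
    System.out.println(v.id + " -> " + v.neighbors
        + " on peer " + partition(v.id, 4));
  }
}
```

Because the target peer is derived from the ID alone, a vertex read from a local block will usually belong to some other peer, which is why hash partitioning implies data transfer regardless of the input format.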
