Any other opinions?

On Mon, Dec 10, 2012 at 9:34 PM, Edward J. Yoon <[email protected]> wrote:
>> Just wanted to remind you why we introduced runtime partitioning.
>
> Sorry that I could not review your patch of HAMA-531 and many things
> of Hama 0.5 release. I was busy.
>
> On Mon, Dec 10, 2012 at 8:47 PM, Thomas Jungblut
> <[email protected]> wrote:
>> Just wanted to remind you why we introduced runtime partitioning.
>>
>> 2012/12/10 Edward J. Yoon <[email protected]>
>>
>>> HDFS is common. It's not tunable for only Hama BSP computing.
>>>
>>> > Yes, so spilling on disk is the easiest solution to save memory. Not
>>> > changing the partitioning.
>>> > If you want to split again through the block boundaries to distribute the
>>> > data through the cluster, then do it, but this is plainly wrong.
>>>
>>> Vertex load balancing basically uses a hash partitioner. You can't
>>> avoid data transfers.
>>>
>>> Again...,
>>>
>>> VertexInputReader and runtime partitioning make the code complex, as I
>>> mentioned above.
>>>
>>> > This reader is needed, so people can create vertices from their own fileformat.
>>>
>>> I don't think so. Instead of VertexInputReader, we can provide <K
>>> extends WritableComparable, V extends ArrayWritable>.
>>>
>>> Let's assume that there's a web table in Google's BigTable (HBase).
>>> Users can create their own WebTableInputFormatter to read records as a
>>> <Text url, TextArrayWritable anchors>. Am I wrong?
>>>
>>> On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
>>> <[email protected]> wrote:
>>> > Yes, because changing the blocksize to 32m will just use 300mb of memory,
>>> > so you can add more machines to fit the number of resulting tasks.
>>> >
>>> > If each node has small memory, there's no way to process in memory
>>> >
>>> > Yes, so spilling on disk is the easiest solution to save memory. Not
>>> > changing the partitioning.
>>> > If you want to split again through the block boundaries to distribute the
>>> > data through the cluster, then do it, but this is plainly wrong.
>>> >
>>> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >
>>> >> > A Hama cluster is scalable. It means that the computing capacity
>>> >> >> should be increased by adding slaves. Right?
>>> >> >
>>> >> > I'm sorry, but I don't see how this relates to the vertex input reader.
>>> >>
>>> >> Not related to the input reader. It is related to partitioning and load
>>> >> balancing. As I reported to you before, to process vertices within a
>>> >> 256MB block, each TaskRunner required 25~30GB memory.
>>> >>
>>> >> If each node has small memory, there's no way to process in memory
>>> >> without changing the block size of HDFS.
>>> >>
>>> >> Do you think this is scalable?
>>> >>
>>> >> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
>>> >> <[email protected]> wrote:
>>> >> > Oh okay, so if you want to remove that, have a lot of fun. This reader is
>>> >> > needed, so people can create vertices from their own fileformat.
>>> >> > Going back to a sequencefile input will not only break backward
>>> >> > compatibility but also make the same issues we had before.
>>> >> >
>>> >> > A Hama cluster is scalable. It means that the computing capacity
>>> >> >> should be increased by adding slaves. Right?
>>> >> >
>>> >> > I'm sorry, but I don't see how this relates to the vertex input reader.
>>> >> >
>>> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >
>>> >> >> A Hama cluster is scalable. It means that the computing capacity
>>> >> >> should be increased by adding slaves. Right?
>>> >> >>
>>> >> >> As I mentioned before, disk-queue and storing vertices on local disk
>>> >> >> are not urgent.
>>> >> >>
>>> >> >> In short, yeah, I want to remove VertexInputReader and runtime
>>> >> >> partition in Graph package.
>>> >> >>
>>> >> >> See also,
>>> >> >>
>>> >> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
>>> >> >>
>>> >> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
>>> >> >> <[email protected]> wrote:
>>> >> >> > uhm, I have no idea what you want to achieve, do you want to get back to
>>> >> >> > client-side partitioning?
>>> >> >> >
>>> >> >> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >> >
>>> >> >> >> If there's no opinion, I'll remove VertexInputReader in
>>> >> >> >> GraphJobRunner, because it makes the code complex. Let's consider again
>>> >> >> >> about the VertexInputReader, after fixing HAMA-531 and HAMA-632
>>> >> >> >> issues.
>>> >> >> >>
>>> >> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon <[email protected]>
>>> >> >> >> wrote:
>>> >> >> >> > Or, I'd like to get rid of VertexInputReader.
>>> >> >> >> >
>>> >> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon <[email protected]>
>>> >> >> >> > wrote:
>>> >> >> >> >> In fact, there's no choice but to use runtimePartitioning (because of
>>> >> >> >> >> VertexInputReader). Right? If so, I would like to delete all "if
>>> >> >> >> >> (runtimePartitioning) {" conditions.
>>> >> >> >> >>
>>> >> >> >> >> --
>>> >> >> >> >> Best Regards, Edward J. Yoon
>>> >> >> >> >> @eddieyoon
>>> >> >> >> >
>>> >> >> >> > --
>>> >> >> >> > Best Regards, Edward J. Yoon
>>> >> >> >> > @eddieyoon
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Best Regards, Edward J. Yoon
>>> >> >> >> @eddieyoon
>>> >> >>
>>> >> >> --
>>> >> >> Best Regards, Edward J. Yoon
>>> >> >> @eddieyoon
>>> >>
>>> >> --
>>> >> Best Regards, Edward J. Yoon
>>> >> @eddieyoon
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
