P.S., BSPJob (with table input) is also the same. It's not only for GraphJob.

On Mon, May 6, 2013 at 4:09 PM, Edward J. Yoon <[email protected]> wrote:
> All,
>
> I've also roughly described the design of the Graph APIs[1]. To
> reduce our misunderstandings, please read the Partitioning and
> GraphModuleInternals documents first.
>
> * In the NoSQL case, there's obviously no need to hash-partition or
> rewrite partition files on HDFS. So, for these inputs, I think the
> vertex structure should be parsed in the GraphJobRunner.loadVertices()
> method.
>
> Here, we face two options: 1) The current implementation of
> PartitioningRunner writes converted vertices to sequence-format
> partition files, and GraphJobRunner reads only Vertex Writable
> objects. If the input is a table, we may have to skip the partitioning
> job and parse the vertex structure in the loadVertices() method after
> checking some conditions. 2) PartitioningRunner just writes raw
> records to the proper partition files after checking their partition
> IDs, and GraphJobRunner.loadVertices() always parses and loads vertices.
>
> I meant that I prefer the latter, and that there's no need to write
> VertexWritable files. It's not related to whether graph will support
> only the Seq format or not. I hope my explanation is enough!
>
> 1. http://wiki.apache.org/hama/GraphModuleInternals
>
> On Mon, May 6, 2013 at 10:00 AM, Edward J. Yoon <[email protected]> wrote:
>> I've described my big picture here: http://wiki.apache.org/hama/Partitioning
>>
>> Please review it and give feedback on whether this is acceptable.
>>
>>
>> On Mon, May 6, 2013 at 8:18 AM, Edward <[email protected]> wrote:
>>> P.S., I think there's a misunderstanding. It doesn't mean that graph will
>>> support only the sequence file format. The main question is whether to
>>> convert at the partitioning stage or at the loadVertices stage.
>>>
>>> Sent from my iPhone
>>>
>>> On May 6, 2013, at 8:09 AM, Suraj Menon <[email protected]> wrote:
>>>
>>>> Sure, please go ahead.
>>>>
>>>>
>>>> On Sun, May 5, 2013 at 6:52 PM, Edward J. Yoon <[email protected]> wrote:
>>>>
>>>>>>> Please let me know before this is changed; I would like to work on a
>>>>>>> separate branch.
>>>>>
>>>>> Personally, I think we have to focus on high-priority tasks, and on
>>>>> getting more feedback and contributions from users. So, as changes are
>>>>> made, I'll release periodically. If you want to work somewhere else,
>>>>> please do; I don't want to wait for your patches.
>>>>>
>>>>>
>>>>> On Mon, May 6, 2013 at 7:49 AM, Edward J. Yoon <[email protected]> wrote:
>>>>>> To prepare for integration with NoSQLs, of course, a condition
>>>>>> check (whether converted or not) could be used without removing the
>>>>>> record converter.
>>>>>>
>>>>>> We need to discuss everything.
>>>>>>
>>>>>> On Mon, May 6, 2013 at 7:11 AM, Suraj Menon <[email protected]> wrote:
>>>>>>> I am still -1 if this means our graph module can work only on the
>>>>>>> sequence file format.
>>>>>>> Please note that you can set the record converter to null and make
>>>>>>> changes to loadVertices for what you desire here.
>>>>>>>
>>>>>>> If we came to this design because TextInputFormat is inefficient,
>>>>>>> would this work for the Avro or Thrift input formats?
>>>>>>> Please let me know before this is changed; I would like to work on a
>>>>>>> separate branch.
>>>>>>> You may proceed as you wish.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Suraj
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 5, 2013 at 4:09 PM, Edward J. Yoon <[email protected]> wrote:
>>>>>>>
>>>>>>>> I think the 'record converter' should be removed. It's not a good
>>>>>>>> idea; moreover, it's unnecessarily complex. To keep the vertex input
>>>>>>>> reader, we can move the related classes into the common module.
>>>>>>>>
>>>>>>>> Let's go with my original plan.
>>>>>>>>
>>>>>>>> On Sat, May 4, 2013 at 9:32 AM, Edward J.
Yoon <[email protected]> >>>>>>>> wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I'm reading our old discussions about record converter, superstep >>>>>>>>> injection, and common module: >>>>>>>>> >>>>>>>>> - http://markmail.org/message/ol32pp2ixfazcxfc >>>>>>>>> - http://markmail.org/message/xwtmfdrag34g5xc4 >>>>>>>>> >>>>>>>>> To clarify goals and objectives: >>>>>>>>> >>>>>>>>> 1. A parallel input partition is necessary for obtaining scalability >>>>>>>>> and elasticity of a Bulk Synchronous Parallel processing (It's not a >>>>>>>>> memory issue, or Disk/Spilling Queue, or HAMA-644. Please don't >>>>>>>>> shake). >>>>>>>>> 2. Input partitioning should be handled at BSP framework level, and >>>>> it >>>>>>>>> is for every Hama jobs, not only for Graph jobs. >>>>>>>>> 3. Unnecessary I/O Overhead need to be avoided, and NoSQLs input also >>>>>>>>> should be considered. >>>>>>>>> >>>>>>>>> The current problem is that every input of graph jobs should be >>>>>>>>> rewritten on HDFS. If you have a good idea, Please let me know. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best Regards, Edward J. Yoon >>>>>>>>> @eddieyoon >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best Regards, Edward J. Yoon >>>>>>>> @eddieyoon >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best Regards, Edward J. Yoon >>>>>> @eddieyoon >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Regards, Edward J. Yoon >>>>> @eddieyoon >>>>> >> >> >> >> -- >> Best Regards, Edward J. Yoon >> @eddieyoon > > > > -- > Best Regards, Edward J. Yoon > @eddieyoon
--
Best Regards, Edward J. Yoon
@eddieyoon
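[Editor's note: the second option debated in the thread above, routing raw records to partition files by partition ID and deferring all vertex parsing to loadVertices(), could be sketched roughly as below. This is a minimal illustration only; the class and method names are hypothetical and are not Hama's actual PartitioningRunner or GraphJobRunner API.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "option 2": the partitioning step only routes raw
// input records to per-partition buckets (standing in for partition files)
// without converting them to vertex objects; parsing into vertices is
// deferred to loadVertices() on each peer.
public class RawRecordPartitionSketch {

  // Derive the partition ID from the vertex ID, assumed here to be the
  // first tab-separated token of a raw text record.
  static int partitionOf(String rawRecord, int numPartitions) {
    String vertexId = rawRecord.split("\t", 2)[0];
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  // Group raw records into buckets keyed by partition ID. No vertex
  // structure is parsed at this stage, so no VertexWritable files are
  // needed and the record format stays opaque to the partitioner.
  static Map<Integer, List<String>> partition(List<String> rawRecords,
                                              int numPartitions) {
    Map<Integer, List<String>> buckets = new HashMap<>();
    for (String rec : rawRecords) {
      buckets.computeIfAbsent(partitionOf(rec, numPartitions),
                              k -> new ArrayList<>()).add(rec);
    }
    return buckets;
  }
}
```

Under this scheme, inputs that are already partitioned at the source (e.g. table/NoSQL inputs) could skip the routing step entirely, since loadVertices() always does the parsing either way.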
