I've described my big picture here: http://wiki.apache.org/hama/Partitioning
Please review it and let me know whether this is acceptable.

On Mon, May 6, 2013 at 8:18 AM, Edward <[email protected]> wrote:
> P.S. I think there's a misunderstanding. It doesn't mean that graph will
> support only the sequence file format. The main question is whether to
> convert at the partitioning stage or at the loadVertices stage.
>
> Sent from my iPhone
>
> On May 6, 2013, at 8:09 AM, Suraj Menon <[email protected]> wrote:
>
>> Sure, please go ahead.
>>
>>
>> On Sun, May 5, 2013 at 6:52 PM, Edward J. Yoon <[email protected]> wrote:
>>
>>>>> Please let me know before this is changed, I would like to work on a
>>>>> separate branch.
>>>
>>> Personally, I think we have to focus on high-priority tasks, and on
>>> getting more feedback and contributions from users. So, if changes are
>>> made, I'll release periodically. If you want to work in another place,
>>> please do. I don't want to wait for your patches.
>>>
>>>
>>> On Mon, May 6, 2013 at 7:49 AM, Edward J. Yoon <[email protected]> wrote:
>>>> To prepare for integration with NoSQLs, of course, maybe a condition
>>>> check (whether converted or not) can be used without removing the
>>>> record converter.
>>>>
>>>> We need to discuss everything.
>>>>
>>>> On Mon, May 6, 2013 at 7:11 AM, Suraj Menon <[email protected]> wrote:
>>>>> I am still -1 if this means our graph module can work only on sequential
>>>>> file format.
>>>>> Please note that you can set record converter to null and make changes to
>>>>> loadVertices for what you desire here.
>>>>>
>>>>> If we came to this design because TextInputFormat is inefficient, would
>>>>> this work for Avro or Thrift input format?
>>>>> Please let me know before this is changed, I would like to work on a
>>>>> separate branch.
>>>>> You may proceed as you wish.
>>>>>
>>>>> Regards,
>>>>> Suraj
>>>>>
>>>>>
>>>>> On Sun, May 5, 2013 at 4:09 PM, Edward J. Yoon <[email protected]> wrote:
>>>>>
>>>>>> I think 'record converter' should be removed. It's not a good idea.
>>>>>> Moreover, it's unnecessarily complex. To keep the vertex input reader,
>>>>>> we can move the related classes into the common module.
>>>>>>
>>>>>> Let's go with my original plan.
>>>>>>
>>>>>> On Sat, May 4, 2013 at 9:32 AM, Edward J. Yoon <[email protected]>
>>>>>> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm reading our old discussions about record converter, superstep
>>>>>>> injection, and common module:
>>>>>>>
>>>>>>> - http://markmail.org/message/ol32pp2ixfazcxfc
>>>>>>> - http://markmail.org/message/xwtmfdrag34g5xc4
>>>>>>>
>>>>>>> To clarify goals and objectives:
>>>>>>>
>>>>>>> 1. A parallel input partition is necessary for obtaining scalability
>>>>>>> and elasticity of Bulk Synchronous Parallel processing (It's not a
>>>>>>> memory issue, or Disk/Spilling Queue, or HAMA-644. Please don't
>>>>>>> shake).
>>>>>>> 2. Input partitioning should be handled at the BSP framework level,
>>>>>>> and it is for every Hama job, not only for Graph jobs.
>>>>>>> 3. Unnecessary I/O overhead needs to be avoided, and NoSQL input
>>>>>>> should also be considered.
>>>>>>>
>>>>>>> The current problem is that every input of graph jobs has to be
>>>>>>> rewritten on HDFS. If you have a good idea, please let me know.
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Edward J. Yoon
>>>>>>> @eddieyoon
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards, Edward J. Yoon
>>>>>> @eddieyoon
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>

--
Best Regards, Edward J. Yoon
@eddieyoon
