P.S., BSPJob (with table input) is also the same. It's not only for GraphJob.

On Mon, May 6, 2013 at 4:09 PM, Edward J. Yoon <[email protected]> wrote:
> All,
>
> I've also roughly described the design of the Graph APIs[1]. To
> reduce our misunderstandings, please read the Partitioning and
> GraphModuleInternals documents first.
>
> * In the NoSQL case, there's obviously no need to hash-partition or
> rewrite partition files on HDFS. So, for these inputs, I think the
> vertex structure should be parsed in the GraphJobRunner.loadVertices()
> method.
>
> Here, we face two options: 1) The current implementation of
> PartitioningRunner writes converted vertices to sequence-format
> partition files, and GraphJobRunner reads only Vertex Writable
> objects. If the input is a table, we may have to skip the partitioning
> job and parse the vertex structure in the loadVertices() method after
> checking some conditions. 2) PartitioningRunner just writes raw
> records to the proper partition files after checking their partition
> IDs, and GraphJobRunner.loadVertices() always parses and loads vertices.
>
> I meant that I prefer the latter, and that there's no need to write
> VertexWritable files. It's not related to whether graph will support
> only the Seq format or not. I hope my explanation is enough!
>
> 1. http://wiki.apache.org/hama/GraphModuleInternals
>
> On Mon, May 6, 2013 at 10:00 AM, Edward J. Yoon <[email protected]> wrote:
>> I've described my big picture here: http://wiki.apache.org/hama/Partitioning
>>
>> Please review it and give feedback on whether this is acceptable.
>>
>>
>> On Mon, May 6, 2013 at 8:18 AM, Edward <[email protected]> wrote:
>>> P.S., I think there's a misunderstanding. It doesn't mean that graph will
>>> support only the sequence file format. The main question is whether to
>>> convert at the partitioning stage or at the loadVertices stage.
>>>
>>> Sent from my iPhone
>>>
>>> On May 6, 2013, at 8:09 AM, Suraj Menon <[email protected]> wrote:
>>>
>>>> Sure, please go ahead.
>>>>
>>>>
>>>> On Sun, May 5, 2013 at 6:52 PM, Edward J. Yoon <[email protected]> wrote:
>>>>
>>>>>>> Please let me know before this is changed; I would like to work on a
>>>>>>> separate branch.
>>>>>
>>>>> Personally, I think we have to focus on high-priority tasks, and on
>>>>> getting more feedback and contributions from users. So, as changes are
>>>>> made, I'll release periodically. If you want to work somewhere else,
>>>>> please do; I don't want to wait for your patches.
>>>>>
>>>>>
>>>>> On Mon, May 6, 2013 at 7:49 AM, Edward J. Yoon <[email protected]> wrote:
>>>>>> To prepare for integration with NoSQLs, of course, a condition
>>>>>> check (whether converted or not) could be used without removing the
>>>>>> record converter.
>>>>>>
>>>>>> We need to discuss everything.
>>>>>>
>>>>>> On Mon, May 6, 2013 at 7:11 AM, Suraj Menon <[email protected]> wrote:
>>>>>>> I am still -1 if this means our graph module can work only on the
>>>>>>> sequence file format.
>>>>>>> Please note that you can set the record converter to null and make
>>>>>>> changes to loadVertices for what you desire here.
>>>>>>>
>>>>>>> If we came to this design because TextInputFormat is inefficient,
>>>>>>> would this work for the Avro or Thrift input formats?
>>>>>>> Please let me know before this is changed; I would like to work on a
>>>>>>> separate branch.
>>>>>>> You may proceed as you wish.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Suraj
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 5, 2013 at 4:09 PM, Edward J. Yoon <[email protected]> wrote:
>>>>>>>
>>>>>>>> I think the 'record converter' should be removed. It's not a good
>>>>>>>> idea; moreover, it's unnecessarily complex. To keep the vertex input
>>>>>>>> reader, we can move the related classes into the common module.
>>>>>>>>
>>>>>>>> Let's go with my original plan.
>>>>>>>>
>>>>>>>> On Sat, May 4, 2013 at 9:32 AM, Edward J.
Yoon <[email protected]> >>>>>>>> wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I'm reading our old discussions about record converter, superstep >>>>>>>>> injection, and common module: >>>>>>>>> >>>>>>>>> - http://markmail.org/message/ol32pp2ixfazcxfc >>>>>>>>> - http://markmail.org/message/xwtmfdrag34g5xc4 >>>>>>>>> >>>>>>>>> To clarify goals and objectives: >>>>>>>>> >>>>>>>>> 1. A parallel input partition is necessary for obtaining scalability >>>>>>>>> and elasticity of a Bulk Synchronous Parallel processing (It's not a >>>>>>>>> memory issue, or Disk/Spilling Queue, or HAMA-644. Please don't >>>>>>>>> shake). >>>>>>>>> 2. Input partitioning should be handled at BSP framework level, and >>>>> it >>>>>>>>> is for every Hama jobs, not only for Graph jobs. >>>>>>>>> 3. Unnecessary I/O Overhead need to be avoided, and NoSQLs input also >>>>>>>>> should be considered. >>>>>>>>> >>>>>>>>> The current problem is that every input of graph jobs should be >>>>>>>>> rewritten on HDFS. If you have a good idea, Please let me know. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best Regards, Edward J. Yoon >>>>>>>>> @eddieyoon >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best Regards, Edward J. Yoon >>>>>>>> @eddieyoon >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best Regards, Edward J. Yoon >>>>>> @eddieyoon >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Regards, Edward J. Yoon >>>>> @eddieyoon >>>>> >> >> >> >> -- >> Best Regards, Edward J. Yoon >> @eddieyoon > > > > -- > Best Regards, Edward J. Yoon > @eddieyoon
--
Best Regards, Edward J. Yoon
@eddieyoon
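[Editor's note: the second option debated in the thread above, routing raw records to partition files by partition ID and deferring all vertex parsing to loadVertices(), could be sketched roughly as below. This is a minimal illustration only; the class and method names are hypothetical and are not Hama's actual PartitioningRunner or GraphJobRunner API.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "option 2": the partitioning step only routes raw
// input records to per-partition buckets (standing in for partition files)
// without converting them to vertex objects; parsing into vertices is
// deferred to loadVertices() on each peer.
public class RawRecordPartitionSketch {

  // Derive the partition ID from the vertex ID, assumed here to be the
  // first tab-separated token of a raw text record.
  static int partitionOf(String rawRecord, int numPartitions) {
    String vertexId = rawRecord.split("\t", 2)[0];
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  // Group raw records into buckets keyed by partition ID. No vertex
  // structure is parsed at this stage, so no VertexWritable files are
  // needed and the record format stays opaque to the partitioner.
  static Map<Integer, List<String>> partition(List<String> rawRecords,
                                              int numPartitions) {
    Map<Integer, List<String>> buckets = new HashMap<>();
    for (String rec : rawRecords) {
      buckets.computeIfAbsent(partitionOf(rec, numPartitions),
                              k -> new ArrayList<>()).add(rec);
    }
    return buckets;
  }
}
```

Under this scheme, inputs that are already partitioned at the source (e.g. table/NoSQL inputs) could skip the routing step entirely, since loadVertices() always does the parsing either way.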
