Re: Issues about Partitioning and Record converter

Edward J. Yoon Sun, 05 May 2013 18:00:17 -0700

I've described my big picture here: http://wiki.apache.org/hama/Partitioning


Please review it and let me know whether this is acceptable.


On Mon, May 6, 2013 at 8:18 AM, Edward <[email protected]> wrote:
> P.S. I think there's a misunderstanding. It doesn't mean that graph will support
> only the sequence file format. The main question is whether to convert at the
> partitioning stage or at the loadVertices stage.
>
> Sent from my iPhone
>
> On May 6, 2013, at 8:09 AM, Suraj Menon <[email protected]> wrote:
>
>> Sure, Please go ahead.
>>
>>
>> On Sun, May 5, 2013 at 6:52 PM, Edward J. Yoon <[email protected]> wrote:
>>
>>>>> Please let me know before this is changed, I would like to work on a
>>>>> separate branch.
>>>
>>> Personally, I think we have to focus on high-priority tasks, and on
>>> getting more feedback and contributions from users. So, if changes are
>>> made, I'll release periodically. If you want to work elsewhere, please do.
>>> I don't want to wait for your patches.
>>>
>>>
>>> On Mon, May 6, 2013 at 7:49 AM, Edward J. Yoon <[email protected]>
>>> wrote:
>>>> To prepare for integration with NoSQL stores, of course, maybe a condition
>>>> check (whether the input is converted or not) can be used without removing
>>>> the record converter.
>>>>
>>>> We need to discuss everything.
>>>>
>>>> On Mon, May 6, 2013 at 7:11 AM, Suraj Menon <[email protected]> wrote:
>>>>> I am still -1 if this means our graph module can work only on the
>>>>> sequence file format.
>>>>> Please note that you can set the record converter to null and make
>>>>> changes to loadVertices for what you desire here.
>>>>>
>>>>> If we came to this design because TextInputFormat is inefficient, would
>>>>> this work for Avro or Thrift input formats?
>>>>> Please let me know before this is changed, I would like to work on a
>>>>> separate branch.
>>>>> You may proceed as you wish.
>>>>>
>>>>> Regards,
>>>>> Suraj
>>>>>
>>>>>
>>>>> On Sun, May 5, 2013 at 4:09 PM, Edward J. Yoon <[email protected]> wrote:
>>>>>
>>>>>> I think the 'record converter' should be removed. It's not a good idea.
>>>>>> Moreover, it's unnecessarily complex. To keep the vertex input reader,
>>>>>> we can move the related classes into the common module.
>>>>>>
>>>>>> Let's go with my original plan.
>>>>>>
>>>>>> On Sat, May 4, 2013 at 9:32 AM, Edward J. Yoon <[email protected]>
>>>>>> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm reading our old discussions about the record converter, superstep
>>>>>>> injection, and the common module:
>>>>>>>
>>>>>>> - http://markmail.org/message/ol32pp2ixfazcxfc
>>>>>>> - http://markmail.org/message/xwtmfdrag34g5xc4
>>>>>>>
>>>>>>> To clarify goals and objectives:
>>>>>>>
>>>>>>> 1. Parallel input partitioning is necessary for the scalability
>>>>>>> and elasticity of Bulk Synchronous Parallel processing (it's not a
>>>>>>> memory issue, a Disk/Spilling Queue issue, or HAMA-644; please don't
>>>>>>> confuse them).
>>>>>>> 2. Input partitioning should be handled at the BSP framework level, and
>>>>>>> it is for all Hama jobs, not only for Graph jobs.
>>>>>>> 3. Unnecessary I/O overhead needs to be avoided, and NoSQL input
>>>>>>> should also be considered.
>>>>>>>
>>>>>>> The current problem is that every input of graph jobs has to be
>>>>>>> rewritten on HDFS. If you have a good idea, please let me know.
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Edward J. Yoon
>>>>>>> @eddieyoon
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards, Edward J. Yoon
>>>>>> @eddieyoon
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
