Re: runtimePartitioning in GraphJobRunner

Edward J. Yoon Mon, 10 Dec 2012 13:49:19 -0800

We talked on gtalk; the conclusion is as follows:

"If there's no objection, I'll remove VertexInputReader from
GraphJobRunner, because it makes the code complex. Let's reconsider
the VertexInputReader after fixing the HAMA-531 and HAMA-632
issues."


I'll clean them up tomorrow.

On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon <[email protected]> wrote:
> Hi Edward, I am assuming that you want to do this because you want to run
> the job using more BSP tasks in parallel to reduce the memory usage per
> task and perhaps run it faster.
> Am I right? I am +1 if this makes things faster. However this would be
> expensive for people with smaller clusters, and we should have spill, cache
> and lookup implemented for Vertices in such cases.
>
> Regarding backward compatibility, can we use the user's VertexInputReader
> to read the data and then write it in the sequence file format we want? I
> was discussing this with Thomas and we felt this could be done by
> configuring a default input reader and overriding the same by
> configuration. We would have to make the Vertex class Writable. I would
> like to keep it backward compatible. Is this a possibility?
>
> Regarding run-time partitioning, not all partitioning would be based on
> hash partitioning. I can have a partitioner based on color of the vertex or
> some other property of the vertex. It is a step we can skip if the user
> has not configured a partitioner.
>
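A minimal sketch of the point above, that partitioning need not be hash-based and can be skipped when nothing is configured. Everything here is hypothetical and for illustration only: the `VertexPartitioner` interface, `ColoredVertex`, and the `assign` helper are assumptions, not Hama's actual classes or signatures.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical partitioner contract (an assumption, not Hama's API). */
interface VertexPartitioner<V> {
    int getPartition(V vertex, int numTasks);
}

/** Illustrative vertex that carries an id and a color property. */
final class ColoredVertex {
    final String id;
    final String color;

    ColoredVertex(String id, String color) {
        this.id = id;
        this.color = color;
    }
}

/** Hash-based choice: spread vertices by the hash of their id. */
final class HashVertexPartitioner implements VertexPartitioner<ColoredVertex> {
    @Override
    public int getPartition(ColoredVertex v, int numTasks) {
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(v.id.hashCode(), numTasks);
    }
}

/** Property-based choice: vertices of the same color go to the same task. */
final class ColorPartitioner implements VertexPartitioner<ColoredVertex> {
    private final List<String> palette;

    ColorPartitioner(List<String> palette) {
        this.palette = palette;
    }

    @Override
    public int getPartition(ColoredVertex v, int numTasks) {
        int idx = palette.indexOf(v.color);
        // Colors missing from the palette fall back to task 0.
        return idx < 0 ? 0 : idx % numTasks;
    }
}

class PartitionerSketch {
    /** Returns -1 when no partitioner is configured, i.e. the step is skipped. */
    static int assign(VertexPartitioner<ColoredVertex> p, ColoredVertex v, int numTasks) {
        return p == null ? -1 : p.getPartition(v, numTasks);
    }

    public static void main(String[] args) {
        VertexPartitioner<ColoredVertex> byColor =
            new ColorPartitioner(Arrays.asList("red", "green", "blue"));
        System.out.println(assign(byColor, new ColoredVertex("a", "green"), 2));
        System.out.println(assign(null, new ColoredVertex("b", "red"), 2));
    }
}
```

Keeping the partitioner behind one small interface is what would let a job choose hash-based, property-based, or no partitioning purely by configuration.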
> Just my 2 cents. We can deprecate things but let's not remove immediately.
>
> -Suraj
>
> HAMA-632 can wait until everything is resolved. I am trying to reduce the
> API complexity.
>
> On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
> <[email protected]> wrote:
>
>> You didn't get the use of the reader.
>> The reader doesn't care about the input format.
>> It just takes the input as Writable, so for Text this is LongWritable/Text
>> pairs. For NoSQL this might be LongWritable/BytesWritable.
>>
>> It's up to you to code this for your input sequence, not for each format.
>> This is not hardcoded to text, only in the examples.
>>
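A rough sketch of the reader idea described above: the reader receives one key/value record and fills in a vertex, whatever input format produced the record. Plain `long` and `String` stand in for a LongWritable/Text pair so the example has no Hadoop dependency. `SimpleVertex` and this `parseVertex` signature are assumptions made for illustration, not the project's actual API, and the tab-separated layout (vertex id first, then neighbor ids) is likewise assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative vertex: an id plus the ids of its neighbors. */
final class SimpleVertex {
    String id;
    final List<String> edges = new ArrayList<>();
}

class ReaderSketch {
    /**
     * Fills the vertex from one record and returns true, or returns false
     * when the record is blank and should be ignored.
     * The key (a byte offset for text input) is unused in this sketch.
     */
    static boolean parseVertex(long key, String value, SimpleVertex vertex) {
        if (value == null || value.trim().isEmpty()) {
            return false;
        }
        String[] fields = value.split("\t");
        vertex.id = fields[0];
        // Every field after the first is treated as a neighbor id.
        vertex.edges.addAll(Arrays.asList(fields).subList(1, fields.length));
        return true;
    }

    public static void main(String[] args) {
        SimpleVertex v = new SimpleVertex();
        if (parseVertex(0L, "a\tb\tc", v)) {
            System.out.println(v.id + " -> " + v.edges);
        }
    }
}
```

Only the body of `parseVertex` knows about the record layout, so a different source (for example byte payloads from a NoSQL store) would need a different parse step but the same overall shape.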
>> 2012/12/10 Edward J. Yoon <[email protected]>
>>
>> > Again ... users can create their own InputFormatter to read records as
>> > <Writable, ArrayWritable> pairs from a text file, a sequence file, or
>> > a NoSQL store.
>> >
>> > You can use K, V pairs and a sequence file. Why do you want to use a
>> > text file? Should I always write a text file and parse it using
>> > VertexInputReader?
>> >
>> >
>> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
>> > <[email protected]> wrote:
>> > >>
>> > >> It's a gap in experience, Thomas.
>> > >
>> > >
>> > > Most probably you should read some good books on data extraction and
>> > > then choose your tools accordingly.
>> > > I don't think BSP is, or ever will be, a good extraction technique for
>> > > unstructured data.
>> > >
>> > > But these are just my two cents. There seem to be more political
>> > > problems in this game than questions of using tools appropriately.
>> > >
>> > > 2012/12/10 Thomas Jungblut <[email protected]>
>> > >
>> > >> Yes, if you preprocess your data correctly.
>> > >> I have done the same unstructured extraction with the movie database
>> > >> from IMDB and it worked fine.
>> > >> That's just not a job for BSP, but for MapReduce.
>> > >>
>> > >> 2012/12/10 Edward J. Yoon <[email protected]>
>> > >>
>> > >>> It's a gap in experience, Thomas. Do you think you can extract a
>> > >>> Twitter mention graph using parseVertex?
>> > >>>
>> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>> > >>> <[email protected]> wrote:
>> > >>> > I have trouble understanding you here.
>> > >>> >
>> > >>> > How can I generate large sample without coding?
>> > >>> >
>> > >>> >
>> > >>> > Do you mean random data generation or real-life data?
>> > >>> > Personally I think it is really convenient to transform unstructured
>> > >>> > data in a text file to vertices.
>> > >>> >
>> > >>> >
>> > >>> > 2012/12/10 Edward <[email protected]>
>> > >>> >
>> > >>> >> I mean, with or without the input reader, how can I generate a large
>> > >>> >> sample without coding?
>> > >>> >>
>> > >>> >> It's an unnecessary feature. As I mentioned before, it's only good for
>> > >>> >> simple and small tests.
>> > >>> >>
>> > >>> >> Sent from my iPhone
>> > >>> >>
>> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <[email protected]>
>> > >>> >> wrote:
>> > >>> >>
>> > >>> >> >>
>> > >>> >> >> In my case, generating test data is very annoying.
>> > >>> >> >
>> > >>> >> >
>> > >>> >> > Really? What is so difficult about generating tab-separated text
>> > >>> >> > data? ;)
>> > >>> >> > I think we shouldn't do this, but there seems to be very little
>> > >>> >> > interest in the community, so I will not block your work on it.
>> > >>> >> >
>> > >>> >> > Good luck ;)
>> > >>> >>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best Regards, Edward J. Yoon
>> > >>> @eddieyoon
>> > >>>
>> > >>
>> > >>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>> >
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
