Re: runtimePartitioning in GraphJobRunner

Thomas Jungblut Mon, 10 Dec 2012 13:53:23 -0800

Yes, but as patches attached to issue HAMA-531, so we can review them.

2012/12/10 Edward J. Yoon <[email protected]>


> We talked on gtalk, the conclusion is as below:
>
> "If there's no opinion, I'll remove VertexInputReader in
> GraphJobRunner, because it makes the code complex. Let's reconsider
> the VertexInputReader after fixing the HAMA-531 and HAMA-632
> issues."
>
> I'll clean them up tomorrow.
>
> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon <[email protected]>
> wrote:
> > Hi Edward, I am assuming that you want to do this because you want to run
> > the job using more BSP tasks in parallel to reduce the memory usage per
> > task and perhaps run it faster.
> > Am I right? I am +1 if this makes things faster. However this would be
> > expensive for people with smaller clusters, and we should have spill,
> > cache and lookup implemented for Vertices in such cases.
> >
> > Regarding backward compatibility, can we use the user's VertexInputReader
> > to read the data and then write them in the sequence file format we want?
> > I was discussing this with Thomas and we felt this could be done by
> > configuring a default input reader and overriding it through
> > configuration. We would have to make the Vertex class Writable. I would
> > like to keep it backward compatible. Is this a possibility?
> >
> > Regarding run-time partitioning, not all partitioning would be based on
> > hash partitioning. I can have a partitioner based on the color of the
> > vertex or some other property of the vertex. It is a step we can skip if
> > not configured by the user.
> >
> > Just my 2 cents. We can deprecate things but let's not remove them
> > immediately.
> >
> > -Suraj
> >
> > HAMA-632 can wait until everything is resolved. I am trying to reduce the
> > API complexity.
> >
> > On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
> > <[email protected]> wrote:
> >
> >> You didn't get the use of the reader.
> >> The reader doesn't care about the input format.
> >> It just takes the input as Writable, so for Text this is
> >> LongWritable/Text pairs. For NoSQL this might be
> >> LongWritable/BytesWritable.
> >>
> >> It's up to you to code this for your input sequence, not for each format.
> >> This is not hardcoded to text, only in the examples.
> >>
> >> 2012/12/10 Edward J. Yoon <[email protected]>
> >>
> >> > Again ... Users can create their own InputFormatter to read records
> >> > as <Writable, ArrayWritable> pairs from a text file, a sequence
> >> > file, or a NoSQL store.
> >> >
> >> > You can use K, V pairs and a sequence file. Why do you want to use a
> >> > text file? Should I always write a text file and parse it using
> >> > VertexInputReader?
> >> >
> >> >
> >> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
> >> > <[email protected]> wrote:
> >> > >>
> >> > >> It's a gap in experience, Thomas.
> >> > >
> >> > >
> >> > > Most probably you should read some good books on data extraction and
> >> > > then choose your tools accordingly.
> >> > > I don't think that BSP is or ever will be a good extraction technique
> >> > > for unstructured data.
> >> > >
> >> > > But these are just my two cents here. There seems to be more politics
> >> > > in this game than appropriate use of tools.
> >> > >
> >> > > 2012/12/10 Thomas Jungblut <[email protected]>
> >> > >
> >> > >> Yes, if you preprocess your data correctly.
> >> > >> I have done the same unstructured extraction with the movie database
> >> > >> from IMDB and it worked fine.
> >> > >> That's just not a job for BSP, but for MapReduce.
> >> > >>
> >> > >> 2012/12/10 Edward J. Yoon <[email protected]>
> >> > >>
> >> > >>> It's a gap in experience, Thomas. Do you think you can extract a
> >> > >>> Twitter mention graph using parseVertex?
> >> > >>>
> >> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
> >> > >>> <[email protected]> wrote:
> >> > >>> > I have trouble understanding you here.
> >> > >>> >
> >> > >>> > How can I generate large sample without coding?
> >> > >>> >
> >> > >>> >
> >> > >>> > Do you mean random data generation or real-life data?
> >> > >>> > Personally I think it is really convenient to transform
> >> > >>> > unstructured data in a text file to vertices.
> >> > >>> >
> >> > >>> >
> >> > >>> > 2012/12/10 Edward <[email protected]>
> >> > >>> >
> >> > >>> >> I mean, with or without the input reader, how can I generate a
> >> > >>> >> large sample without coding?
> >> > >>> >>
> >> > >>> >> It's an unnecessary feature. As I mentioned before, it's only good
> >> > >>> >> for simple and small tests.
> >> > >>> >>
> >> > >>> >> Sent from my iPhone
> >> > >>> >>
> >> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut
> >> > >>> >> <[email protected]> wrote:
> >> > >>> >>
> >> > >>> >> >>
> >> > >>> >> >> In my case, generating test data is very annoying.
> >> > >>> >> >
> >> > >>> >> >
> >> > >>> >> > Really? What is so difficult about generating tab-separated text
> >> > >>> >> > data? ;)
> >> > >>> >> > I think we shouldn't do this, but there seems to be very little
> >> > >>> >> > interest in the community so I will not block your work on it.
> >> > >>> >> >
> >> > >>> >> > Good luck ;)
> >> > >>> >>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> --
> >> > >>> Best Regards, Edward J. Yoon
> >> > >>> @eddieyoon
> >> > >>>
> >> > >>
> >> > >>
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards, Edward J. Yoon
> >> > @eddieyoon
> >> >
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
