Re: runtimePartitioning in GraphJobRunner

Edward J. Yoon Mon, 10 Dec 2012 14:39:14 -0800

If you can fix the BSPJobClient.partition() method to partition text
input, please do.


Again ... :/

>>> * If we have VertexInputReader again, we don't need to apply it to all
>>> examples. And, random generators and examples should be managed
>>> together now.

As we discussed, I'll clean them up tomorrow.

On Tue, Dec 11, 2012 at 7:21 AM, Edward J. Yoon <[email protected]> wrote:
>> Please do me a favor and code how you want the partitioning BSP job to work
>> before removing everything. I will tell you how to use the readers without
>> any duplicated graph code so you don't need to touch the examples at all.
>
> You don't need to wait, because it will be almost the same as the
> BSPJobClient.partition() method.
>
> On Tue, Dec 11, 2012 at 6:59 AM, Thomas Jungblut
> <[email protected]> wrote:
>> Please do me a favor and code how you want the partitioning BSP job to work
>> before removing everything. I will tell you how to use the readers without
>> any duplicated graph code so you don't need to touch the examples at all.
>>
>> 2012/12/10 Edward J. Yoon <[email protected]>
>>
>>> Please review
>>> https://issues.apache.org/jira/secure/attachment/12560155/patch_v02.txt
>>> first.
>>>
>>> * If we have VertexInputReader again, we don't need to apply it to all
>>> examples. And, random generators and examples should be managed
>>> together now.
>>>
>>> On Tue, Dec 11, 2012 at 6:52 AM, Thomas Jungblut
>>> <[email protected]> wrote:
>>> > Yes, but in patches and in issue HAMA-531, so we can review.
>>> >
>>> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >
>>> >> We talked on gtalk, the conclusion is as below:
>>> >>
>>> >> "If there's no opinion, I'll remove VertexInputReader in
>>> >> GraphJobRunner, because it makes the code complex. Let's consider the
>>> >> VertexInputReader again after fixing the HAMA-531 and HAMA-632
>>> >> issues."
>>> >>
>>> >> I'll clean them up tomorrow.
>>> >>
>>> >> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon <[email protected]>
>>> >> wrote:
>>> >> > Hi Edward, I am assuming that you want to do this because you want to run
>>> >> > the job using more BSP tasks in parallel to reduce the memory usage per
>>> >> > task and perhaps run it faster.
>>> >> > Am I right? I am +1 if this makes things faster. However, this would be
>>> >> > expensive for people with smaller clusters, and we should have spill, cache
>>> >> > and lookup implemented for Vertices in such cases.
>>> >> >
>>> >> > Regarding backward compatibility, can we use the user's VertexInputReader
>>> >> > to read the data and then write it in the sequence file format we want? I
>>> >> > was discussing this with Thomas and we felt this could be done by
>>> >> > configuring a default input reader and overriding it through
>>> >> > configuration. We would have to make the Vertex class Writable. I would
>>> >> > like to keep it backward compatible. Is this a possibility?
>>> >> >
>>> >> > Regarding run-time partitioning, not all partitioning would be based on
>>> >> > hash partitioning. I can have a partitioner based on the color of the vertex or
>>> >> > some other property of the vertex. It is a step we can skip if not
>>> >> > configured by the user.
>>> >> >
>>> >> > Just my 2 cents. We can deprecate things but let's not remove them immediately.
>>> >> >
>>> >> > -Suraj
>>> >> >
>>> >> > HAMA-632 can wait until everything is resolved. I am trying to reduce the
>>> >> > API complexity.
>>> >> >
>>> >> > On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
>>> >> > <[email protected]> wrote:
>>> >> >
>>> >> >> You didn't get the use of the reader.
>>> >> >> The reader doesn't care about the input format.
>>> >> >> It just takes the input as Writable, so for Text this is LongWritable/Text
>>> >> >> pairs. For NoSQL this might be LongWritable/BytesWritable.
>>> >> >>
>>> >> >> It's up to you to code this for your input sequence, not for each format.
>>> >> >> This is not hardcoded to text, only in the examples.
>>> >> >>
>>> >> >> 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >>
>>> >> >> > Again ... Users can create their own InputFormatter to read records as
>>> >> >> > <Writable, ArrayWritable> pairs from a text file, a sequence file, or
>>> >> >> > NoSQL stores.
>>> >> >> >
>>> >> >> > You can use K, V pairs and a sequence file. Why do you want to use a text
>>> >> >> > file? Should I always write a text file and parse it using
>>> >> >> > VertexInputReader?
>>> >> >> >
>>> >> >> >
>>> >> >> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
>>> >> >> > <[email protected]> wrote:
>>> >> >> > >>
>>> >> >> > >> It's a gap in experience, Thomas.
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Most probably you should read some good books on data extraction and then
>>> >> >> > > choose your tools accordingly.
>>> >> >> > > I don't think that BSP is, or ever will be, a good extraction technique for
>>> >> >> > > unstructured data.
>>> >> >> > >
>>> >> >> > > But these are just my two cents here. There seem to be more political
>>> >> >> > > problems in this game than problems with using tools appropriately.
>>> >> >> > >
>>> >> >> > > 2012/12/10 Thomas Jungblut <[email protected]>
>>> >> >> > >
>>> >> >> > >> Yes, if you preprocess your data correctly.
>>> >> >> > >> I have done the same unstructured extraction with the movie database from
>>> >> >> > >> IMDB and it worked fine.
>>> >> >> > >> That's just not a job for BSP, but for MapReduce.
>>> >> >> > >>
>>> >> >> > >> 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >> > >>
>>> >> >> > >>> It's a gap in experience, Thomas. Do you think you can extract a Twitter
>>> >> >> > >>> mention graph using parseVertex?
>>> >> >> > >>>
>>> >> >> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>>> >> >> > >>> <[email protected]> wrote:
>>> >> >> > >>> > I have trouble understanding you here.
>>> >> >> > >>> >
>>> >> >> > >>> > How can I generate large sample without coding?
>>> >> >> > >>> >
>>> >> >> > >>> >
>>> >> >> > >>> > Do you mean random data generation or real-life data?
>>> >> >> > >>> > Personally I think it is really convenient to transform unstructured data
>>> >> >> > >>> > in a text file to vertices.
>>> >> >> > >>> >
>>> >> >> > >>> >
>>> >> >> > >>> > 2012/12/10 Edward <[email protected]>
>>> >> >> > >>> >
>>> >> >> > >>> >> I mean, with or without an input reader, how can I generate a large sample
>>> >> >> > >>> >> without coding?
>>> >> >> > >>> >>
>>> >> >> > >>> >> It's an unnecessary feature. As I mentioned before, it is only good for
>>> >> >> > >>> >> simple and small tests.
>>> >> >> > >>> >>
>>> >> >> > >>> >> Sent from my iPhone
>>> >> >> > >>> >>
>>> >> >> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <[email protected]>
>>> >> >> > >>> >> wrote:
>>> >> >> > >>> >>
>>> >> >> > >>> >> >>
>>> >> >> > >>> >> >> In my case, generating test data is very annoying.
>>> >> >> > >>> >> >
>>> >> >> > >>> >> >
>>> >> >> > >>> >> > Really? What is so difficult about generating tab-separated text data? ;)
>>> >> >> > >>> >> > I think we shouldn't do this, but there seems to be very little interest in
>>> >> >> > >>> >> > the community so I will not block your work on it.
>>> >> >> > >>> >> >
>>> >> >> > >>> >> > Good luck ;)
>>> >> >> > >>> >>
>>> >> >> > >>>
>>> >> >> > >>>
>>> >> >> > >>>
>>> >> >> > >>> --
>>> >> >> > >>> Best Regards, Edward J. Yoon
>>> >> >> > >>> @eddieyoon
>>> >> >> > >>>
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Best Regards, Edward J. Yoon
>>> >> >> > @eddieyoon
>>> >> >> >
>>> >> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards, Edward J. Yoon
>>> >> @eddieyoon
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
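
The run-time partitioning question raised in this thread (hash partitioning by default, versus a partitioner keyed on some vertex property such as color) can be illustrated with a minimal sketch. It is plain Java and deliberately does not use Hama's classes: ColoredVertex, VertexPartitioner, HASH and BY_COLOR are hypothetical names introduced only for illustration, and this is not the GraphJobRunner or BSPJobClient.partition() implementation.

```java
public class PartitionSketch {

    /** Hypothetical vertex record for illustration; not Hama's Vertex class. */
    public static final class ColoredVertex {
        public final String id;
        public final int color;

        public ColoredVertex(String id, int color) {
            this.id = id;
            this.color = color;
        }
    }

    /** Hypothetical contract: map a vertex to a task index in [0, numTasks). */
    public interface VertexPartitioner {
        int partition(ColoredVertex vertex, int numTasks);
    }

    /** Hash partitioning on the vertex ID; floorMod keeps the index non-negative. */
    public static final VertexPartitioner HASH =
        (vertex, numTasks) -> Math.floorMod(vertex.id.hashCode(), numTasks);

    /** Partitioning on a vertex property instead of its ID (here, the color). */
    public static final VertexPartitioner BY_COLOR =
        (vertex, numTasks) -> Math.floorMod(vertex.color, numTasks);

    public static void main(String[] args) {
        int numTasks = 4;
        ColoredVertex v = new ColoredVertex("v1", 7);
        System.out.println("hash partition:  " + HASH.partition(v, numTasks));
        System.out.println("color partition: " + BY_COLOR.partition(v, numTasks));
    }
}
```

Both partitioners satisfy the same contract, so which one runs (or whether the step runs at all) could be left to user configuration, as suggested in the thread.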
