Re: Error with fastgen input

Edward J. Yoon Wed, 20 Mar 2013 22:57:50 -0700

NOTE: this is purely my opinion about your thoughts.

> This is the change we talked about on the dev list and on JIRAs very
> extensively and chose a single design we want to implement. This requires a
> lot of code change, so I don't see how splitting that smaller (IMHO this is
> atomic enough) would be beneficial. And even if you split the stuff, it
> would add huge organizational overhead, because the number of team
> members/contributors that can work on those tasks is limited.


and,

> Sorry Edward, but our releases have been a disaster so far. I've only been
> here since 0.3.0, but none of them was scalable, well documented, or
> well tested. I have no problem with taking more time for a product, as I
> don't feel the need to deliver half-baked stuff to people who are not using
> it anyways nor providing any feedback there (which is sad reality in many
> other open source projects as well). So in my opinion we have to iterate on
> our own and not with official releases. "It is done, when it's done" is the
> usual standard and I don't think deviating from it will give any advantages
> besides pissed off users getting Hama not to work like it should.

No one noticed that AvroRPC doesn't work well for
communication-intensive jobs. If you agree with this, the cause of bad
releases is a lack of feedback from people who use Hama actively. I
don't think this will change soon. So, I'll focus more on using
Hama.

The release plan for Hama 0.6.1 is just a minor release for [1][2].

> For other partitionings and with regard to our superstep API, Suraj's idea
> of injecting a preprocessing superstep that partitions the stuff into our
> messaging system is actually the best.

Before changing the PartitioningJobRunner, I hope I can test, use, and
experience it on my large cluster (some people can't use
Maven because of internet security issues). Therefore, I think there is
no need to put SpillingQueue and DiskVerticesInfo into 0.6.1.

1. http://markmail.org/message/bgvojz334l76n3n7
2. http://markmail.org/thread/2yf4lkgdoreq37gn

> It is not about a skill discussion here, but I wanted to emphasize that you
> can very well work on other JIRAs instead of blocking our work on
> graph/messaging. And 23 is at least 22 more than the average of the rest of
> the team, think about that: would there be issues for newcomers? Yes there
> would! But why are you assigning them to yourself when you're not working
> actively on them?
>
> YARN is just a single umbrella issue that is "yours", there is work blocked
> on maven coding (HAMA-671) and also there is a pending patch review since
> 20/11/12 (4 months!) from me in HAMA-672, so don't tell me that you work on
> that things actively in your "full-time open sourcer" career.

Please take it if you want.

On Fri, Mar 15, 2013 at 2:47 AM, Thomas Jungblut
<[email protected]> wrote:
>>
>> As you know, we have a problem of lack of team members and contributors.
>
> So we should break down every tasks as small as possible.
>
>
> Where was this task not broken into pieces?
> There are at least two tasks:
>
> - Improve GraphJobRunner memory consumption (HAMA-704, even reviewed on
> reviewboard with huge memory savings)
> - Implement SpillingQueue / SortedSpillingQueue (HAMA-644, HAMA-723
> whatever else)
>
> This is the change we talked about on the dev list and on JIRAs very
> extensively and chose a single design we want to implement. This requires a
> lot of code change, so I don't see how splitting that smaller (IMHO this is
> atomic enough) would be beneficial. And even if you split the stuff, it
> would add huge organizational overhead, because the number of team
> members/contributors that can work on those tasks is limited.
>
>> I don't know what you mean exactly. But 23 issues are almost examples
>> except YARN integration tasks. If you leave here, I have to take cover
>> YARN tasks. Should I wait someone? Am I touching core module
>> aggressively?
>
>
> It is not about a skill discussion here, but I wanted to emphasize that you
> can very well work on other JIRAs instead of blocking our work on
> graph/messaging. And 23 is at least 22 more than the average of the rest of
> the team, think about that: would there be issues for newcomers? Yes there
> would! But why are you assigning them to yourself when you're not working
> actively on them?
>
> YARN is just a single umbrella issue that is "yours", there is work blocked
> on maven coding (HAMA-671) and also there is a pending patch review since
> 20/11/12 (4 months!) from me in HAMA-672, so don't tell me that you work on
> that things actively in your "full-time open sourcer" career.
>
>> By the way, can you answer about this question - Is it really
>> technical conflicts? or emotional conflicts?
>
>
> If someone is usually emotional about things, it is you. Technically
> speaking, should we branch out such (big) refactoring issues to work on our
> own, or do you want to brew your own soup on trunk and have us merge all
> the stuff together? In case you want to, please fork your own playground
> Hama and do all the stuff you want; if something emerges successfully, feel
> free to slice a patch and emit a JIRA.
>
> So I think we need to cut release as often as possible.
>
>
> Sorry Edward, but our releases have been a disaster so far. I've only been
> here since 0.3.0, but none of them was scalable, well documented, or
> well tested. I have no problem with taking more time for a product, as I
> don't feel the need to deliver half-baked stuff to people who are not using
> it anyways nor providing any feedback there (which is sad reality in many
> other open source projects as well). So in my opinion we have to iterate on
> our own and not with official releases. "It is done, when it's done" is the
> usual standard and I don't think deviating from it will give any advantages
> besides pissed off users getting Hama not to work like it should.
>
> Also your changes on the wiki recently:
>
>> However, if no one responds to your patches for 3 days, you can commit then
>> review later.
>
>
> Who in the community has voted for that rule, or do you make the rules
> here? You can't talk about community in the same sentence as changing rules
> for everybody just because you like that.
> Where was the need to commit HAMA-745 without review? Why did you change
> that testcase? This is just the "tip" of the iceberg of changes you are
> doing to the trunk without the agreement of the community. We established a
> community process during the incubation (that was even written on the
> charter when graduating), so why do we not stick to it instead of laying
> out the rules for self-needs / or that of your employee?
>
>> Regarding branches, maybe we all are not familiar with online
>> collaboration (or don't want to collaborate anymore). If we want to
>> walk own ways, why we need to be in here together?
>
>
> Branching is something that is perfectly legal when something needs to be
> developed in parallel to ongoing work. We don't have much ongoing work do
> we? So I don't think branching is usually needed when working on small
> projects, because issues can be solved by communication. But if you commit
> / plan stuff to trunk without coordinating that with people (YOU KNOW) that
> are currently working on it, then it is just a bad move.
>
>> In HAMA-704, I wanted to remove only message map to reduce memory
>> consumption. I still don't want to talk about disk-based vertices and
>> Spilling Queue at the moment. With this, I wanted to release 0.6.1
>> 'partitioning issue fixed and quick executable examples' version ASAP.
>>
>
> You can't say B without saying A. The problems are much deeper than you
> think they are. The message consumption is not a problem of the message
> map, but a two fold problem of vertices that are in memory although they
> don't need to and a not very scalable messaging system. I told you that
> since the time we added the graph module, but I still fall on deaf ears
> with you since more than a year.
> Yea and tell you what? This requires a lot of changes.
>
> If you would have invested the time to work with us on the root of all
> issues instead of doing strange stuff e.g. like the partitioning jobs (in
> the hours I wasted to tell you about the technical downsides of it I
> could've built another Hadoop in FORTRAN) we could've gotten a release out
> months ago and work on other things.
>
>> If we want to sort partitioned data using messaging system, idea
>> should be collected.
>
>
> The idea is there and the idea works, but I guess you're not following the
> JIRA's you are +1'ing to?
> Suraj is already working on the second part of the idea we divided by two
> and instead of cock fighting with each other we should work together to
> make this happening. And not as fast as possible because you want to roll
> out a release for your employee, but because we want to improve the
> framework radically and have enough time to test it thoroughly with
> various configurations and not just an Oracle BDA.
>
> P.S., These comments are never helpful in developing community.
>
>
> It is something that needs to be discussed throughout the whole project,
> and not on a single private mailing list. Community development doesn't
> start with +1'ing and smiling to everything just to keep people on board.
> Truth hurts, but is necessary to evolve something. Community starts with
> people who have a vision in making a project better, it will develop for
> itself when it is stable enough and has a bigger user base, you know-
> developers are users too. If I can't run a graph job with 1gb of wikipedia
> links on my laptop, this project is not very likely to be something I want
> to develop on. So our first responsibility is to make our project running
> perfectly smooth and nothing else. And that is something that must be
> discussed with people who want to develop, but can't- and we need these
> people.
> And to be honest again, we didn't have many other people than GSoC students
> that get a shitton of money for developing stuff and then walking away
> again? I count myself in now as well, mea culpa.



-- 
Best Regards, Edward J. Yoon
@eddieyoon
