Re: libhdfs3 development is still going on outside of ASF

Matthew Rocklin Thu, 15 Sep 2016 05:17:28 -0700

Hi All,

I joined this e-mail list in order to chime in to this discussion.  I'm not
part of Apache HAWQ but *do* use libhdfs3 and know a number of other people
who do as well.


I maintain Dask <http://dask.pydata.org/en/latest/>, a library for parallel
programming that is commonly used within the PyData software ecosystem.  We
often interact with data on HDFS and found libhdfs3 to be an excellent
solution, particularly because it doesn't require JVM interaction; JVM use
is rare among our users.  To assist Python users we made the wrapper
library hdfs3 <http://hdfs3.readthedocs.io/en/latest/>, which has gotten
some traction both within Dask and outside.

We intentionally released and maintain hdfs3 separately from Dask because
it's a more general and releasable component.  This turns out to have been
a good move.  There are lots of people who use hdfs3 who have no interest
in using Dask at all.  They appreciate this separation because they're not
forced to grab all of Dask in order to just get the single component they
want, hdfs3.  These are great users.  They come from a wide range of
settings, from universities to small and large businesses.  They contribute
back to hdfs3 readily and are also, today, trying to contribute back to
libhdfs3.  By not
tying hdfs3 into Dask we increased both community engagement and social
impact.

So my initial bias is "Please, keep libhdfs3 separate.  It will make my
life (and the lives of many others) much more convenient."  However, I also
recognize the need for Apache's strict-for-a-reason policies.  No matter
what you all decide, the PyData community will find a way to make things
work.  I just wanted to make it clear that there are several other
stakeholders out there using this library, so that this decision isn't made
in a vacuum.

Best,
-matthew rocklin




On Thu, Sep 15, 2016 at 2:38 AM, Zhanwei Wang <[email protected]> wrote:

> Hi Roman
>
> I think I have said enough about the benefits and drawbacks of merging
> two independent projects.
> Let me propose a way that might make both the ASF and libhdfs3’s users
> happy. I need your advice.
>
>
> Is it possible to have two git repositories in the ASF for the HAWQ
> incubator project? If so, I propose to solve the libhdfs3 issue like this.
>
> 1) Create a new git repository in the ASF and push all of libhdfs3’s code
> and branches from GitHub to the ASF.
> 2) Make libhdfs3’s GitHub repository a read-only mirror of the ASF
> repository. We may need to transfer ownership of the GitHub repository
> from Pivotal to the ASF on GitHub.
> 3) HAWQ keeps the stable version of libhdfs3’s code, or just a Git reference.
>
>
> In this way, we keep libhdfs3 independent and keep all of its pull
> requests, wiki, issues and history. Most importantly, libhdfs3 can follow
> ASF rules and processes. People can file pull requests on GitHub; these are
> committed to the ASF repository and eventually mirrored to GitHub.
>
>
> Any comments?
>
>
> Best Regards
>
> Zhanwei Wang
> [email protected]
>
>
>
> > On Sep 15, 2016, at 2:19 PM, Zhanwei Wang <[email protected]> wrote:
> >
> >> Open source is about community first.
> >
> > Good point Kyle. I strongly agree with you!
> >
> > But unfortunately it seems no one in this thread cares about libhdfs3’s
> > community (users) except me. The frustration of libhdfs3’s users is being
> > ignored, and its repository is about to be deleted.
> >
> >
> > So let’s set the tone of this thread.
> >
> > If we remove libhdfs3’s repository or make it read only:
> >  a. What benefit do BOTH HAWQ and libhdfs3’s users get?
> >  b. What are the drawbacks for BOTH HAWQ and libhdfs3’s users?
> >
> >
> >
> > The following is my answer.
> >
> > a. Benefit: For HAWQ, it seems the ASF governs its property under ASF
> > rules.  For libhdfs3’s users, none.
> >
> > b. Drawback: For HAWQ, irrelevant commits will come into HAWQ’s commit
> > log. JIRA issues and pull requests will be filed in HAWQ but will not be
> > related to HAWQ. Furthermore, a commit in libhdfs3 may break HAWQ, and
> > that is hard to debug; I have experienced it often enough. It is important
> > to use the stable version of libhdfs3, so HAWQ’s code should only keep the
> > stable version of libhdfs3.
> >
> >    For libhdfs3’s users, they have to ask questions in HAWQ’s community.
> > They have to clone the entire HAWQ repository to build libhdfs3 and
> > contribute.
> >
> > Let’s think about this more. How do we schedule a release of libhdfs3
> > while HAWQ is under development? Should we branch HAWQ for libhdfs3’s
> > release? Should we merge libhdfs3’s pull requests while we are releasing
> > HAWQ? Do we have to sync the release processes of HAWQ and libhdfs3, and
> > how?
> >
> > Maybe we should involve libhdfs3’s users in this thread. But
> > unfortunately they are not on HAWQ’s mailing list. See, this is another
> > big issue. Discussing dropping libhdfs3’s repository on HAWQ’s mailing
> > list without libhdfs3’s users involved seems odd. Imagine this: one day
> > the repository you are working with is gone and you did not even know
> > about this discussion.
> >
> > If anyone wants to discuss whether we should drop libhdfs3’s repository,
> > the better place is libhdfs3’s repository.
> >
> > In general, merging two independent projects introduces more trouble
> > than benefit.
> >
> > To be clear, I’m not against ASF rules. I deeply understand their
> > importance. Is there any way to keep HAWQ and libhdfs3 separate and
> > make both the ASF and libhdfs3’s users happy? Just like Kyle said, “HOW”
> > is more important.
> >
> > @Roman, your mentoring is important.
> >
> >
> > Any comments?
> >
> >
> > Best Regards
> >
> > Zhanwei Wang
> > [email protected]
> >
> >
> >
> >> On Sep 15, 2016, at 12:54 PM, Kyle Dunn <[email protected]> wrote:
> >>
> >> Chiming in here only as a casual but concerned observer.
> >>
> >> Open source is about community first. If the logistics around "where"
> >> libhdfs3 lives rather than the much more important issue of "how" it
> >> lives are the focus here, I think we've missed the real issue.
> >>
> >> For what it's worth, I concur with others, let's move it to HAWQ
> >> exclusively and move on to addressing the community, starting with the
> >> decision being made and how/where future contributions can be made.
> >>
> >> My brief scan of libhdfs3 shows numerous open pull requests (with
> >> apparently useful contributions) and several loose ends "issues". We
> >> need to communicate effectively to these contributors whether those PRs
> >> and issues are valuable and relevant. This type of engagement is what OSS
> >> projects live and die by. We need to be better, starting with libhdfs3,
> >> into HAWQ, and beyond.
> >>
> >> "Open source isn't someone else's job" - it's everyone's job. I'm
> >> challenging everyone with commit responsibility on repos to value
> >> community input (both code and issues) as highly as your own backlog. Pay
> >> it forward and maybe the community will start shrinking your backlog
> >> unexpectedly.
> >>
> >>
> >> -Kyle
> >>
> >> On Wed, Sep 14, 2016, 21:33 Lei Chang <[email protected]> wrote:
> >>
> >>>
> >>> There was a short discussion before when we moved libhdfs3 to the HAWQ
> >>> repo.
> >>>
> >>> http://mail-archives.apache.org/mod_mbox/incubator-hawq-dev/201602.mbox/%3cCAE44UQe1xgcVOC76T_mgVbgGbR=[email protected]%3e
> >>> I think it makes sense to keep libhdfs3 only in the HAWQ repo to simplify
> >>> Apache build and releases in the current phase. This is what we have done
> >>> in the past. But it looks like not everyone is on the same page.
> >>> Cheers
> >>> Lei
> >>>
> >>> On Thu, Sep 15, 2016 at 11:12 AM +0800, "Greg Chase" <[email protected]>
> >>> wrote:
> >>>
> >>> It's fine if libhdfs3 is a third party license, and is treated that way.
> >>>
> >>> However, why does Apache HAWQ want to be dependent on some strange 3rd
> >>> party library with no transparency?
> >>>
> >>> We are having enough difficulties just getting our first release out.
> >>>
> >>> Is there a compelling reason why we need to keep up with the
> >>> independently developed libhdfs3 project?  Are they willing to make
> >>> necessary changes so that they are compatible with ASF's
> >>> strict-for-a-good-reason policies?
> >>>
> >>> Can we fork hdfs3 for Apache HAWQ's purposes in Apache?
> >>>
> >>> If any libhdfs3 committers are also part of Apache HAWQ, perhaps you
> >>> can shed some light on the viability of this as an independent project
> >>> since I only see 4 contributors.
> >>>
> >>> -Greg
> >>>
> >>> On Wed, Sep 14, 2016 at 7:54 PM, Hong Wu  wrote:
> >>>
> >>>> In my opinion, it is reasonable to transfer the third-party repo of
> >>>> libhdfs3 totally into HAWQ, not only for the convenience of the HAWQ
> >>>> build, but also for the consideration of the ASF project. So for the
> >>>> HAWQ project, I am with Roman.
> >>>>
> >>>> But my concern is the current users of libhdfs3 and all the pull
> >>>> requests, wiki docs and issues. Another uncertain aspect from my
> >>>> perspective is that although HAWQ could not run without libhdfs3,
> >>>> libhdfs3 could be used in other open source projects; that might be the
> >>>> true meaning of making libhdfs3 open source at the beginning.
> >>>>
> >>>> In summary, if it is really against the spirit of an ASF project for
> >>>> HAWQ, a suggested way might be marking the original libhdfs3 repo as a
> >>>> legacy repo instead of removing it.
> >>>>
> >>>> Best
> >>>> Hong
> >>>>
> >>>> 2016-09-15 10:04 GMT+08:00 Zhanwei Wang :
> >>>>
> >>>>> Currently libhdfs3’s official code is not the same as in HAWQ. Some
> >>>>> new code has not been copied into HAWQ.  I do not think code changes
> >>>>> to libhdfs3 should follow HAWQ’s commit process, because many changes
> >>>>> are not related to HAWQ.
> >>>>>
> >>>>> From the HAWQ side, I suggest keeping the stable version of its
> >>>>> third-party libraries and copying new libhdfs3 code only when it is
> >>>>> necessary.
> >>>>>
> >>>>> libhdfs3 was open source years before HAWQ entered incubation, with a
> >>>>> separate permission from its authority. So in my opinion it is a third
> >>>>> party, and it actually was a third party before HAWQ entered
> >>>>> incubation. And HAWQ is not the only user.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Best Regards
> >>>>>
> >>>>> Zhanwei Wang
> >>>>> [email protected]
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik wrote:
> >>>>>>
> >>>>>> On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang wrote:
> >>>>>>> Hi Roman
> >>>>>>>
> >>>>>>> libhdfs3 works as a third-party library of HAWQ. Just for the
> >>>>>>> convenience of the HAWQ release process, we copy its code into HAWQ.
> >>>>>>> The reason is that HAWQ used to depend on a specific version of
> >>>>>>> libhdfs3, and libhdfs3 is only distributed as source code and the
> >>>>>>> build process is complicated.
> >>>>>>
> >>>>>> I actually don't buy this argument. libhdfs3 is not an optional
> >>>>>> dependency for HAWQ like ORCA is (for example). Without libhdfs3
> >>>>>> it's pretty tough to imagine HAWQ. As such the code base needs to be
> >>>>>> governed as part of the ASF project, not a random GitHub dependency.
> >>>>>>
> >>>>>> IOW, let me ask you this: were all the changes that went into the
> >>>>>> libhdfs3 that is part of HAWQ discussed and reviewed via the ASF
> >>>>>> development process, or did you just import them from time to time
> >>>>>> as this comment suggests:
> >>>>>>  https://issues.apache.org/jira/browse/HAWQ-1046?focusedCommentId=15489669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
> >>>>>> ?
> >>>>>>
> >>>>>>> I do not think we have any reason to shut down a third party’s
> >>>>>>> official repository.
> >>>>>>
> >>>>>> You say 3rd party as though it's not just you guys maintaining it on
> >>>>>> the side.
> >>>>>>
> >>>>>>> We also copy google test source code into HAWQ, just as we did
> >>>>>>> for libhdfs3.
> >>>>>>
> >>>>>> But this is very different. You don't do any development (certainly
> >>>>>> you don't do any non-trivial development) of that code.
> >>>>>>
> >>>>>>> libhdfs3 is open source under the Apache License, version 2, just
> >>>>>>> the same as HAWQ. So I believe there is no license issue.
> >>>>>>
> >>>>>> You're correct. There's no licensing issue but there's a pretty
> >>>>>> significant governance issue.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Roman.
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >> *Kyle Dunn | Data Engineering | Pivotal*
> >> Direct: 303.905.3171 <3039053171> | Email: [email protected]
> >
>
>
