[
https://issues.apache.org/jira/browse/ARROW-17447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580864#comment-17580864
]
Duncan commented on ARROW-17447:
--------------------------------
Some notes from a recent first-time contributor.
* My route to contribution was _not_ "go to project website, go to
contributing page, drill down from there".
* Rather, it was:
** at work, I am working on a large codebase which uses pyarrow;
** I found a gap in functionality and believed I could fill it with a
[PR|https://github.com/apache/arrow/pull/13633];
** the natural path seemed to be to find the pyarrow project on GitHub and
find my way from there.
Rightly or wrongly, I did not know about [this
page|https://arrow.apache.org/docs/developers/contributing.html] until this
docs ticket was opened; admittedly I missed its mention at the bottom of
[CONTRIBUTING.md|https://github.com/apache/arrow/blob/master/CONTRIBUTING.md]
(which I looked for and found at the start, per open-source convention).
My expectation was that the repo's contributing directions would include
something like:
* "Clone the repo and do `docker-compose up test` to run the pyarrow test
suite."
* "Our CI system has steps foo, bar, and baz. Once these steps pass on your
PR, please contact the CODEOWNERS of the relevant part of the source tree for
review."
* "Discussion takes place on mailing list X."
* "The process from raising a PR to release is <snip>."
The actual situation that played out was as follows:
* From VSCode on an M1 Mac, I did not know where to begin to build or test
anything, or even if it would work on M1.
* I created a Jira account and ticket as directed, and started a PR. (I could
not `Start Progress` on the ticket, but this has since been fixed.)
** It might be worth mentioning in docs that use of Jira is just standard
practice in Apache projects: to me it felt somewhat "against-the-grain".
* There were some pleasant surprises at this point, of which I think the Arrow
regulars should be proud:
** response to the Jira ticket was quick;
** response to the PR was quick and seemingly automatic: I don't know how
awareness of it came about but I soon had constructive comments from people
whom I assume to be maintainers;
** everyone was very patient and there was no hint of any unsporting comments;
** I do not mean to sound in any way patronising in saying all this; I was
honestly pleasantly surprised that this large project seems free from inane,
low-quality or LKML-tier abusive comments. {*}That's as it should be{*}, but
sadly first-time contributors may be bracing themselves for a harsh experience,
as I had been.
* I did not know how to build or test anything locally; I had never heard of
Cython; it's been 20+ years since I last looked at K&R, so the baptism of fire
entailed:
** make a change, push it, wait a while, see what CI says;
** repeat;
** try not to exhaust the patience of the maintainers who showed up.
* The code review was meticulous and I appreciated that, because having
quickly found myself in unfamiliar territory of C++ and Cython, I would not
like to leave anything to a hand-wavy "oh it's fine".
This is becoming long-winded, so I will try to summarise:
"What would reduce friction?"
* Reduce CONTRIBUTING.md to no more than a link to the project website's
documentation on contributing. This reduces repetition.
* One or both of:
** "Here is a one-shot docker command to perform [a subset of] CI checks
locally"
** "Here is a list of dependencies to install on a fresh dev machine"
* Bonus: "build a pyarrow test package for your own integration testing as
follows..."
* Link to the mailing list archives so that people can quickly see the high
quality of the community
* Outline how and when to invite maintainers to perform a code review, and
what expectations to have of them (are they spare-time volunteers? paid
full-time?)
* Outline the broad strokes of the release cadence(s) for the project.
[~toddfarmer], hope that helps :)
> [Docs] Clarify processes for first-time contributors
> ----------------------------------------------------
>
> Key: ARROW-17447
> URL: https://issues.apache.org/jira/browse/ARROW-17447
> Project: Apache Arrow
> Issue Type: Task
> Components: C++, Documentation, Go, Python, R
> Reporter: Todd Farmer
> Priority: Major
>
> Per [this
> discussion|https://lists.apache.org/thread/hycr4ghh7csspvm9jyffvqh8qo5koobg],
> improvements should be considered to reduce friction for first-time
> contributors through documentation of environment setup, CI checks, code
> review expectations, etc. Some of this exists scattered in various locations
> throughout documentation (e.g., language-specific development environments
> are documented), but it can be difficult to find. The [contributing
> page|https://arrow.apache.org/docs/developers/contributing.html] has some
> pointers at the bottom.