[ 
https://issues.apache.org/jira/browse/ARROW-17447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580864#comment-17580864
 ] 

Duncan commented on ARROW-17447:
--------------------------------

Some notes from a recent first-time contributor.
 * My route to contribution was _not_ "go to project website, go to 
contributing page, drill down from there".
 * Rather, it was:
 ** at work, I am working on a large codebase which uses pyarrow;
 ** I found a gap in functionality and believed I could fill it with a 
[PR|https://github.com/apache/arrow/pull/13633];
 ** the natural path seemed to be to find the pyarrow project on GitHub and 
find my way from there.

Rightly or wrongly, I did not know about [this 
page|https://arrow.apache.org/docs/developers/contributing.html] until this 
docs ticket was opened; admittedly I missed its mention at the bottom of 
[CONTRIBUTING.md|https://github.com/apache/arrow/blob/master/CONTRIBUTING.md] 
(which I looked for and found at the start, per open-source convention).

My expectation was that the repo's contributing directions would include 
something like:
 * "Clone the repo and do `docker-compose up test` to run the pyarrow test 
suite."
 * "Our CI system has steps foo, bar, and baz. Once these steps pass on your 
PR, please contact the CODEOWNERS of the relevant part of the source tree for 
review."
 * "Discussion takes place on mailing list X."
 * "The process from raising of PR to release is <snip>."

The actual situation that played out was as follows:
 * From VSCode on an M1 Mac, I did not know where to begin to build or test 
anything, or even if it would work on M1.
 * I created a Jira account and ticket as directed, and started a PR. (I could 
not `Start Progress` on the ticket, but this has since been fixed.)
 ** It might be worth mentioning in docs that use of Jira is just standard 
practice in Apache projects: to me it felt somewhat "against-the-grain".
 * There were some pleasant surprises at this point, of which I think the Arrow 
regulars should be proud:
 ** response to the Jira ticket was quick;
 ** response to the PR was quick and seemingly automatic: I don't know how 
awareness of it came about but I soon had constructive comments from people 
whom I assume to be maintainers;
 ** everyone was very patient and there was no hint of any unsporting comments;
 ** I do not mean to sound in any way patronising in saying all this; I was 
honestly pleasantly surprised that this large project seems free from inane, 
low-quality or LKML-tier abusive comments. {*}That's as it should be{*}, but 
sadly first-time contributors may be bracing themselves for a harsh experience, 
as I had been.
 * I did not know how to build or test anything locally, I had never heard of 
Cython, and it had been 20+ years since I last looked at K&R, so the baptism of 
fire entailed:
 ** make a change, push it, wait a while, see what CI says
 ** repeat
 ** try not to exhaust the patience of the maintainers who showed up.
 * The code review was meticulous and I appreciated that: having quickly found 
myself in the unfamiliar territory of C++ and Cython, I would not have wanted 
anything left to a hand-wavy "oh it's fine".

This is becoming long-winded, so I will try to summarise:

"What would reduce friction?"
 * Reduce CONTRIBUTING.md to no more than a link to the project website's 
documentation on contributing. This would reduce repetition.
 * One or both of:
 ** "Here is a one-shot docker command to perform [a subset of] CI checks 
locally"
 ** "Here is a list of dependencies to install on a fresh dev machine"
 * Bonus: "build a pyarrow test package for your own integration testing as 
follows..."
 * Link to the mailing list archives so that people can quickly see the high 
quality of the community
 * Outline how and when to invite maintainers to perform a code review, and 
what expectations to have of them (are they spare-time volunteers? paid 
full-time?)
 * Outline the broad strokes of the release cadence(s) for the project.

[~toddfarmer], hope that helps :)

> [Docs] Clarify processes for first-time contributors
> ----------------------------------------------------
>
>                 Key: ARROW-17447
>                 URL: https://issues.apache.org/jira/browse/ARROW-17447
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: C++, Documentation, Go, Python, R
>            Reporter: Todd Farmer
>            Priority: Major
>
> Per [this 
> discussion|https://lists.apache.org/thread/hycr4ghh7csspvm9jyffvqh8qo5koobg], 
> improvements should be considered to reduce friction for first-time 
> contributors through documentation of environment setup, CI checks, code 
> review expectations, etc. Some of this exists scattered in various locations 
> throughout documentation (e.g., language-specific development environments 
> are documented), but it can be difficult to find. The [contributing 
> page|https://arrow.apache.org/docs/developers/contributing.html] has some 
> pointers at the bottom.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
