Hi Chris,
I don't think I've seen a formal roadmap for either Gandiva or Flight
(others might have more context).  What you described is certainly how a
lot of work gets done.  There has been a slightly more formal roadmap
proposed for datasets, data frames, and the C++ query engine, but that is the
extent of what I recall seeing on the mailing list.

Regarding Gandiva and Flight, off the top of my head I can think of a few
places to potentially start.  I'm not an expert in either of these, but
hopefully people who are can tell me where I'm wrong :)  Also, I'm not sure
any of these are really "easy" or "beginner" tasks, but if you are
interested in these two areas they would likely provide a way of ramping up
on the project.

For Gandiva:
  1.  Implementing a more efficient string matching algorithm
(https://issues.apache.org/jira/browse/ARROW-7278) has been raised.  If
possible, it might be nice to see if there is some common code that can be
shared with, and benchmarked against, the equivalent kernel that exists
under compute.
  2.  I believe we recently made the decision to remove Gandiva from the
packaged wheels, with the hope of being able to create a separate wheel at
some later point (I don't think this is a beginner issue per se, but it's
worth mentioning).
  3.  I think there are probably still quite a few expressions/functions
that haven't been implemented for Gandiva, but I don't know if there is an
exhaustive list.  It seems contributors from Dremio add one every now and
then.

For Flight:
  1.  I'm not sure that there is a strong reference implementation provided
for Flight.  I believe all of the examples checked in are closer to "toy"
code (but I haven't looked in a while).  Constructing a more comprehensive
example (perhaps something built on top of the datasets API) might be
interesting.
  2.  Middleware hooks for instrumenting Flight services were added a while
ago.  It might be worth adding "contrib" adapters for one or two popular
frameworks that make use of the hooks.
  3.  We recently introduced a "feature" enum in the hope that it could be
used to negotiate capabilities between Flight clients and servers.  Looking
into implementing that negotiation could be helpful.

Another area that I'm personally interested in, but haven't had any time to
work on, is adapters to and from other formats (specifically Avro and
protobuf).

Hope this helps, and welcome!

-Micah

On Thu, Jul 9, 2020 at 1:56 AM Chris Channing <christopher.chann...@gmail.com> wrote:

> Antoine/Neal,
>
> Thanks for your comments, it's appreciated!
>
> My current preference would be to focus on Gandiva and/or Flight, so I'll
> start looking around there for inspiration. @Neal, regarding your comment
> about finding a feature that I'm interested in resolving, I agree with you,
> and that was my main reason for asking whether we had a roadmap at either
> the root or component level. Just to help my understanding, though: how are
> the vision-level feature backlogs generated for each of these components?
> I'm assuming there must be something more than just "a user hits a
> limitation > user implements fix/feature > happy days". Perhaps a better
> question might be: what is the short-term vs. long-term vision for each of
> these components (I'm hoping this has been documented in detail somewhere
> and I've missed it)?
>
> @Antoine, thanks for the link to the revised website PR, I'll take a look
> and comment there.
>
> Cheers,
> Chris
>
> On Wed, Jul 8, 2020 at 7:43 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>
> > Hi Chris, some additional thoughts to what Antoine said.
> >
> > Neal
> >
> > On Wed, Jul 8, 2020 at 10:56 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > >
> > > Hi Chris,
> > >
> > > On 08/07/2020 at 12:01, Chris Channing wrote:
> > > >
> > > > I've looked at the contribution guidelines, but rather than
> > > > arbitrarily picking a JIRA I was hoping that there was a more
> > > > structured approach for newbies documented that I might have
> > > > missed. A few questions that I have are:
> > >
> > > As a starting point, which Arrow implementation would you be interested
> > > in contributing to?  As you know, we have a bunch of them, a subset of
> > > which has its status documented here:
> > > https://github.com/apache/arrow/blob/master/docs/source/status.rst
> > >
> > > >    - Does the community have a lightweight mentoring system to
> > > >      help contributors get up to speed?
> > >
> > > We don't.  However, some developers communicate on an
> > > unofficial chat instance at https://ursalabs.zulipchat.com/, where you
> > > can also ask for help (you probably want to post on the "dev" stream).
> > >
> >
> > Most new contributors tend to be users who encounter a limitation of the
> > software (or docs) and take it upon themselves to improve it. So one way
> > to get oriented is to start using Arrow and ask specific questions when
> > you run into trouble.
> >
> >
> > >
> > > >    - Are there designated component owners/guardians (e.g. C++ core,
> > > >      Flight, Gandiva, APIs) that could provide guidance if a
> > > >      developer had a specific focus/interest?
> > >
> > > We don't have designated owners, though of course some developers are
> > > focussed on specific areas.  The best option is probably to ask here,
> > > though.  Also, the answers you get can benefit other people.
> > >
> >
> > > >    - Looking at the Arrow JIRAs in bulk, I noticed that 'easyfix',
> > > >      'beginner' and 'newbie' labels have been defined. Do you think
> > > >      that it makes sense to pick one label and standardise on it for
> > > >      future backlog grooming efforts? It would make it easier to
> > > >      identify the pipeline of issues that future engineers can use
> > > >      to ramp up on the project.
> > >
> > > Definitely agreed.  I'm not sure how easy it is to make bulk edits on
> > > JIRA, though... perhaps someone else can chime in.
> > >
> >
> > Unfortunately, JIRA "labels" are shared across all of the Apache Software
> > Foundation, so those aren't just for Arrow. I haven't observed us using
> > them, but maybe some people do, and maybe we should start.
> >
> > In general though, rather than just looking for "easy" things to do, I
> > recommend finding a JIRA issue you're personally invested in seeing
> > resolved because it affects a use case you have. I find that's a more
> > effective way to learn in general.
> >
> >
> > >
> > > By the way, one area where fresh eyes would definitely be useful is
> > > suggesting documentation edits or improvements.
> > > We also have a small website revamp in preparation, you can see the
> > > proposed changes in the links below.  Feedback is welcome :-)
> > > https://github.com/apache/arrow-site/pull/63
> > > https://enpiar.com/arrow-site/
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >
>
