> We could create a worker-pool-like abstraction where the workers are the CI
> services, but that would require a scheduler to poll the finished jobs and
> then submit the dependent ones. This sounds a bit inconvenient; where would
> that scheduler run: locally, on a CI service, or self-hosted?

Inevitably we're going to need to build some kind of job scheduler,
whether it uses Airflow or Luigi or a tool of our own devising.
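
(Roughly what such a scheduler would have to do -- a minimal sketch in
Python; the submit_job / is_finished helpers are hypothetical, there is
no such crossbow API today:)

import time

def submit_job(task):
    # Hypothetical: kick off a CI build for one task via the CI service's API.
    ...

def is_finished(task):
    # Hypothetical: poll the CI service and report whether the build completed.
    ...

def run(tasks, dependencies):
    # Submit each task only once all of its dependencies have finished.
    pending, running, done = set(tasks), set(), set()
    while pending or running:
        for task in list(pending):
            if dependencies.get(task, set()) <= done:
                submit_job(task)
                pending.remove(task)
                running.add(task)
        for task in list(running):
            if is_finished(task):
                running.remove(task)
                done.add(task)
        time.sleep(60)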

Apache Arrow is eventually going to need a host where we can manage
such workflows. I'm looking into the possibility of a physical
CUDA-equipped host that could be made available to Arrow developers to
use for testing and benchmarking. I may need to run the machine out of
my home (we did something similar for pandas -- a physical machine that
we could SSH into).

All this idealism aside, we should take the shortest path possible for
this particular packaging job and make improvements as we can going
forward.
On Sat, Oct 6, 2018 at 9:31 AM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> I see now, so the jar would contain all three of the shared libraries.
>
> We could create a worker-pool-like abstraction where the workers are the
> CI services, but that would require a scheduler to poll the finished jobs
> and then submit the dependent ones. This sounds a bit inconvenient; where
> would that scheduler run: locally, on a CI service, or self-hosted?
>
> Another approach would be to use the worker to schedule the next task,
> in a similar fashion to how dask's worker_client [1] launches tasks from
> tasks.
> There could be synchronization problems though. This approach requires
> bootstrapping crossbow on each CI job, but that would:
> - make crossbow less CI-dependent (so we could use Azure Pipelines as well)
> - unify the artifact uploading and downloading logic, which is required
>   in order to support dependent tasks
> - mean far less redundancy in task definitions
>
> What do you think? I'd prefer the second one.
>
> [1]
> https://github.com/dask/distributed/blob/master/docs/source/task-launch.rst
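
(For reference, the worker_client pattern in [1] looks roughly like this --
a task running on a dask worker opens a client to the same cluster and
submits further tasks; this is just the dask idiom, not crossbow code:)

from dask.distributed import worker_client

def fib(n):
    # Runs on a dask worker; launches the two subproblems as new tasks.
    if n < 2:
        return n
    with worker_client() as client:
        a = client.submit(fib, n - 1)
        b = client.submit(fib, n - 2)
        return sum(client.gather([a, b]))

# Submitted from an ordinary client, e.g.:
#   from dask.distributed import Client
#   Client().submit(fib, 10).result()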
>
> On Sat, Oct 6, 2018 at 10:57 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > It seems the complicated part of this will be having a dependent task
> > that packages up the 3 shared libraries, one for each platform, after
> > the individual packaging tasks are run. How would you propose handling
> > that?
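
(For concreteness, the combine step could be little more than a script that
pulls the three per-platform jars and repacks their native libraries into a
single jar -- a rough sketch with hypothetical artifact names, not the
actual packaging code:)

import zipfile

# Hypothetical artifact names; the real ones would come from tasks.yml.
platform_jars = {
    "linux-x86_64": "gandiva-linux-SNAPSHOT.jar",
    "osx-x86_64": "gandiva-osx-SNAPSHOT.jar",
    "win-x86_64": "gandiva-win-SNAPSHOT.jar",
}

with zipfile.ZipFile("gandiva-all-SNAPSHOT.jar", "w") as combined:
    for platform, path in platform_jars.items():
        with zipfile.ZipFile(path) as jar:
            for name in jar.namelist():
                # Copy only the native libraries; the Java classes would come
                # from one of the jars (or a separate pure-Java build).
                if name.endswith((".so", ".dylib", ".dll")):
                    combined.writestr("%s/%s" % (platform, name), jar.read(name))
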
> > On Fri, Oct 5, 2018 at 8:03 AM Krisztián Szűcs
> > <szucs.kriszt...@gmail.com> wrote:
> > >
> > > Ohh, just read the thread, sorry!
> > >
> > > So crossbow is located here:
> > > https://github.com/apache/arrow/tree/master/dev/tasks
> > > I suggest "forking" the python-wheels directory, which contains three
> > > templated ymls for osx, win and linux builds. For building on linux,
> > > something like the following should be sufficient:
> > > https://gist.github.com/kszucs/39154876d60c4109ff59b678afd65b19
> > > Then you need another entry in tasks.yml, for example:
> > > jar-gandiva-linux:
> > >   platform: linux
> > >   template: gandiva-jars/travis.linux.yml
> > >   params:
> > >     # arbitrary params which are available from the templated yml
> > >     ...
> > >   artifacts:
> > >     # these are the expected artifacts from the build
> > >     - gandiva-SNAPSHOT-{version}.jar
> > >     ...
> > >
> > > Of course crossbow is wired towards the current packaging requirements,
> > > so we will likely need to adjust it to the new requirements.
> > >
> > > Feel free to reach me on gitter @kszucs.
> > > On Oct 4 2018, at 2:02 pm, Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > hi Praveen,
> > > > Probably the best way to accomplish this is to use our new Crossbow
> > > > infrastructure for task automation on Travis CI and Appveyor rather
> > > > than trying to do all of this within the CI entries. This is how we
> > > > are producing all of our binary artifacts for releases now --
> > > > presumably in future ASF releases, we will want to include a
> > > > platform-independent Gandiva JAR in our release votes, so this all
> > > > needs to end up in Crossbow anyway. The intent is for the Crossbow
> > > > system to take on responsibility for all packaging automation rather
> > > > than using the normal CI for that.
> > > >
> > > > Krisztian, do you have time to help Praveen and the Gandiva crew with
> > > > this project? This will be an important test to document and improve
> > > > Crossbow for such use cases.
> > > >
> > > > Thanks
> > > > Wes
> > > > On Thu, Oct 4, 2018 at 7:14 AM Praveen Kumar <prav...@dremio.com>
> > > > wrote:
> > > > >
> > > > > Hi Folks,
> > > > > As part of https://issues.apache.org/jira/browse/ARROW-3385, we are
> > > > > planning to perform a snapshot release of the Gandiva jar on each
> > > > > commit to master. This would be a platform-independent jar that
> > > > > contains the core Gandiva library and its JNI bridge packaged for
> > > > > Mac, Windows and *nix platforms.
> > > > >
> > > > > The current plan is to deploy separate snapshot jars for each OS
> > > > > through entries in the Gandiva CI matrix and then have a combine
> > > > > step that pulls in each OS-specific jar and builds a jar that has
> > > > > all the native libraries. This build/deploy would happen only for
> > > > > commits on the master branch and not for pull requests.
> > > > >
> > > > > Does the plan sound OK? Please let us know if there is a better way
> > > > > to achieve the same.
> > > > >
> > > > > If it sounds OK, can someone please help with the following?
> > > > > 1. It looks like we only do Travis builds and not Appveyor for
> > > > > master in Arrow. Any reason for this?
> > > > > 2. Even if we did Appveyor, is there a way to sequence the builds,
> > > > > e.g. wait for Appveyor to complete before kicking off Travis? We
> > > > > would need the DLL to be pre-built.
> > > > > 3. Someone would need to configure the credentials to use for the
> > > > > OSSRH deployment. The credentials would need access to deploy to
> > > > > org.apache.arrow.
> > > > >
> > > > > Thanks in advance!
> >
