Re: Gandiva snapshot releases

Krisztián Szűcs Thu, 11 Oct 2018 09:12:01 -0700

On Thu, Oct 11, 2018 at 12:58 PM Praveen Kumar <[email protected]> wrote:


> Hi All,
>
> I spent some time today understanding Crossbow and it looks great!
>
> To unblock ourselves immediately, we are going to do the ubuntu deploy
> first, followed by the mac deploy and the fat jar deployment.
>
> To confirm our understanding we would be doing the following
>
> 1. Create a queue repo similar to one here(
> https://github.com/praveenbingo/crossbow) but under dremio org.
>
Correct, although we might want a centralized crossbow repo to deploy
scheduled (e.g. nightly) packages.

> 2. Have the repo kick off crossbow builds for each OS that we would want.
>
Correct. To run the tasks:
`python crossbow.py submit gandiva-osx gandiva-ubuntu`
It returns the build identifier, e.g. `build-123`.
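
As a rough illustration, the submit step could be scripted as below. Only the `crossbow.py submit` invocation and the two task names come from this thread; the helper name `build_submit_command` is hypothetical, and the sketch only assembles the command line without running it.

```python
import sys


def build_submit_command(tasks, script="crossbow.py"):
    """Return the argv list for submitting the given crossbow tasks.

    `build_submit_command` is a hypothetical helper, not part of crossbow.
    """
    if not tasks:
        raise ValueError("at least one task name is required")
    # sys.executable is the interpreter running this script.
    return [sys.executable, script, "submit", *tasks]


cmd = build_submit_command(["gandiva-osx", "gandiva-ubuntu"])
# Show the command without the interpreter path.
print(" ".join(cmd[1:]))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from inside a checkout that contains `crossbow.py`.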

> 3. In addition to OS builds, there would be another build which would just
> be waiting for the OS builds to finish (with some timeout) and once done
> will package the fat jar and deploy to maven.
>
Basically yes, but depending on the build times it might be worth building
the fat jar locally instead (of course, you can trigger another task which
does the same thing, just remotely). Currently the artifact downloading is
built into the `sign` command, but we can quickly factor that out:
`python crossbow.py sign build-123`

I'd like to generalize task dependencies, but this is definitely the
quickest to start with.
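
To make the "wait for the OS builds, with some timeout" idea concrete, here is a minimal polling sketch. It is not how crossbow works today: `wait_for_builds` and the `get_status` callable are hypothetical, and `get_status` stands in for whatever query against the CI service or the queue repo ends up being used.

```python
import time


def wait_for_builds(get_status, tasks, timeout=3600.0, interval=30.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Block until every task reports 'success'.

    get_status is a hypothetical callable mapping a task name to one of
    'pending', 'success' or 'failure'. Raises RuntimeError on the first
    failed task and TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    pending = set(tasks)
    while pending:
        # Iterate over a sorted copy so the set can shrink safely.
        for task in sorted(pending):
            status = get_status(task)
            if status == "failure":
                raise RuntimeError(f"build failed: {task}")
            if status == "success":
                pending.discard(task)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"still pending: {sorted(pending)}")
        sleep(interval)
```

Once this returns, the dependent step (packaging the fat jar) can run; where the poller itself would run is exactly the open question in the quoted discussion below.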

>
> The only thing that I am unclear about is the maven deploy tokens. Since I
> am not a committer with permissions to push to maven repo, I would need keys
> to be configured in the dremio/crossbow environment variables.
>
How often do we want to ship fat jars?

>
> Wes - do Siddharth/Jacques have permissions to push to maven repo and can I
> use the same?
>
> Also looks like the release scripts here
> <https://github.com/apache/arrow/blob/master/dev/release/01-perform.sh>
> would need to be changed as well if we want to deploy the fat jar as part
> of releases.
>
Correct.

>
> Kristian - can you please review the proposed steps and let me know if they
> look correct to you?
>
Absolutely!

BTW if you want to unblock yourself first, then it's enough to have a single
task which builds the ubuntu libs and the fat jar (in a single CI build), and
we can handle the dependent task (fat jar building) after we introduce
another child (mac or win). So we could spare the third step in the first
iteration.
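
Since a jar is a zip archive, adding native libraries to one could look roughly like the sketch below. The helper `add_native_libs`, the file names and the `native/<platform>/` layout inside the jar are all assumptions for illustration, not what the Gandiva build actually does.

```python
import zipfile
from pathlib import Path


def add_native_libs(jar_path, libs):
    """Append native libraries to a jar (created if it does not exist).

    libs maps a hypothetical platform directory name to a library file
    on disk; the in-jar layout used here is an assumption.
    """
    with zipfile.ZipFile(jar_path, "a") as jar:
        for platform, lib in sorted(libs.items()):
            lib = Path(lib)
            jar.write(lib, arcname=f"native/{platform}/{lib.name}")
```

For the first iteration this would be called with a single entry for the ubuntu build; further platforms would add entries to the same mapping later.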

>
> Thx.
>
>
> On Wed, Oct 10, 2018 at 11:33 PM Praveen Kumar <[email protected]> wrote:
>
> > Hi Wes,
> >
> > I'll take this to completion. Will send out a proposal tomorrow.
> >
> > Thx.
> >
> > On Wed, Oct 10, 2018, 23:32 Wes McKinney <[email protected]> wrote:
> >
> >> hi folks,
> >>
> >> How would you like to proceed on this? I'm tracking many projects
> >> right now so I want to make sure someone else is "in charge" on this
> >> effort
> >>
> >> Thanks,
> >> Wes
> >> On Sat, Oct 6, 2018 at 10:37 AM Wes McKinney <[email protected]> wrote:
> >> >
> >> > > We could create a worker pool like abstraction where the workers are
> >> > > the CI services, but that would require a scheduler to poll the
> >> > > finished jobs then submit the dependent ones. This sounds a bit
> >> > > inconvenient, where would that scheduler run: locally, on a CI or
> >> > > self hosted?
> >> >
> >> > Inevitably we're going to need to build some kind of job scheduler,
> >> > whether it uses Airflow or Luigi or some other tool of our own
> >> > devising.
> >> >
> >> > Apache Arrow is eventually going to need a host where we can manage
> >> > such workflows. I'm looking into the possibility of a physical
> >> > CUDA-equipped host that could be made available to Arrow developers to
> >> > use for testing and benchmarking. I may need to run the machine out of
> >> > my home (we did something similar for pandas -- physical machine that
> >> > we can SSH into).
> >> >
> >> > All this idealism aside -- we take the shortest path possible for this
> >> > particular packaging job, and make improvements as we can going
> >> > forward.
> >> > On Sat, Oct 6, 2018 at 9:31 AM Krisztián Szűcs
> >> > <[email protected]> wrote:
> >> > >
> >> > > I see now, so the jar would contain all of the three shared libraries.
> >> > >
> >> > > We could create a worker pool like abstraction where the workers are
> >> > > the CI services, but that would require a scheduler to poll the
> >> > > finished jobs then submit the dependent ones. This sounds a bit
> >> > > inconvenient, where would that scheduler run: locally, on a CI or
> >> > > self hosted?
> >> > >
> >> > > Another approach would be to use the worker to schedule the next task,
> >> > > similar to how dask's worker_client [1] launches tasks from tasks.
> >> > > There could be synchronization problems though. This approach requires
> >> > > bootstrapping crossbow on each CI job, but that would:
> >> > > - make crossbow less CI dependent (to use azure pipelines as well)
> >> > > - unify the artifact uploading and downloading logic, which is
> >> > >   required in order to support dependent tasks
> >> > > - leave way less redundancy in task definitions
> >> > >
> >> > > What do you think? I'd prefer the second one.
> >> > >
> >> > > [1] https://github.com/dask/distributed/blob/master/docs/source/task-launch.rst
> >> > >
> >> > > On Sat, Oct 6, 2018 at 10:57 AM Wes McKinney <[email protected]> wrote:
> >> > >
> >> > > > It seems the complicated part of this will be having a dependent
> >> > > > task that packages up the 3 shared libraries, one for each platform,
> >> > > > after the individual packaging tasks are run. How would you propose
> >> > > > handling that?
> >> > > > On Fri, Oct 5, 2018 at 8:03 AM Krisztián Szűcs
> >> > > > <[email protected]> wrote:
> >> > > > >
> >> > > > > Ohh, just read the thread, sorry!
> >> > > > >
> >> > > > > So crossbow is located here:
> >> > > > > https://github.com/apache/arrow/tree/master/dev/tasks
> >> > > > > I suggest "forking" the python-wheels directory, which contains
> >> > > > > three templated ymls for osx, win and linux builds. For building
> >> > > > > on linux, something like the following should be sufficient:
> >> > > > > https://gist.github.com/kszucs/39154876d60c4109ff59b678afd65b19
> >> > > > > Then you need another entry in the tasks.yml, for example:
> >> > > > > jar-gandiva-linux:
> >> > > > >   platform: linux
> >> > > > >   template: gandiva-jars/travis.linux.yml
> >> > > > >   params:
> >> > > > >     # arbitrary params which are available from the templated yml
> >> > > > >     ...
> >> > > > >   artifacts:
> >> > > > >     # these are the expected artifacts from the build
> >> > > > >     - gandiva-SNAPSHOT-{version}.jar
> >> > > > >     ...
> >> > > > >
> >> > > > > Of course crossbow is wired towards the current packaging
> >> > > > > requirements, so we likely need to adjust it to the newly
> >> > > > > appearing requirements.
> >> > > > >
> >> > > > > Feel free to reach me on gitter @kszucs.
> >> > > > > On Oct 4 2018, at 2:02 pm, Wes McKinney <[email protected]> wrote:
> >> > > > > >
> >> > > > > > hi Praveen,
> >> > > > > > Probably the best way to accomplish this is to use our new
> >> > > > > > Crossbow infrastructure for task automation on Travis CI and
> >> > > > > > Appveyor rather than trying to do all of this within the CI
> >> > > > > > entries. This is how we are producing all of our binary
> >> > > > > > artifacts for releases now -- presumably in future ASF
> >> > > > > > releases, we will want to include a platform-independent
> >> > > > > > Gandiva JAR in our release votes, so this all needs to end up
> >> > > > > > in Crossbow anyway. The intent is for the Crossbow system to
> >> > > > > > take on responsibility for all packaging automation rather
> >> > > > > > than using the normal CI for that.
> >> > > > > >
> >> > > > > > Krisztian, do you have time to help Praveen and the Gandiva
> >> > > > > > crew with this project? This will be an important test to
> >> > > > > > document and improve Crossbow for such use cases
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > > Wes
> >> > > > > > On Thu, Oct 4, 2018 at 7:14 AM Praveen Kumar <[email protected]> wrote:
> >> > > > > > >
> >> > > > > > > Hi Folks,
> >> > > > > > > As part of https://issues.apache.org/jira/browse/ARROW-3385,
> >> > > > > > > we are planning to perform a snapshot release of the Gandiva
> >> > > > > > > Jar on each commit to master. This would be a platform
> >> > > > > > > independent jar that contains the core gandiva library and
> >> > > > > > > its jni bridge packaged for Mac, Windows and *nix platforms.
> >> > > > > > >
> >> > > > > > > The current plan is to deploy separate snapshot jars for
> >> > > > > > > each OS through entries in the Gandiva CI matrix and then
> >> > > > > > > have a combine step that pulls in each OS specific jar and
> >> > > > > > > builds a jar that has all the native libraries. This
> >> > > > > > > build/deploy would happen only for commits on master branch
> >> > > > > > > and not for pull requests.
> >> > > > > > >
> >> > > > > > > Does the plan sound ok (or) please let us know if there is a
> >> > > > > > > better way to achieve the same.
> >> > > > > > >
> >> > > > > > > If it sounds ok, can someone please help with the following
> >> > > > > > > 1. It looks like we only do travis builds and not appveyor
> >> > > > > > >    for master in arrow. Any reason for this?
> >> > > > > > > 2. Even if we did appveyor is there a way to sequence the
> >> > > > > > >    builds. Like wait for appveyor to complete before kicking
> >> > > > > > >    off travis? Since we would need the dll to be pre-built.
> >> > > > > > > 3. Someone would need to configure the credentials to use
> >> > > > > > >    for the ossrh deployment. The credentials would need
> >> > > > > > >    access to deploy to org.apache.arrow.
> >> > > > > > > Thanks ahead!
> >> > > >
> >>
> >
>
