Hi Kristian/Wes,

Can you please advise on the deploy tokens? Also, do you want to include the Arrow jars in the snapshot deploy?

Thx.

On Fri, Oct 12, 2018 at 11:50 AM Praveen Kumar <prav...@dremio.com> wrote:

> Hi Kristian,
>
> Thanks for reviewing.
>
> Yup that is our plan too, we are targeting the ubuntu release first. We will pick the mac and the combiner as required later.
>
> For the frequency of deployments, we would be doing at-least once a day with the flexibility to manually trigger too.
>
> Thx.
>
> On Thu, Oct 11, 2018 at 9:41 PM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
>
>> On Thu, Oct 11, 2018 at 12:58 PM Praveen Kumar <prav...@dremio.com> wrote:
>>
>> > Hi All,
>> >
>> > I spent some time today understanding cross bow and it looks great!
>> >
>> > To unblock ourselves immediately, we are going to do the ubuntu deploy first, followed by the mac deploy and the fat jar deployment.
>> >
>> > To confirm our understanding we would be doing the following
>> >
>> > 1. Create a queue repo similar to one here (https://github.com/praveenbingo/crossbow) but under dremio org.
>> >
>> Correct, although We might want a centralized crossbow repo to deploy scheduled (e.g. nightly) packages.
>>
>> > 2. Have the repo kick off crossbow builds for each OS that we would want.
>> >
>> Correct. To run the tasks: `python crossbow.py submit gandiva-osx gandiva-ubuntu`
>> It returns the build identifier, e.g. `build-123`
>>
>> > 3. In addition to OS builds, there would be another build which would just be waiting for the OS builds to finish (with some timeout) and once done will package the fat jar and deploy to maven.
>> >
>> Basically yes, but depending on the build times it might worth building the fat jar locally instead (of course You can trigger another task which does the same thing just remotely).
>> Currently the artifact downloading is built in the `sign` command, but we can quickly factor that out: `python crossbow.py sign build-123`
>>
>> I'd like to generalize task dependencies, but this is definitely the quickest to start with.
>>
>> >
>> > The only thing that i am unclear of is the maven deploy tokens. Since i am not a committer with permissions to push to maven repo, I would need keys to be configured in the dremio/crossbow environment variables.
>> >
>> How often do We want to ship fat jars?
>>
>> >
>> > Wes - do Siddharth/Jacques have permissions to push to maven repo and can i use the same?
>> >
>> > Also looks like the release scripts here <https://github.com/apache/arrow/blob/master/dev/release/01-perform.sh> would need to be changed as well if we want to deploy the fat jar as part of releases.
>> >
>> Correct.
>>
>> >
>> > Kristian - can you please review the proposed steps and let me know if they look correct to you?
>> >
>> Absolutely!
>>
>> BTW if You want to unblock yourself first, then it's enough to have a single task which builds the ubuntu libs and the fat jar (in a single CI build), and We can handle the dependent task (fat jar building) after We introduce another child (mac or win). So We could spare the third step in the first iteration.
>>
>> >
>> > Thx.
>> >
>> >
>> > On Wed, Oct 10, 2018 at 11:33 PM Praveen Kumar <prav...@dremio.com> wrote:
>> >
>> > > Hi Wes,
>> > >
>> > > I'll take this to completion. Will send out a proposal tomorrow.
>> > >
>> > > Thx.
>> > >
>> > > On Wed, Oct 10, 2018, 23:32 Wes McKinney <wesmck...@gmail.com> wrote:
>> > >
>> > >> hi folks,
>> > >>
>> > >> How would you like to proceed on this?
>> > >> I'm tracking many projects right now so I want to make sure someone else is "in charge" on this effort
>> > >>
>> > >> Thanks,
>> > >> Wes
>> > >> On Sat, Oct 6, 2018 at 10:37 AM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >> >
>> > >> > > We could create a worker pool like abstraction where the workers are the CI services, but that would require a scheduler to poll the finished jobs then submit the dependent ones. This sounds a bit inconvenient, where would that scheduler run: locally, on a CI or self hosted?
>> > >> >
>> > >> > Inevitably we're going to need to build some kind of job scheduler, whether it uses Airflow or Luigi or some other tool of our own devising.
>> > >> >
>> > >> > Apache Arrow is eventually going to need a host where we can manage such workflows. I'm looking into the possibility of a physical CUDA-equipped host that could be made available to Arrow developers to use for testing and benchmarking. I may need to run the machine out of my home (we did something similar for pandas -- physical machine that we can SSH into).
>> > >> >
>> > >> > All this idealism aside -- we take the shortest path possible for this particular packaging job, and make improvements as we can going forward.
>> > >> > On Sat, Oct 6, 2018 at 9:31 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
>> > >> > >
>> > >> > > I see now, so the jar would contain all of the three shared libraries.
>> > >> > >
>> > >> > > We could create a worker pool like abstraction where the workers are the CI services, but that would require a scheduler to poll the finished jobs then submit the dependent ones. This sounds a bit inconvenient, where would that scheduler run: locally, on a CI or self hosted?
>> > >> > >
>> > >> > > Another approach would be to use the worker the schedule the next task, in a similar fashion like dask's worker_client [1] launches tasks from tasks.
>> > >> > > There could be synchronization problems though. This approach requires to bootstrap crossbow on each CI jobs but that would:
>> > >> > > - make crossbow less CI dependent (to use azure pipelines as well)
>> > >> > > - unify the artifact uploading and downloading logic which is required in order to support dependent tasks
>> > >> > > - way less redundancy in task definitions
>> > >> > >
>> > >> > > What do You think? I'd prefer the second one.
>> > >> > >
>> > >> > > [1] https://github.com/dask/distributed/blob/master/docs/source/task-launch.rst
>> > >> > >
>> > >> > > On Sat, Oct 6, 2018 at 10:57 AM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >> > >
>> > >> > > > It seems the complicated part of this will be having a dependent task that packages up the 3 shared libraries, one for each platform, after the individual packaging tasks are run. How would you propose handling that?
>> > >> > > > On Fri, Oct 5, 2018 at 8:03 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
>> > >> > > > >
>> > >> > > > > Ohh, just read the thread, sorry!
>> > >> > > > >
>> > >> > > > > So crossbow is located here https://github.com/apache/arrow/tree/master/dev/tasks
>> > >> > > > > I suggest to "fork" the python-wheels directory which contains three templated ymls for osx, win and linux builds.
>> > >> > > > > For building on linux something like the following should be sufficient https://gist.github.com/kszucs/39154876d60c4109ff59b678afd65b19
>> > >> > > > > Then You need another entry in the tasks.yml, for example:
>> > >> > > > > jar-gandiva-linux:
>> > >> > > > >   platform: linux
>> > >> > > > >   template: gandiva-jars/travis.linux.yml
>> > >> > > > >   params:
>> > >> > > > >     # arbitrary params which are available from the templated yml
>> > >> > > > >     ...
>> > >> > > > >   artifacts:
>> > >> > > > >     # these are the expected artifacts from the build
>> > >> > > > >     - gandiva-SNAPSHOT-{version}.jar
>> > >> > > > >     ...
>> > >> > > > >
>> > >> > > > > Of course crossbow is wired towards the current packaging requirements, so likely We need to adjust it to the newly appearing requirements.
>> > >> > > > >
>> > >> > > > > Feel free to reach me on gitter @kszucs.
>> > >> > > > > On Oct 4 2018, at 2:02 pm, Wes McKinney <wesmck...@gmail.com> wrote:
>> > >> > > > > >
>> > >> > > > > > hi Praveen,
>> > >> > > > > > Probably the best way to accomplish this is to use our new Crossbow infrastructure for task automation on Travis CI and Appveyor rather than trying to do all of this within the CI entries. This is how we are producing all of our binary artifacts for releases now -- presumably in future ASF releases, we will want to include a platform-independent Gandiva JAR in our release votes, so this all needs to end up in Crossbow anyway. The intent is for the Crossbow system to take on responsibility for all packaging automation rather than using the normal CI for that.
>> > >> > > > > >
>> > >> > > > > > Krisztian, do you have time to help Praveen and the Gandiva crew with this project? This will be an important test to document and improve Crossbow for such use cases
>> > >> > > > > >
>> > >> > > > > > Thanks
>> > >> > > > > > Wes
>> > >> > > > > > On Thu, Oct 4, 2018 at 7:14 AM Praveen Kumar <prav...@dremio.com> wrote:
>> > >> > > > > > >
>> > >> > > > > > > Hi Folks,
>> > >> > > > > > > As part of https://issues.apache.org/jira/browse/ARROW-3385, we are planning to perform a snapshot release of the Gandiva Jar on each commit to master. This would be a platform independent jar that contains the core gandiva library and its jni bridge packaged for Mac, Windows and *nix platforms.
>> > >> > > > > > >
>> > >> > > > > > > The current plan is to deploy separate snapshot jars for each OS through entries in the Gandiva CI matrix and then have a combine step that pulls in each OS specific jar and builds a jar that has all the native libraries. This build/deploy would happen only for commits on master branch and not for PR requests
>> > >> > > > > > >
>> > >> > > > > > > Does the plan sound ok (or) please let us know if there is a better way to achieve the same.
>> > >> > > > > > >
>> > >> > > > > > > If it sounds ok, can someone please help with the following
>> > >> > > > > > > 1. It looks like we only do travis builds and not appveyor for master in arrow. Any reason for this?
>> > >> > > > > > > 2. Even if we did appveyor is there a way to sequence the builds. Like wait for appveyor to complete before kicking off travis? Since we would need the dll to be pre-built.
>> > >> > > > > > > 3. Someone would need to configure the credentials to use for the ossrh deployment. The credentials would need access to deploy to org.apache.arrow.
>> > >> > > > > > >
>> > >> > > > > > > Thanks ahead!
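
For reference, the "wait for the per-OS builds (with some timeout), then package the fat jar" step discussed in this thread could be sketched roughly as below. This is only an illustration and assumes nothing about crossbow's actual API: `wait_for_builds`, the `is_done` callback and the task names in the usage note are all hypothetical, and the status check is passed in so it can be replaced by whatever really reports build state.

```python
import time
from typing import Callable, Iterable


def wait_for_builds(
    tasks: Iterable[str],
    is_done: Callable[[str], bool],      # hypothetical status check, supplied by the caller
    timeout: float = 3600.0,             # seconds to wait before giving up
    poll_interval: float = 30.0,         # seconds between polls
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every task reports done; return False if the timeout expires."""
    pending = set(tasks)
    deadline = clock() + timeout
    while True:
        # Keep only the tasks whose builds have not finished yet.
        pending = {task for task in pending if not is_done(task)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

A dependent job would call something like `wait_for_builds(["gandiva-ubuntu", "gandiva-osx"], is_done=...)` and only go on to build and deploy the combined jar when it returns True; on False it should fail rather than publish a jar that is missing a platform.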