Hi Wes, I'll take this to completion. Will send out a proposal tomorrow.
Thx.

On Wed, Oct 10, 2018, 23:32 Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> How would you like to proceed on this? I'm tracking many projects
> right now so I want to make sure someone else is "in charge" on this
> effort
>
> Thanks,
> Wes
> On Sat, Oct 6, 2018 at 10:37 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > We could create a worker pool like abstraction where the workers are
> > > the CI services, but that would require a scheduler to poll the
> > > finished jobs then submit the dependent ones. This sounds a bit
> > > inconvenient, where would that scheduler run: locally, on a CI or
> > > self hosted?
> >
> > Inevitably we're going to need to build some kind of job scheduler,
> > whether it uses Airflow or Luigi or some other tool of our own
> > devising.
> >
> > Apache Arrow is eventually going to need a host where we can manage
> > such workflows. I'm looking into the possibility of a physical
> > CUDA-equipped host that could be made available to Arrow developers to
> > use for testing and benchmarking. I may need to run the machine out of
> > my home (we did something similar for pandas -- physical machine that
> > we can SSH into).
> >
> > All this idealism aside -- we take the shortest path possible for this
> > particular packaging job, and make improvements as we can going
> > forward.
> > On Sat, Oct 6, 2018 at 9:31 AM Krisztián Szűcs
> > <szucs.kriszt...@gmail.com> wrote:
> > >
> > > I see now, so the jar would contain all of the three shared libraries.
> > >
> > > We could create a worker pool like abstraction where the workers are
> > > the CI services, but that would require a scheduler to poll the
> > > finished jobs then submit the dependent ones. This sounds a bit
> > > inconvenient, where would that scheduler run: locally, on a CI or
> > > self hosted?
> > >
> > > Another approach would be to use the worker to schedule the next task,
> > > in a similar fashion like dask's worker_client [1] launches tasks from
> > > tasks.
> > > There could be synchronization problems though. This approach requires
> > > to bootstrap crossbow on each CI job but that would:
> > > - make crossbow less CI dependent (to use azure pipelines as well)
> > > - unify the artifact uploading and downloading logic which is required
> > >   in order to support dependent tasks
> > > - way less redundancy in task definitions
> > >
> > > What do You think? I'd prefer the second one.
> > >
> > > [1]
> > > https://github.com/dask/distributed/blob/master/docs/source/task-launch.rst
> > >
> > > On Sat, Oct 6, 2018 at 10:57 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > > It seems the complicated part of this will be having a dependent task
> > > > that packages up the 3 shared libraries, one for each platform, after
> > > > the individual packaging tasks are run. How would you propose
> > > > handling that?
> > > > On Fri, Oct 5, 2018 at 8:03 AM Krisztián Szűcs
> > > > <szucs.kriszt...@gmail.com> wrote:
> > > > >
> > > > > Ohh, just read the thread, sorry!
> > > > >
> > > > > So crossbow is located here
> > > > > https://github.com/apache/arrow/tree/master/dev/tasks
> > > > > I suggest to "fork" the python-wheels directory which contains three
> > > > > templated ymls for osx, win and linux builds. For building on linux
> > > > > something like the following should be sufficient
> > > > > https://gist.github.com/kszucs/39154876d60c4109ff59b678afd65b19
> > > > > Then You need another entry in the tasks.yml, for example:
> > > > >
> > > > > jar-gandiva-linux:
> > > > >   platform: linux
> > > > >   template: gandiva-jars/travis.linux.yml
> > > > >   params:
> > > > >     # arbitrary params which are available from the templated yml
> > > > >     ...
> > > > >   artifacts:
> > > > >     # these are the expected artifacts from the build
> > > > >     - gandiva-SNAPSHOT-{version}.jar
> > > > >     ...
> > > > >
> > > > > Of course crossbow is wired towards the current packaging
> > > > > requirements, so likely We need to adjust it to the newly appearing
> > > > > requirements.
> > > > >
> > > > > Feel free to reach me on gitter @kszucs.
> > > > > On Oct 4 2018, at 2:02 pm, Wes McKinney <wesmck...@gmail.com> wrote:
> > > > > >
> > > > > > hi Praveen,
> > > > > > Probably the best way to accomplish this is to use our new Crossbow
> > > > > > infrastructure for task automation on Travis CI and Appveyor rather
> > > > > > than trying to do all of this within the CI entries. This is how we
> > > > > > are producing all of our binary artifacts for releases now --
> > > > > > presumably in future ASF releases, we will want to include a
> > > > > > platform-independent Gandiva JAR in our release votes, so this all
> > > > > > needs to end up in Crossbow anyway. The intent is for the Crossbow
> > > > > > system to take on responsibility for all packaging automation rather
> > > > > > than using the normal CI for that.
> > > > > >
> > > > > > Krisztian, do you have time to help Praveen and the Gandiva crew with
> > > > > > this project? This will be an important test to document and improve
> > > > > > Crossbow for such use cases
> > > > > >
> > > > > > Thanks
> > > > > > Wes
> > > > > > On Thu, Oct 4, 2018 at 7:14 AM Praveen Kumar <prav...@dremio.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Folks,
> > > > > > > As part of https://issues.apache.org/jira/browse/ARROW-3385, we are
> > > > > > > planning to perform a snapshot release of the Gandiva Jar on each
> > > > > > > commit to master. This would be a platform independent jar that
> > > > > > > contains the core gandiva library and its jni bridge packaged for
> > > > > > > Mac, Windows and *nix platforms.
> > > > > > >
> > > > > > > The current plan is to deploy separate snapshot jars for each OS
> > > > > > > through entries in the Gandiva CI matrix and then have a combine
> > > > > > > step that pulls in each OS specific jar and builds a jar that has
> > > > > > > all the native libraries. This build/deploy would happen only for
> > > > > > > commits on master branch and not for PR requests
> > > > > > >
> > > > > > > Does the plan sound ok (or) please let us know if there is a better
> > > > > > > way to achieve the same.
> > > > > > >
> > > > > > > If it sounds ok, can someone please help with the following
> > > > > > > 1. It looks like we only do travis builds and not appveyor for
> > > > > > >    master in arrow. Any reason for this?
> > > > > > > 2. Even if we did appveyor is there a way to sequence the builds.
> > > > > > >    Like wait for appveyor to complete before kicking off travis?
> > > > > > >    Since we would need the dll to be pre-built.
> > > > > > > 3. Someone would need to configure the credentials to use for the
> > > > > > >    ossrh deployment. The credentials would need access to deploy to
> > > > > > >    org.apache.arrow.
> > > > > > >
> > > > > > > Thanks ahead!
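
For reference, the "combine step" described in the original message (pull in each OS specific jar and build one jar holding all the native libraries) could look roughly like the sketch below. This is only an illustration, not what Crossbow or the Gandiva build actually does: the function name combine_jars and the file names in the comments are hypothetical, and it relies only on the fact that a jar is a zip archive, using Python's standard zipfile module.

```python
import zipfile


def combine_jars(platform_jars, output_jar):
    """Merge several per-platform jars into a single jar (hypothetical helper).

    Entries that appear in more than one input, such as shared .class
    files or META-INF/MANIFEST.MF, are copied once, from the first jar
    that contains them. Returns the sorted list of entry names written.
    """
    seen = set()
    with zipfile.ZipFile(output_jar, "w", zipfile.ZIP_DEFLATED) as out:
        for jar_path in platform_jars:
            with zipfile.ZipFile(jar_path) as jar:
                for info in jar.infolist():
                    if info.filename in seen:
                        # Already copied from an earlier jar; skip the duplicate.
                        continue
                    seen.add(info.filename)
                    # Passing the ZipInfo keeps the entry's name, timestamp
                    # and compression type from the source jar.
                    out.writestr(info, jar.read(info.filename))
    return sorted(seen)


# Example call; the jar names below are assumptions for illustration only:
# combine_jars(
#     ["gandiva-linux.jar", "gandiva-osx.jar", "gandiva-win.jar"],
#     "gandiva-combined.jar",
# )
```

Taking the first copy of a duplicated entry assumes the Java classes are identical across the per-platform jars and that only the native libraries differ. If that does not hold, the combine step would need an explicit conflict check instead.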