I looped in the dev list for this conversation. A few thoughts:
On Sun, Aug 2, 2015 at 9:58 AM, Efi <[email protected]> wrote:
> Thank you Steven,
>
> There are 4 problems I encountered with Slider that made me reconsider
> using it.
>
> 1. Slider requires you to provide one or more jar files to start and run
> your application. VXQuery does not start execution from the cli jar;
> instead, it is launched by a bash script that does a lot of setup and
> configuration before running the query. So one problem is changing VXQuery
> to run from the cli jar too.
>

First, the VXQuery cli jar is not the one to be run on the cluster. As I understand the cluster process, the cluster controller (cc) and X node controllers (nc) will be started by YARN. Then the VXQuery cli is run locally (or on a remote server) with the IP address of the cc. After the VXQuery cli has been run for each of the user's queries, the cluster can be shut down. Second, the bash scripts pass two types of parameters when starting the jar files: Java configuration and VXQuery cluster configuration details. These will need to be accounted for during setup. It may be better to store these settings in a configuration file instead of passing them as parameters to the jar file.

> 2. When our users download VXQuery, they need to build it before they can
> run queries, which means we don't provide them with the jars and other
> executable files that Slider needs to run the application. So if that
> continues, the user, after the build, will have to put the files Slider
> needs into a zip along with the configuration we provide, configure Slider
> for their YARN setup, and create the cluster with that zip file. Otherwise,
> we should provide users with pre-built VXQuery packages that contain the
> zip file, and they will just do the rest for Slider and YARN.
>

The Maven dependencies will need to be updated to support YARN and whatever other libraries you will need.
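As a rough illustration of the configuration-file suggestion in the reply to point 1, here is a minimal Java sketch, assuming a plain `java.util.Properties` file, of how a YARN launcher could read the cc address, the number of ncs, and the JVM options from one file instead of from bash-script parameters. The class name `ClusterSettings`, the `vxquery.*` property keys, and all default values (including the port) are hypothetical placeholders, not existing VXQuery settings.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Hypothetical launch settings for running VXQuery on YARN.
 * The property keys and defaults are illustrative placeholders only;
 * they are not existing VXQuery configuration options.
 */
public final class ClusterSettings {
    final String ccHost;     // address the local cli would use to reach the cc
    final int ccClientPort;  // placeholder default, not a verified VXQuery value
    final int ncCount;       // number of node controllers YARN should start
    final String jvmOpts;    // Java settings currently passed by the bash scripts

    private ClusterSettings(String ccHost, int ccClientPort, int ncCount, String jvmOpts) {
        this.ccHost = ccHost;
        this.ccClientPort = ccClientPort;
        this.ncCount = ncCount;
        this.jvmOpts = jvmOpts;
    }

    /** Reads settings from any Reader; missing keys fall back to the defaults. */
    static ClusterSettings load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        String host = p.getProperty("vxquery.cc.host", "localhost").trim();
        int port = Integer.parseInt(p.getProperty("vxquery.cc.clientPort", "1098").trim());
        int ncs = Integer.parseInt(p.getProperty("vxquery.nc.count", "1").trim());
        String jvm = p.getProperty("vxquery.jvm.opts", "-Xmx1g").trim();
        if (ncs < 1) {
            throw new IllegalArgumentException("vxquery.nc.count must be >= 1, got " + ncs);
        }
        return new ClusterSettings(host, port, ncs, jvm);
    }

    /** Convenience overload for reading a settings file from disk. */
    static ClusterSettings load(Path file) throws IOException {
        try (Reader r = Files.newBufferedReader(file)) {
            return load(r);
        }
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that file; otherwise use an inline sample.
        ClusterSettings s = args.length > 0
                ? load(Paths.get(args[0]))
                : load(new StringReader("vxquery.cc.host = 10.0.0.5\nvxquery.nc.count = 4\n"));
        System.out.println("cc=" + s.ccHost + ":" + s.ccClientPort
                + " ncs=" + s.ncCount + " jvmOpts=" + s.jvmOpts);
    }
}
```

One possible benefit of this layout is that a single file could be read both by whatever starts the cc and ncs on YARN and by the local cli run that needs the cc address, so the two never drift apart.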
The users will always (at this point) have to download the source and build VXQuery to run our system. Apache does not supply binary files. Please plan for this as the default use case. It's OK to depend on these other libraries.

> 3. Slider requires a lot of configuration that we cannot do for the user,
> because it depends on each user's YARN setup. So the user will have to
> figure out how Slider works and set it up for YARN. I believe this is not
> easy, because the documentation is poor and I got lost many times before
> finally figuring it out. The same goes for the Twill documentation.
>

My thought is that the user will already have a Hadoop cluster running YARN. I don't think we want to make them change their configuration if possible. I guess we could have certain requirements for the YARN cluster, but they should be reasonable.

> 4. ZooKeeper is required for both Twill and Slider, along with YARN.

I believe Hyracks (or it may be AsterixDB) already depends on ZooKeeper, so this is not new. Even if only AsterixDB depends on ZooKeeper, I think this is OK for our system.

> I would prefer implementing the YARN cluster configuration the way Flink
> does it, because after working with Flink, Slider, and Twill I found Flink
> the easiest to set up, run, and use.
>

All that being said, I am looking for your recommendation and what you think is the best solution. The suggestion to look at Twill and Slider was meant to make implementation and management easier. I am open to using Flink's approach if we can have a good implementation, a user-friendly setup, and low maintenance (for both the code base and cluster management). You could even suggest that Flink's approach be the first implementation, later upgraded to one of these tools (or not upgraded at all). Based on the above questions, I am wondering if we have the same vision for the YARN cluster setup. Could you post a short overview of the cluster management process?
The overview could later be expanded into documentation for our website.

> Please tell me any questions or objections you have regarding these issues.
>
> Best regards,
> Efi
>
>
> On 02/08/2015 07:24 PM, Steven Jacobs wrote:
>
>> Hi,
>> Efi has been looking at both Twill and Slider as possible ways to
>> integrate YARN with VXQuery, and has been hitting several roadblocks. She
>> only has two weeks of actual coding left, and was thinking of switching to
>> an Apache Flink solution. I wanted to see what your thoughts are on the
>> issues she is having (she will elaborate here) and whether it would be
>> better to get something working with Flink, which should be doable within
>> two weeks, or to continue exploring Slider, which might not reach a
>> conclusion within two weeks.
>>
>> Efi, please elaborate on your issues here.
>>
>> Steven
>>
>
