Possible "split brain" situation

2017-11-12 Thread Gimantha Bandara
Hi all, We are using embedded Spark 1.6.2 in our analytics platform[1]. For cluster communication we use Hazelcast's clustering capabilities. On the Hazelcast side we set the following configurations to configure the heartbeat properties: hazelcast.max.no.heartbeat.seconds=30

Re: Spark based Data Warehouse

2017-11-12 Thread Patrick Alwell
Alcon, You can most certainly do this. I’ve done benchmarking with Spark SQL and the TPC-DS queries using S3 as the filesystem. Zeppelin and Livy server work well for the dashboarding and concurrent query issues: https://hortonworks.com/blog/livy-a-rest-interface-for-apache-spark/ Livy

Re: Spark based Data Warehouse

2017-11-12 Thread Vadim Semenov
It's actually quite simple to answer
> 1. Is Spark SQL and UDF, able to handle all the workloads?
Yes
> 2. What user interface did you provide for data scientist, data engineers and analysts
Home-grown platform, EMR, Zeppelin
> What are the challenges in running concurrent queries, by many

Re: Spark based Data Warehouse

2017-11-12 Thread Gourav Sengupta
Dear Ashish, what you are asking for involves at least a few weeks of dedicated understanding of your use case, and then it takes at least 3 to 4 months to even propose a solution. You can even build a fantastic data warehouse just using C++. The matter depends on lots of conditions. I just think

Re: Spark based Data Warehouse

2017-11-12 Thread Phillip Henry
Hi, Ashish. You are correct in saying that not *all* functionality of Spark is spill-to-disk but I am not sure how this pertains to a "concurrent user scenario". Each executor will run in its own JVM and is therefore isolated from others. That is, if the JVM of one user dies, this should not

Re: Spark based Data Warehouse

2017-11-12 Thread ashish rawat
Thanks Jorn and Phillip. My question was specifically for anyone who has tried building a system using Spark SQL as a data warehouse. I was trying to check whether someone has tried it and can help with the kinds of workloads that worked and the ones that had problems. Regarding spill to disk,

Re: Spark based Data Warehouse

2017-11-12 Thread Phillip Henry
Agree with Jorn. The answer is: it depends. In the past, I've worked with data scientists who are happy to use the Spark CLI. Again, the answer is "it depends" (in this case, on the skills of your customers). Regarding sharing resources, different teams were limited to their own queue so they

Re: Spark based Data Warehouse

2017-11-12 Thread Jörn Franke
What do you mean by all possible workloads? You cannot prepare any system to do all possible processing. We do not know the requirements of your data scientists now or in the future, so it is difficult to say. How do they work currently without the new solution? Do they all work on the same data? I