Scaling issues due to contention in Random

2016-11-24 Thread Prasun Ratn
Hi, I am seeing perf degradation in the Spark/Pi example on a single-node setup (using local[K]). Using 1, 2, 4, and 8 cores, this is the execution time in seconds for the same number of iterations: Random: 4.0, 7.0, 12.96, 17.96. If I change the code to use ThreadLocalRandom
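
A minimal sketch of the change described above, assuming the slowdown comes from all worker threads drawing from one shared java.util.Random (which is thread-safe but contends under concurrent use). This is plain Java rather than the Spark example itself; the class name PiEstimate and its methods are hypothetical, introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration: Monte Carlo estimate of Pi where each worker
// thread uses its own ThreadLocalRandom instead of a shared Random.
public class PiEstimate {

    // Counts how many of `samples` random points in [-1, 1) x [-1, 1)
    // fall inside the unit circle. Must run on the worker thread so that
    // ThreadLocalRandom.current() returns that thread's generator.
    static long countHits(long samples) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble() * 2 - 1;
            double y = rnd.nextDouble() * 2 - 1;
            if (x * x + y * y <= 1) {
                hits++;
            }
        }
        return hits;
    }

    // Runs `threads` workers, each drawing `samplesPerThread` points,
    // and combines their hit counts into one estimate of Pi.
    static double estimate(int threads, long samplesPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> countHits(samplesPerThread);
                futures.add(pool.submit(task));
            }
            long hits = 0;
            for (Future<Long> f : futures) {
                hits += f.get();
            }
            return 4.0 * hits / ((double) threads * samplesPerThread);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(estimate(4, 1_000_000));
    }
}
```

Because each thread owns its generator state, no seed is shared between threads, so adding threads should not add contention on the random number generator in this sketch.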

Re: Parquet-like partitioning support in spark SQL's in-memory columnar cache

2016-11-24 Thread Reynold Xin
It's already there, isn't it? The in-memory columnar cache format. On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal wrote: > Hi, > > Do we have any plan of supporting parquet-like partitioning support in > Spark SQL in-memory cache? Something like one RDD[CachedBatch] per >

Parquet-like partitioning support in spark SQL's in-memory columnar cache

2016-11-24 Thread Nitin Goyal
Hi, Do we have any plan of supporting parquet-like partitioning support in Spark SQL in-memory cache? Something like one RDD[CachedBatch] per in-memory cache partition. -Nitin

Re: How is the order ensured in the jdbc relation provider when inserting data from multiple executors

2016-11-24 Thread nirandap
Hi Maciej, Thanks again for the reply. One small clarification on the answer to my point #1: I set local[4]; shouldn't this force Spark to read from 4 partitions in parallel and write in parallel (by parallel I mean, the order from which partition, the data is read from a set of 4

[no subject]

2016-11-24 Thread Rostyslav Sotnychenko

Re: SparkUI via proxy

2016-11-24 Thread Georg Heiler
Port forwarding will help you out. marco rocchi wrote on Thu, Nov 24, 2016 at 16:33: > Hi, > I'm working with Apache Spark in order to develop my master's thesis. I'm new > to Spark and to working with a cluster. I searched the internet but I didn't >

SparkUI via proxy

2016-11-24 Thread marco rocchi
Hi, I'm working with Apache Spark in order to develop my master's thesis. I'm new to Spark and to working with a cluster. I searched the internet but I didn't find a solution. My problem is the following: from my PC I can access the master node of a cluster only via a proxy. To connect to

Re: Handling questions in the mailing lists

2016-11-24 Thread eliasah
Besides the potential traffic issue, I don't believe that users would benefit from a standalone site. Some great answers are provided by users who aren't Spark experts but may be Java, Python, AWS, or even systems experts. Why would we want to play alone? We are trying nevertheless the

RE: Handling questions in the mailing lists

2016-11-24 Thread Ioannis.Deligiannis
…my 0.1 cent ☺ As a Spark and SO user, I would not find a separate SE a good thing. *Part of the SO beauty is that you can filter easily and track different topics from one dashboard. *Being part of SO also gets good exposure as it raises awareness of Spark across a wider audience. *High

Re: Handling questions in the mailing lists

2016-11-24 Thread Sean Owen
Here's a view into the requirements, for example: http://area51.stackexchange.com/proposals/76571/emacs You're right, there is a lot of activity on SO, easily 30-40 questions per day. One thing I noticed about, for example, the Data Science SE is that most questions relevant to it were still

RE: Handling questions in the mailing lists

2016-11-24 Thread assaf.mendelson
I am not sure what counts as enough traffic. Some of the existing SE groups do not have that much traffic. Specifically, the user mailing list gets ~50 emails per day. It wouldn't be much of a stretch to extract 1-2 questions per day from that. In the regular stackoverflow the apache-spark had

Re: Handling questions in the mailing lists

2016-11-24 Thread Sean Owen
I don't think there's nearly enough traffic to sustain a stand-alone SE. I helped mod the Data Science SE and it still hasn't technically reached critical mass after 2 years. It would just fracture the discussion to yet another place. On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson