Re: Random Forest hangs without trace of error

2016-12-10 Thread Morten Hornbech
I haven’t actually experienced any non-determinism. We have nightly integration tests comparing output from random forests with no variations. The workaround we will probably try is to split the dataset, either randomly or on one of the variables, and then train a forest on each partition,
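
The random-split variant of this workaround can be sketched in plain Python. This is only an illustration under assumptions, not Spark code and not Morten's actual implementation: `split_randomly`, `train_per_partition`, and the `train` callable are hypothetical names, and `train` stands in for whatever routine fits a forest on one partition.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")
Model = TypeVar("Model")


def split_randomly(rows: Sequence[Row], num_parts: int, seed: int) -> List[List[Row]]:
    """Shuffle with a fixed seed, then deal the rows round-robin into num_parts lists."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    shuffled = list(rows)
    # A private Random instance keeps the split repeatable and leaves
    # the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_parts] for i in range(num_parts)]


def train_per_partition(
    rows: Sequence[Row],
    num_parts: int,
    seed: int,
    train: Callable[[List[Row]], Model],
) -> List[Model]:
    """Train one model per partition; `train` is a hypothetical placeholder."""
    return [train(part) for part in split_randomly(rows, num_parts, seed)]
```

Because the seed is fixed, the same rows always land in the same partition, which keeps nightly comparisons of the output meaningful. How the per-partition forests are combined afterwards is left open here.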

Re: Random Forest hangs without trace of error

2016-12-10 Thread Marco Mistroni
Hello Morten, OK. AFAIK there is a tiny bit of randomness in these ML algorithms (anyone please correct me if I'm wrong). In fact, if you run your RDF code multiple times, it will not give you EXACTLY the same results (though accuracy and errors should be more or less similar). At least this is what I
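
One way to reconcile the two observations in this thread: the randomness in a random forest comes from steps such as bootstrap sampling, and those steps become repeatable once the random generator is seeded. A minimal plain-Python sketch of that idea follows; `bootstrap_sample` is a hypothetical helper, not part of any library discussed here.

```python
import random
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def bootstrap_sample(rows: Sequence[Row], seed: int) -> List[Row]:
    """Draw len(rows) rows with replacement, using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in rows]


rows = list(range(100))
first = bootstrap_sample(rows, seed=7)
second = bootstrap_sample(rows, seed=7)
# Same seed, same draw: a run with a fixed seed is repeatable.
print(first == second)
```

With a different seed on each run, the samples (and so the trained trees) would generally differ, which matches the "not EXACTLY the same results" behaviour described above.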

Re: Running spark from Eclipse and then Jar

2016-12-10 Thread Iman Mohtashemi
Oh thanks! I'll take a look.

On Sat, Dec 10, 2016 at 11:37 AM Md. Rezaul Karim <rezaul.ka...@insight-centre.org> wrote:
> Hello Iman,
>
> Finally, I managed to solve the problem. I had been experiencing the problem because of the locking issue in the "*metastore_db*" under the project tree

Re: Running spark from Eclipse and then Jar

2016-12-10 Thread Md. Rezaul Karim
Hello Iman, Finally, I managed to solve the problem. I had been experiencing the problem because of the locking issue in the "*metastore_db*" under the project tree on Eclipse. If you look at the project tree, under the "*metastore_db*" folder you should see a file named "*db.lck*" which was

Re: Random Forest hangs without trace of error

2016-12-10 Thread Marco Mistroni
Hi, bring the sample size back to the 1k range to debug, or, as suggested, reduce the number of trees and bins. I had rdd running on same-size data with no issues. Or send me some sample code and data and I'll try it out on my EC2 instance ... Kr On 10 Dec 2016 3:16 am, "Md. Rezaul Karim"

Re: Design patterns for Spark implementation

2016-12-10 Thread Mich Talebzadeh
Hi Sachin, The idea of using Spark on RDBMS to do complex queries is interesting and will mature as SQL on Spark gets closer to ANSI. There are a number of challenges here:
1. The application owners prefer to stay on RDBMS
2. The application backend is based on a primary DB and multiple