Spark LDA runs out of disk space

2016-02-29 Thread TheGeorge1918
Hi guys, I was running LDA with 2000 topics on 6 GB of compressed data, roughly 1.2 million docs. I used 3 AWS r3.8xlarge machines as core nodes. It turned out the Spark application crashed after 3 or 4 iterations. Ganglia indicated that disk space was fully consumed. I believe it's the shuffle
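
With the EM optimizer, each LDA iteration adds to the RDD lineage, and the shuffle files it leaves behind can fill local disk over many iterations. A common mitigation is to enable RDD checkpointing so Spark can truncate the lineage and clean up old shuffle data. A minimal sketch, assuming MLlib's RDD-based LDA; the checkpoint path and parameters are illustrative, not from this thread:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

object LdaCheckpointSketch {
  def train(sc: SparkContext, corpus: RDD[(Long, Vector)]) = {
    // Checkpointing truncates the lineage so shuffle files from old
    // iterations can be removed instead of accumulating on local disk.
    sc.setCheckpointDir("hdfs:///tmp/lda-checkpoints") // illustrative path

    new LDA()
      .setK(2000)
      .setMaxIterations(100)
      .setCheckpointInterval(10) // checkpoint every 10 iterations
      .run(corpus)
  }
}
```

Pointing `spark.local.dir` at volumes with enough free space is the complementary cluster-side fix.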

Re: Spark LDA model reuse with new set of data

2016-01-26 Thread Joseph Bradley
> first contact with ML).
>
> Ok, I am trying to write a DSL where you can run some commands.
>
> I did a command that trains the Spark LDA and it produces the topics I want and I saved it using the save method provided by the LDAModel.
>
> Now I want to load this LDAModel and u
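
For applying a saved MLlib LDA model to new documents, the usual pattern is to load the model and, if it was trained with the EM optimizer, convert the resulting DistributedLDAModel to a LocalLDAModel, which can infer topic mixtures for unseen documents. A sketch under that assumption; the model path is illustrative:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LocalLDAModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

object LdaReuseSketch {
  def infer(sc: SparkContext, newDocs: RDD[(Long, Vector)]): RDD[(Long, Vector)] = {
    // Load the model saved earlier via model.save(sc, path).
    val distModel = DistributedLDAModel.load(sc, "hdfs:///models/lda") // illustrative path

    // A DistributedLDAModel only stores topic assignments for the training
    // corpus; to score *new* documents, convert it to a LocalLDAModel.
    val localModel: LocalLDAModel = distModel.toLocal

    // Topic mixture per new document: (docId, topic distribution).
    localModel.topicDistributions(newDocs)
  }
}
```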

Spark LDA

2016-01-22 Thread Ilya Ganelin
Hi all - I'm running the Spark LDA algorithm on a dataset of roughly 3 million terms, with a resulting RDD of approximately 20 GB, on a 5-node cluster with 10 executors (3 cores each) and 14 GB of memory per executor. As the application runs, I'm seeing progressively longer execution times
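
Progressively slower iterations with the default EM optimizer often trace back to the growing per-iteration graph state and lineage. One alternative is the online variational optimizer, which processes mini-batches and keeps per-iteration cost roughly constant; caching the corpus also avoids recomputing it each pass. A hedged sketch, with illustrative parameter values:

```scala
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

object LdaOnlineSketch {
  def train(corpus: RDD[(Long, Vector)]) = {
    new LDA()
      .setK(100) // illustrative topic count
      // Online variational Bayes: each iteration touches only a mini-batch,
      // so iteration time stays roughly flat instead of growing.
      .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.05))
      .setMaxIterations(50)
      .run(corpus.cache()) // keep the corpus in memory across passes
  }
}
```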