Re: OutOfMemoryError - When saving Word2Vec

2016-06-13 Thread sharad82
Is this the right forum to post Spark-related issues? I have tried this
forum along with Stack Overflow, but I am not seeing any response.






OutOfMemoryError - When saving Word2Vec

2016-06-12 Thread sharad82
Trying to save a word2vec model trained on roughly 10 GB of data leads to
the OOM error below.

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Spark Version: 1.6
spark.dynamicAllocation.enabled false
spark.executor.memory   75g
spark.driver.memory 150g
spark.driver.cores  10

Full Stack Trace:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at scala.StringContext.standardInterpolator(StringContext.scala:122)
    at scala.StringContext.s(StringContext.scala:90)
    at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:334)
    at org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:271)
    at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:91)
    at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
    at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:172)






Re: word2vec: how to save an mllib model and reload it?

2016-06-10 Thread sharad82
I am having a problem serializing an ML word2vec model.

Am I doing something wrong?


http://stackoverflow.com/questions/37723308/spark-ml-word2vec-serialization-issues

  







Spark ML Word2Vec Serialization Issues

2016-06-09 Thread sharad82
http://stackoverflow.com/questions/37723308/spark-ml-word2vec-serialization-issues

  

I recently refactored our Word2Vec code to move to the DataFrame-based ML
models, but I am having problems serializing and loading the model
locally.

I am able to successfully:

1. Fit the DataFrame and create the model.
2. Retrieve synonyms (a minimal sketch of this step follows below).
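
For context, here is a minimal sketch of the synonym lookup that works (Spark 1.6 ml API). The model variable, query word, and class name are placeholders for illustration, not taken from the post:

import org.apache.spark.ml.feature.Word2VecModel;
import org.apache.spark.sql.DataFrame;

public class SynonymLookup {
    // "model" is the fitted ml.feature.Word2VecModel from step 1 above.
    static void printSynonyms(Word2VecModel model) {
        // findSynonyms returns a DataFrame with "word" and "similarity" columns.
        DataFrame synonyms = model.findSynonyms("spark", 10); // hypothetical query word
        synonyms.show();
    }
}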

When I try to serialize the model locally, the vectors are not serialized,
and hence the file is far too small: approximately 2 KB for 10 GB of training data.

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

// Plain Java serialization of the fitted model (this is the part that misbehaves).
FileOutputStream fo = new FileOutputStream("/tmp/word2vec");
ObjectOutputStream so = new ObjectOutputStream(fo);
so.writeObject(word2VecModel); // word2VecModel is the fitted ml.feature.Word2VecModel
so.flush();
so.close();
logger.info("Word2Vec model saved");

Loading the model back and calling the findSynonyms() function results in
the exception below:

java.lang.NullPointerException
    at org.apache.spark.ml.feature.Word2VecModel.transform(Word2Vec.scala:224)

Is there a way to save the model locally?
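
For comparison, here is a minimal sketch of the built-in ML persistence path in Spark 1.6, which writes the model (including the word vectors) as Parquet instead of relying on Java serialization. This is an assumption about the intended usage rather than a confirmed fix; the class name, paths, and query word are placeholders:

import java.io.IOException;

import org.apache.spark.ml.feature.Word2VecModel;
import org.apache.spark.sql.DataFrame;

public class ModelPersistence {
    static void saveAndReload(Word2VecModel model) throws IOException {
        // write().save(...) persists the model metadata and word vectors as Parquet.
        model.write().overwrite().save("/tmp/word2vec-model"); // placeholder path

        // Reload the model and query it; findSynonyms should work on the reloaded copy.
        Word2VecModel reloaded = Word2VecModel.load("/tmp/word2vec-model");
        DataFrame synonyms = reloaded.findSynonyms("spark", 10); // hypothetical query word
        synonyms.show();
    }
}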


