Re: Prediction using Classification with text attributes in Apache Spark MLLib

2017-10-20 Thread lmk
Trying to improve the old solution. Do we have a better text classifier now in Spark MLlib? Regards, lmk -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr

Re: Bad Digest error while doing aws s3 put

2016-02-08 Thread lmk
Hi Dhimant, As I had indicated in my next mail, my problem was due to disk getting full with log messages (these were dumped into the slaves) and did not have anything to do with the content pushed into s3. So, looks like this error message is very generic and is thrown for various reasons. You

Re: SchemaRDD saveToCassandra

2014-09-16 Thread lmk
Hi Michael, Please correct me if I am wrong. The error seems to originate from spark only. Please have a look at the stack trace of the error which is as follows: [error] (run-main-0) java.lang.NoSuchMethodException: Cannot resolve any suitable constructor for class

SchemaRDD saveToCassandra

2014-09-11 Thread lmk
to cassandra just like the regular rdd? If that is not possible, is there any way to convert the schema RDD to a regular RDD? Please advise. Regards, lmk

NotSerializableException while doing rdd.saveToCassandra

2014-08-27 Thread lmk
field is org.apache.spark.rdd.RDD[LogLine] = MappedRDD[7] at map at console:45 How can I convert this to Serializable, or is this a different problem? Please advise. Regards, lmk

Re: Apache Spark- Cassandra - NotSerializable Exception while saving to cassandra

2014-08-27 Thread lmk
Hi Yana, I have done a take and confirmed the existence of data. Also checked that it is getting connected to Cassandra. That is why I suspect that this particular rdd is not serializable. Thanks, lmk
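The pattern behind most NotSerializableException failures is a task closure that captures an unserializable handle (the RDD itself, a SparkContext, a live connection) instead of only the plain data the task needs. A minimal standalone sketch of the idea in plain Python with pickle — the Job class, its fields, and the path value are hypothetical stand-ins, not Spark API:

```python
import pickle
import threading

class Job:
    def __init__(self, path):
        self.path = path
        # A lock is not picklable -- it stands in for an RDD or
        # SparkContext reference accidentally captured by a closure.
        self.lock = threading.Lock()

job = Job("/data/logs")

try:
    pickle.dumps(job)       # dragging the whole object along fails
    ok = True
except TypeError:
    ok = False
# ok is False: the embedded lock cannot be serialized

# Fix: extract and ship only the plain data the task actually needs
payload = pickle.dumps(job.path)
restored = pickle.loads(payload)
```

The same fix applies in Spark: pull the needed field into a local val before the map/save call so the closure captures a serializable value rather than the enclosing object.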

Re: Bad Digest error while doing aws s3 put

2014-08-07 Thread lmk
This was a completely misleading error message. The problem was due to a log message getting dumped to the stdout. This was getting accumulated in the workers and hence there was no space left on device after some time. When I re-tested with spark-0.9.1, the saveAsTextFile API threw no space

Re: Bad Digest error while doing aws s3 put

2014-08-05 Thread lmk
files when the cluster has 4 m3.2xlarge slaves; it throws a Bad Digest error after writing 86/100 files when the cluster has 5 m3.2xlarge slaves; it succeeds in writing all 100 files when the cluster has 6 m3.2xlarge slaves. Please clarify. Regards, lmk

Re: Bad Digest error while doing aws s3 put

2014-08-04 Thread lmk
with this problem for the past couple of weeks. Thanks, lmk

Re: Bad Digest error while doing aws s3 put

2014-07-28 Thread lmk
to some partitions only, say while writing to 240 partitions, it might succeed for 156 files and then it will start throwing the Bad Digest Error and then it hangs. Please advise. Regards, lmk

save to HDFS

2014-07-24 Thread lmk
and the log says that the save is complete also. But I am not able to find the file I have saved anywhere. Is there a way I can access this file? Please advise. Regards, lmk

Re: save to HDFS

2014-07-24 Thread lmk
://masteripaddress:9000/root/test-app/test1/ after I login to the cluster? Thanks, lmk

Re: save to HDFS

2014-07-24 Thread lmk
Thanks Akhil. I was able to view the files. Actually I was trying to list the same using regular ls and since it did not show anything I was concerned. Thanks for showing me the right direction. Regards, lmk

Bad Digest error while doing aws s3 put

2014-07-17 Thread lmk
to md5 checksum mismatch. But will this happen due to load? Regards, lmk
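For context on the mismatch: S3 returns BadDigest when the MD5 it computes over the bytes it received differs from the Content-MD5 header sent with the PUT. A small illustrative sketch (plain Python with hashlib; the body bytes are an arbitrary example) of how that header value is derived, and how any corruption of the bytes in flight makes the digests diverge:

```python
import base64
import hashlib

def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest -- the format S3 expects in Content-MD5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

body = b"hello world"
digest = content_md5(body)

# If the bytes change between hashing and upload (truncated write,
# disk-full corruption in a worker), the digests no longer match and
# S3 rejects the PUT with BadDigest.
truncated = content_md5(body[:-1])
```

This is consistent with the disk-full root cause found later in the thread: a partially written part yields bytes that no longer match the precomputed digest.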

Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread lmk
Hi, I am trying to predict an attribute with binary value (Yes/No) using SVM. All my attributes which belong to the training set are text attributes. I understand that I have to convert my outcome as double (0.0/1.0). But I do not understand how to deal with my explanatory variables which are also
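One common way to turn text attributes into the fixed-length numeric vectors an SVM needs is the hashing trick, the idea behind MLlib's HashingTF. A minimal standalone sketch in plain Python — the bucket count and sample tokens are arbitrary illustrations, not a prescribed setup:

```python
import hashlib

def hash_features(tokens, num_buckets=16):
    """Map text tokens to a fixed-length numeric vector via the hashing trick."""
    vec = [0.0] * num_buckets
    for tok in tokens:
        # Stable hash -> bucket index; collisions are tolerated by design.
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % num_buckets
        vec[idx] += 1.0
    return vec

features = hash_features(["colour", "red", "size", "large"])
label = 1.0  # the Yes/No outcome encoded as 1.0/0.0
# (label, features) is the shape MLlib's LabeledPoint expects
```

The vector length is fixed regardless of vocabulary size, which is why this encoding scales to unseen text values without maintaining a dictionary.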

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread lmk
? Thanks, lmk

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Sorry again. In this method, I see that the values for a <- positions.iterator and b <- positions.iterator always remain the same. I tried to do a <- positions.iterator.next; it throws an error: value filter is not a member of (Double, Double). Is there something I

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Thanks a lot. That solved my problem. Thanks again for the quick response and solution.

Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi, I am a new spark user. Pls let me know how to handle the following scenario: I have a data set with the following fields: 1. DeviceId 2. latitude 3. longitude 4. ip address 5. Datetime 6. Mobile application name With the above data, I would like to perform the following steps: 1. Collect all

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi Oleg/Andrew, Thanks much for the prompt response. We expect thousands of lat/lon pairs for each IP address. And that is my concern with the Cartesian product approach. Currently for a small sample of this data (5000 rows) I am grouping by IP address and then computing the distance between
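After grouping by IP address, the per-group work described above is pairwise great-circle distances; iterating over unordered pairs visits each pair once instead of materializing the full cartesian product. A standalone sketch (plain Python using the haversine formula; the sample coordinates are made up):

```python
import math
from itertools import combinations

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# One IP's group of lat/lon observations (hypothetical values).
points = [(12.97, 77.59), (13.08, 80.27), (12.97, 77.59)]

# combinations() yields n*(n-1)/2 pairs, half the size of the
# cartesian product and with no self-pairs.
dists = [haversine_km(a, b) for a, b in combinations(points, 2)]
```

With thousands of points per IP this is still O(n^2) per group, so the groupBy approach trades cluster-wide shuffle volume for per-key compute; it only helps if no single IP's group is too large for one task.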