I think this should represent the label of the LabeledPoint (0 means
negative, 1 means positive):
http://spark.apache.org/docs/latest/mllib-data-types.html#labeled-point

The document you mention is for the mathematical formula, not the
implementation.
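To make the convention concrete, here is a minimal, Spark-free sketch of the decision rule (the class name and margin values are made up for illustration): once the threshold is cleared, predict() returns the raw margin rather than a label, and MLlib treats margin >= threshold (default 0.0) as the positive class.

```java
public class MarginDecision {
    // With the threshold cleared, SVMModel.predict() returns the raw margin
    // w.x + b rather than a 0/1 label. MLlib's convention is:
    // margin >= threshold (default 0.0)  =>  positive class (label 1).
    static int classify(double rawMargin, double threshold) {
        return rawMargin >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(classify(2.7, 0.0));   // prints 1 (positive margin)
        System.out.println(classify(-0.3, 0.0));  // prints 0 (negative margin)
    }
}
```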

On Sun, Nov 29, 2015 at 9:13 AM, Tarek Elgamal <tarek.elga...@gmail.com>
wrote:

> According to the documentation
> <http://spark.apache.org/docs/latest/mllib-linear-methods.html>, by
> default, if wTx≥0 then the outcome is positive, and negative otherwise. I
> suppose that wTx is the "score" in my case. If score is more than 0 and the
> label is positive, then I return 1 which is correct classification and I
> return zero otherwise. Do you have any idea how to classify a point as
> positive or negative using this score or another function?
>
> On Sat, Nov 28, 2015 at 5:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>>         if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>           return 1; // correct classification
>>         } else {
>>           return 0;
>>         }
>>
>>
>>
>> I suspect score is always between 0 and 1
>>
>>
>>
>> On Sat, Nov 28, 2015 at 10:39 AM, Tarek Elgamal <tarek.elga...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to run the straightforward SVM example, but I am getting
>>> low accuracy (around 50%) when I predict using the same data I used for
>>> training. I am probably doing the prediction in a wrong way. My code is
>>> below; I would appreciate any help.
>>>
>>>
>>> import java.util.List;
>>>
>>> import org.apache.spark.SparkConf;
>>> import org.apache.spark.SparkContext;
>>> import org.apache.spark.api.java.JavaRDD;
>>> import org.apache.spark.api.java.function.Function;
>>> import org.apache.spark.api.java.function.Function2;
>>> import org.apache.spark.mllib.classification.SVMModel;
>>> import org.apache.spark.mllib.classification.SVMWithSGD;
>>> import org.apache.spark.mllib.regression.LabeledPoint;
>>> import org.apache.spark.mllib.util.MLUtils;
>>>
>>> import scala.Tuple2;
>>> import edu.illinois.biglbjava.readers.LabeledPointReader;
>>>
>>> public class SimpleDistSVM {
>>>   public static void main(String[] args) {
>>>     SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");
>>>     SparkContext sc = new SparkContext(conf);
>>>     String inputPath = args[0];
>>>
>>>     // Read training data
>>>     JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, inputPath).toJavaRDD();
>>>
>>>     // Run training algorithm to build the model.
>>>     int numIterations = 3;
>>>     final SVMModel model = SVMWithSGD.train(data.rdd(), numIterations);
>>>
>>>     // Clear the default threshold so predict() returns the raw margin.
>>>     model.clearThreshold();
>>>
>>>     // Predict points in the training set and map to an RDD of 0/1 values,
>>>     // where 0 is a misclassification and 1 is a correct classification.
>>>     JavaRDD<Integer> classification = data.map(new Function<LabeledPoint, Integer>() {
>>>       public Integer call(LabeledPoint p) {
>>>         int label = (int) p.label();
>>>         double score = model.predict(p.features());
>>>         if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>>           return 1; // correct classification
>>>         } else {
>>>           return 0;
>>>         }
>>>       }
>>>     });
>>>
>>>     // Sum up all values in the RDD to get the number of correctly
>>>     // classified examples.
>>>     int sum = classification.reduce(new Function2<Integer, Integer, Integer>() {
>>>       public Integer call(Integer arg0, Integer arg1) throws Exception {
>>>         return arg0 + arg1;
>>>       }
>>>     });
>>>
>>>     // Compute accuracy as the fraction of correctly classified examples.
>>>     double accuracy = ((double) sum) / ((double) classification.count());
>>>     System.out.println("Accuracy = " + accuracy);
>>>   }
>>> }
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
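A simpler alternative to comparing the raw margin by hand: keep the default threshold (drop the clearThreshold() call), in which case predict() returns 0.0/1.0 labels directly and accuracy reduces to counting exact matches against p.label(). A Spark-free sketch of that final counting step (the class name and sample pairs are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class AccuracySketch {
    // Given (prediction, label) pairs where both are already 0.0/1.0,
    // accuracy is simply the fraction of exact matches.
    static double accuracy(List<double[]> predAndLabel) {
        long correct = predAndLabel.stream()
                                   .filter(pl -> pl[0] == pl[1])
                                   .count();
        return (double) correct / predAndLabel.size();
    }

    public static void main(String[] args) {
        List<double[]> pairs = Arrays.asList(
                new double[]{1.0, 1.0},   // correct
                new double[]{0.0, 0.0},   // correct
                new double[]{1.0, 0.0},   // wrong
                new double[]{0.0, 0.0});  // correct
        System.out.println(accuracy(pairs)); // prints 0.75
    }
}
```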


-- 
Best Regards

Jeff Zhang
