Pls assist: Spark DecisionTree question

Marco Mistroni Fri, 10 Jun 2016 11:39:45 -0700

Hi all,
I am trying to run an ML program against some data using DecisionTree.
To fine-tune the parameters, I am running this loop to find the optimal
values for impurity, depth and bins:


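import org.apache.spark.mllib.tree.DecisionTree

// Sweep every (impurity, depth, bins) combination; each result pairs the
// parameters with the model's precision on testData.
// getMetrics is a helper defined elsewhere in the program (not shown here).
val results = for (impurity <- Array("gini", "entropy");
                   depth    <- Array(1, 2, 3, 4, 5);
                   bins     <- Array(10, 20, 25, 28)) yield {
  val model = DecisionTree.trainClassifier(
    trainingData, numClasses, categoricalFeaturesInfo,
    impurity, depth, bins)

  val accuracy = getMetrics(model, testData).precision
  ((impurity, depth, bins), accuracy)
}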

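For illustration only, here is a minimal Spark-free sketch of the same sweep. `GridSearchSketch`, `gridSearch` and `score` are hypothetical names, with `score` standing in for the `trainClassifier` + `getMetrics` step. Assuming `score` is deterministic, the loop itself contains no randomness, so the ranking comes out identical on every call; any run-to-run difference would have to come from the inputs (such as `trainingData` and `testData`).

```scala
// Spark-free sketch of the parameter sweep. `score` is a hypothetical
// stand-in for "train a DecisionTree with these parameters and measure it
// on the test data"; it is not a Spark API.
object GridSearchSketch {
  type Params = (String, Int, Int) // (impurity, depth, bins)

  val impurities: Seq[String] = Seq("gini", "entropy")
  val depths: Seq[Int] = Seq(1, 2, 3, 4, 5)
  val binCounts: Seq[Int] = Seq(10, 20, 25, 28)

  /** Scores every combination and returns the results best-first.
    * sortBy is stable, so tied scores keep the loop's original order. */
  def gridSearch(score: Params => Double): Seq[(Params, Double)] = {
    val results = for {
      impurity <- impurities
      depth    <- depths
      bins     <- binCounts
    } yield {
      val params = (impurity, depth, bins)
      (params, score(params))
    }
    results.sortBy { case (_, s) => -s }
  }

  def main(args: Array[String]): Unit = {
    // Toy deterministic score: rewards depth, slightly penalizes more bins.
    val toyScore: Params => Double = { case (_, depth, bins) => depth * 10.0 - bins * 0.1 }
    gridSearch(toyScore).take(5).foreach(println)
  }
}
```

Calling `gridSearch` twice with the same deterministic score returns the same list in the same order.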
Could anyone explain why, if I run my program multiple times against the SAME
data, I get different optimal results for the parameters above?
I assumed that running the loop above against the same data would always give the
same results.
To give you an example, run 1 returned the following top results:

((gini,4,28),0.8)
((gini,4,25),0.8)
((gini,3,28),0.8)
((gini,3,25),0.8)
((entropy,3,28),0.7333333333333333)

while run 2 gave me these top results:

((entropy,2,28),0.6842105263157895)
((entropy,2,25),0.6842105263157895)
((entropy,2,20),0.6842105263157895)
((entropy,2,10),0.6842105263157895)
((entropy,1,28),0.684210526315789

Could anyone explain why?
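The post does not show how `trainingData` and `testData` are built. If, as an assumption, they come from a random split without a fixed seed, each run would evaluate on a different test set. In Spark, `RDD.randomSplit` accepts an optional `seed` argument. The sketch below shows the idea in plain Scala; `SeededSplitSketch` and `split` are hypothetical names, not a Spark API.

```scala
import scala.util.Random

// Hypothetical helper, not a Spark API: shuffles `data` with a seeded
// generator, then cuts it into (train, test) at `trainFraction`.
object SeededSplitSketch {
  def split[T](data: Seq[T], trainFraction: Double, seed: Long): (Seq[T], Seq[T]) = {
    val shuffled = new Random(seed).shuffle(data)
    val cut = math.round(data.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 100
    // Same seed: identical split on every run, so metrics computed on it repeat.
    println(split(data, 0.8, seed = 42L) == split(data, 0.8, seed = 42L))
    // A different seed almost certainly yields a different test set.
    println(split(data, 0.8, seed = 42L)._2 == split(data, 0.8, seed = 7L)._2)
  }
}
```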

Kind regards,
Marco
