Ok, so I tried setting regParam and lowering it. How do I evaluate which regParam is best? Do I have to do it by trial and error? I am currently calculating the log loss for the model; is that a good way to find the best regParam value? Here is my code:
from math import exp, log
# from pyspark.sql.functions import log  (kept commented out so it does not shadow math.log)

epsilon = 1e-16

def sigmoid_log_loss(w, x):
    # Predicted probability: sigmoid of the linear score w . x.features
    ans = float(1 / (1 + exp(-(w.dot(x.features)))))
    # Clamp to (0, 1) so the log() calls below never see exactly 0 or 1
    if ans == 0:
        ans = ans + epsilon
    if ans == 1:
        ans = ans - epsilon
    log_loss = -(x.label * log(ans) + (1 - x.label) * log(1 - ans))
    return ((ans, x.label), log_loss)

-------------------------------------------------------

from pyspark.ml.classification import LogisticRegression

reg = 0.02
lr = LogisticRegression(regParam=reg, maxIter=500, standardization=True,
                        elasticNetParam=0.5)
model = lr.fit(data_train_df)
w = model.coefficients
intercept = model.intercept

# Score the validation set; go through .rdd since a DataFrame has no map()
data_predicted = data_val_df.rdd.map(lambda x: sigmoid_log_loss(w, x))
log_loss = data_predicted.map(lambda x: x[1]).mean()
print(log_loss)
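In case it helps, the trial-and-error search can at least be automated: sweep a grid of regParam values, refit on the training set each time, and keep the value with the lowest validation log loss. A minimal sketch reusing the sigmoid_log_loss above (the candidate grid is illustrative, and data_train_df / data_val_df are assumed to be the same DataFrames as in the code):

from pyspark.ml.classification import LogisticRegression

candidates = [0.0001, 0.001, 0.01, 0.02, 0.1, 1.0]  # illustrative grid, not tuned
best_reg, best_loss = None, float("inf")
for reg in candidates:
    lr = LogisticRegression(regParam=reg, maxIter=500,
                            standardization=True, elasticNetParam=0.5)
    model = lr.fit(data_train_df)
    w = model.coefficients
    # bind w as a default argument so each closure keeps its own weights
    loss = data_val_df.rdd.map(lambda x, w=w: sigmoid_log_loss(w, x)[1]).mean()
    print("regParam=%g -> validation log loss %g" % (reg, loss))
    if loss < best_loss:
        best_reg, best_loss = reg, loss
print("best regParam:", best_reg)

Spark also ships built-in tuning utilities (pyspark.ml.tuning.ParamGridBuilder with CrossValidator or TrainValidationSplit), though the stock BinaryClassificationEvaluator scores areaUnderROC / areaUnderPR rather than log loss.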