OK, so I tried setting regParam and then lowering it. How do I evaluate
which regParam is best? Do I have to do it by trial and error? I am
currently calculating the log loss for the model on a validation set. Is
log loss a good way to find the best regParam value? Here is my code:

from math import exp, log

epsilon = 1e-16  # keeps the probability away from exactly 0 or 1

def sigmoid_log_loss(w, b, x):
    """Return ((predicted probability, label), log loss) for one point."""
    # Include the intercept b in the margin; model.intercept is fitted
    # separately from model.coefficients (fitIntercept is True by default).
    prob = 1.0 / (1.0 + exp(-(w.dot(x.features) + b)))
    # Clamp to (epsilon, 1 - epsilon) so log() is always defined.
    prob = min(max(prob, epsilon), 1.0 - epsilon)
    log_loss = -(x.label * log(prob) + (1.0 - x.label) * log(1.0 - prob))
    return ((prob, x.label), log_loss)
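
As a quick local sanity check of the math (no Spark needed), a numpy
stand-in works, since the function only needs .label, .features, and a
.dot method; the Point namedtuple here is just a hypothetical stand-in
for a labeled row:

from collections import namedtuple
import numpy as np

Point = namedtuple("Point", ["label", "features"])
w = np.array([0.5, -0.25])
pt = Point(label=1.0, features=np.array([1.0, 2.0]))
# Margin is 0.5*1 - 0.25*2 + 0 = 0, so prob = 0.5 and loss = log(2) ~ 0.693.
print(sigmoid_log_loss(w, 0.0, pt))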

-------------------------------------------------------
from pyspark.ml.classification import LogisticRegression

reg = 0.02
lr = LogisticRegression(regParam=reg, maxIter=500,
                        standardization=True, elasticNetParam=0.5)
model = lr.fit(data_train_df)

w = model.coefficients
b = model.intercept

# Score the validation set; going through .rdd makes the lambda see Rows
# that expose .label and .features.
data_predicted = data_val_df.rdd.map(lambda x: sigmoid_log_loss(w, b, x))
log_loss = data_predicted.map(lambda x: x[1]).mean()
print(log_loss)
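
To avoid pure trial and error, my current plan is to sweep a grid of
regParam values and keep the one with the lowest validation log loss.
A minimal sketch reusing the code above (the grid values are just
placeholders I picked):

reg_grid = [0.001, 0.01, 0.02, 0.1, 1.0]  # candidate strengths (placeholders)

results = []
for reg in reg_grid:
    lr = LogisticRegression(regParam=reg, maxIter=500,
                            standardization=True, elasticNetParam=0.5)
    model = lr.fit(data_train_df)
    w, b = model.coefficients, model.intercept
    # Same validation log loss as above; .mean() forces evaluation now,
    # so capturing w and b in the lambda inside the loop is safe.
    loss = data_val_df.rdd.map(lambda x: sigmoid_log_loss(w, b, x)[1]).mean()
    results.append((reg, loss))

best_reg, best_loss = min(results, key=lambda t: t[1])
print(best_reg, best_loss)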
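
Alternatively, is the recommended route the built-in tuning utilities?
Something like the sketch below, though as far as I can tell
BinaryClassificationEvaluator optimizes areaUnderROC by default rather
than log loss, and it uses k-fold cross-validation instead of my single
validation split:

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Candidate values are placeholders again.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.001, 0.01, 0.02, 0.1])
        .build())
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3)
cv_model = cv.fit(data_train_df)
best_model = cv_model.bestModel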


