Ok so I tried setting regParam and tried lowering it. How do I evaluate
which regParam is best? Do I have to do it by trial and error? I am
currently calculating the log loss for the model. Is that a good way to find
the best regParam value? Here is my code (a cross-validation sketch follows
after it):
from math import exp, log
# note: pyspark.sql.functions.log would shadow math.log, so keep it commented out
#from pyspark.sql.functions import log

epsilon = 1e-16

def sigmoid_log_loss(w, b, x):
    # predicted probability P(label=1) from the linear model's margin;
    # the intercept b must be included, not just w.dot(features)
    ans = float(1.0 / (1.0 + exp(-(w.dot(x.features) + b))))
    # clamp to (0, 1) so the logs below stay finite
    if ans == 0:
        ans += epsilon
    if ans == 1:
        ans -= epsilon
    log_loss = -(x.label * log(ans) + (1 - x.label) * log(1 - ans))
    return ((ans, x.label), log_loss)
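For what it's worth, the function can be sanity-checked locally without Spark; plain numpy arrays stand in for the feature vectors here, and the Point namedtuple is just for illustration:

from collections import namedtuple
import numpy as np

Point = namedtuple("Point", ["label", "features"])

# margin = 0.5*1.0 - 0.25*2.0 + 0.0 = 0, so the sigmoid gives exactly 0.5
w_test = np.array([0.5, -0.25])
pt = Point(label=1.0, features=np.array([1.0, 2.0]))
print(sigmoid_log_loss(w_test, 0.0, pt))  # ((0.5, 1.0), ~0.693)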
-------------------------------------------------------
reg = 0.02

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(regParam=reg, maxIter=500, standardization=True,
                        elasticNetParam=0.5)
model = lr.fit(data_train_df)
w = model.coefficients
intercept = model.intercept

# score the validation set; go through .rdd since DataFrames in Spark 2.x
# no longer expose .map directly
data_predicted = data_val_df.rdd.map(lambda x: sigmoid_log_loss(w, intercept, x))
log_loss = data_predicted.map(lambda x: x[1]).mean()
print(log_loss)
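For the regParam search itself, rather than pure trial and error, pyspark.ml.tuning can grid-search it with cross-validation. A minimal sketch, assuming data_train_df as above; the grid values and numFolds are just examples, and note the built-in evaluator scores areaUnderROC rather than log loss:

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(maxIter=500, standardization=True, elasticNetParam=0.5)
# candidate regParam values -- this grid is just an example
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.001, 0.01, 0.02, 0.1, 1.0])
        .build())
evaluator = BinaryClassificationEvaluator()  # areaUnderROC by default
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                    evaluator=evaluator, numFolds=3)
cv_model = cv.fit(data_train_df)
# average cross-validated metric per grid point; highest areaUnderROC wins
for params, metric in zip(grid, cv_model.avgMetrics):
    print(params[lr.regParam], metric)

To select on log loss instead, each grid point's fitted model could be scored on the validation set with sigmoid_log_loss as above.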