Hi all,

To implement $subject in ML, we need every feature value of the dataset
alongside the predicted and actual values for the test data. However, Spark
returns only the predicted and actual values as test results. Right now we
use a random sample of 10,000 rows for the other visualizations, but we
cannot reuse that sample here because it does not correspond to the test
data (the test data is the portion of the dataset held out at the model
building stage, according to the train data fraction).

One option is to persist the test data at the testing stage, but depending
on the train data fraction it can be too large for some datasets. I would
appreciate your comments on this.
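
To make the size concern concrete, here is a minimal sketch of one possible
variation on that option: keep the features together with the actual and
predicted values, but persist only a bounded random sample of the test rows.
It is plain Python, not the Spark or ML API, and the names
sample_test_rows, predict and max_rows are hypothetical, chosen only for
illustration. It assumes the test rows can be iterated as (features, actual)
pairs.

```python
import random


def sample_test_rows(test_rows, predict, max_rows=10000, seed=42):
    """Return at most max_rows (features, actual, predicted) triples.

    Uses reservoir sampling, so the test rows are read once and never
    held in memory all at the same time. All names are hypothetical.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, (features, actual) in enumerate(test_rows):
        if i < max_rows:
            slot = i
        else:
            # Pick a slot in [0, i]; the row is kept only if the slot
            # falls inside the reservoir.
            slot = rng.randint(0, i)
            if slot >= max_rows:
                continue
        # Predict only for rows that are actually kept.
        row = (features, actual, predict(features))
        if slot == len(reservoir):
            reservoir.append(row)
        else:
            reservoir[slot] = row
    return reservoir


# Toy usage: 100 test rows, keep at most 10 of them.
rows = [([i, i * 2], i % 2) for i in range(100)]
kept = sample_test_rows(rows, predict=lambda f: f[0] % 2, max_rows=10)
print(len(kept))
```

With this approach the persisted data never exceeds max_rows regardless of
the train data fraction, at the cost of visualizing a sample of the test
data rather than all of it. In Spark, sampling the test RDD (for example
with takeSample) might play a similar role, though I have not verified how
that fits our current pipeline.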

Thanks,
CD

-- 
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847 <94716288847>
LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
<https://twitter.com/cdathuraliya> | Blog <http://cdathuraliya.tumblr.com/>