[ https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036832#comment-13036832 ]
XiaoboGu commented on MAHOUT-696:
---------------------------------

I added a scores option in 696-r2.patch, but I don't know whether it makes sense; I just copied the code from TrainLogistic.

I have another question. Because concurrent threads train on the examples, will the scores option cause concurrency performance problems? The main thread reads the CSV records and converts them into Vectors; will that become a bottleneck?

Here is the unfinished main function of my TrainAdaptiveLogistic class for your reference:

public static void main(String[] args) throws IOException {
    if (parseArgs(args)) {
        double logPEstimate = 0;
        int k = 0;

        CsvRecordFactory csv = lmp.getCsvRecordFactory();
        AdaptiveLogisticRegression lr = lmp.createAdaptiveLogisticRegression();

        for (int pass = 0; pass < passes; pass++) {
            BufferedReader in = open(inputFile);

            // read variable names
            csv.firstLine(in.readLine());

            String line = in.readLine();
            while (line != null) {
                // for each new line, get target and predictors
                Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
                int targetValue = csv.processLine(line, input);

                // update model
                lr.train(targetValue, input);
                k++;

                if (scores && (k % (skipscorenum + 1) == 0)) {
                    State<Wrapper, CrossFoldLearner> best = lr.getBest();
                    CrossFoldLearner learner = null;
                    if (null != best) {
                        learner = best.getPayload().getLearner();
                    }
                    if (learner != null) {
                        // check performance while this is still news
                        double logP = learner.logLikelihood(targetValue, input);
                        if (!Double.isInfinite(logP)) {
                            if (k < 20) {
                                logPEstimate = (k * logPEstimate + logP) / (k + 1);
                            } else {
                                logPEstimate = 0.95 * logPEstimate + 0.05 * logP;
                            }
                        }
                        double p = learner.classifyScalar(input);
                        output.printf(Locale.ENGLISH, "%10d %2d %10.2f %2.4f %10.4f %10.4f\n",
                            k, targetValue, learner.percentCorrect(), p, logP, logPEstimate);
                    } else {
                        output.printf(Locale.ENGLISH, "%10d %2d %s\n", k, targetValue,
                            "AdaptiveLogisticRegression is not ready for scoring ... ");
                    }
                }
                line = in.readLine();
            }
            in.close();
        }

        OutputStream modelOutput = new FileOutputStream(outputFile);
        try {
            lmp.saveTo(modelOutput);
        } finally {
            modelOutput.close();
        }

        output.printf(Locale.ENGLISH, "%d\n", lmp.getNumFeatures());
        output.printf(Locale.ENGLISH, "%s ~ ", lmp.getTargetVariable());
        String sep = "";
    }
}

> Command line program for AdaptiveLogisticRegression
> ---------------------------------------------------
>
> Key: MAHOUT-696
> URL: https://issues.apache.org/jira/browse/MAHOUT-696
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Affects Versions: 0.5
> Reporter: XiaoboGu
> Fix For: 0.6
>
> Attachments: mahout-696-r1.patch, mahout-696-r2.patch
>
>
> Suggested by Ted, I'll try to write a command line program for
> AdaptiveLogisticRegression, but as I am not familiar with the algorithm, I'll
> try to write a prototype for the program from a Java developer's perspective.
> I hope someone else will help with the details of the algorithm.