In test.evaluate(samples, 1), the second parameter is the number of folds. Usually you use 10, or at least a number larger than 1.
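
To illustrate why the fold count matters, here is a minimal sketch of generic k-fold partitioning in plain Java. It does not use OpenNLP; the class name KFoldSketch, its methods, and the round-robin assignment rule are assumptions made for this example and are not taken from the library's own partitioner.

```java
import java.util.ArrayList;
import java.util.List;

class KFoldSketch {

    // Assign sample i to fold (i % folds). This round-robin rule is an
    // assumption for illustration, not necessarily what OpenNLP does.
    static List<List<Integer>> partition(int sampleCount, int folds) {
        if (folds < 1) {
            throw new IllegalArgumentException("folds must be >= 1");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < sampleCount; i++) {
            result.get(i % folds).add(i);
        }
        return result;
    }

    // Number of samples left for training when one fold is held out for testing.
    static int trainingSize(int sampleCount, int folds, int heldOutFold) {
        return sampleCount - partition(sampleCount, folds).get(heldOutFold).size();
    }

    public static void main(String[] args) {
        int samples = 20; // hypothetical number of documents
        for (int folds : new int[] {1, 10}) {
            int train = trainingSize(samples, folds, 0);
            System.out.println(folds + " fold(s): train on " + train
                + ", test on " + (samples - train));
        }
    }
}
```

In this sketch a single fold holds out every sample and leaves nothing to train on, while 10 folds train on 18 of 20 samples per round and train 10 separate models in total.
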
The time needed for perceptron training is linear in the number of iterations; if you use 300 instead of 100, it should take three times as long.

Jörn

On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta <damianopo...@gmail.com> wrote:

> Jorn,
> I am training and testing the model via api. If it is not a training
> problem. How is that possible that the evaluation is taking 2 days (and
> still running) to evaluate the model? As i told you with 100 iterations i
> can get the model and the test in ~30 minutes.
>
> I only have a doubt about evaluation, this is the code:
>
> try (ObjectStream<NameSample> samples =
>     ObjectStreamUtils.createObjectStream(evaluation)) {
>
>     TrainingParameters mlParams = new TrainingParameters();
>     mlParams.put(TrainingParameters.ALGORITHM_PARAM,
>         PerceptronTrainer.PERCEPTRON_VALUE);
>     mlParams.put(TrainingParameters.ITERATIONS_PARAM,
>         Integer.toString(100));
>     mlParams.put(TrainingParameters.CUTOFF_PARAM,
>         Integer.toString(0));
>
>     TokenNameFinderCrossValidator test = new
>         TokenNameFinderCrossValidator("it",
>         null, mlParams, null,
>         (TokenNameFinderEvaluationMonitor)null);
>
>     test.evaluate(samples, 1); *// <---- SECOND PARAMETER HERE*
>
>     FMeasure result = test.getFMeasure();
>
>     System.out.println(result.toString());
> }
>
> What should i put on the second parameter of test.evaluate() ? Each sample
> (in samples variable) represents a document. There are no relations with
> other samples.
>
> 2017-03-06 10:56 GMT+01:00 Joern Kottmann <kottm...@gmail.com>:
>
> > Hello,
> >
> > the model is only available after the training finished, hard to guess what
> > you are doing.
> >
> > Do you use the command line? Which command?
> >
> > Jörn
> >
> > On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta <damianopo...@gmail.com>
> > wrote:
> >
> > > Hello Jorn,
> > > I tried with 300 iterations and it takes forever, reducing that number to
> > > 100 i can finally get the model in half an hour.
> > >
> > > The problem with 300 iterations is that i can see the model (.bin) in half
> > > an hour too but the computations are still running. So i do not really
> > > understand what it is doing.
> > >
> > > Damiano
> > >
> > > 2017-03-06 10:19 GMT+01:00 Joern Kottmann <kottm...@gmail.com>:
> > >
> > > > Hello,
> > > >
> > > > this looks like output from the cross validator.
> > > >
> > > > Jörn
> > > >
> > > > On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta <damianopo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am training a NER model with perceptron classifier (using OpenNLP 1.7.0)
> > > > >
> > > > > the output of the training is:
> > > > >
> > > > > Indexing events using cutoff of 0
> > > > >
> > > > > Computing event counts... done. 11861603 events
> > > > > Indexing... done.
> > > > > Collecting events... Done indexing.
> > > > > Incorporating indexed data for training...
> > > > > done.
> > > > > Number of Event Tokens: 11861603
> > > > > Number of Outcomes: 23
> > > > > Number of Predicates: 6623489
> > > > > Computing model parameters...
> > > > > Performing 300 iterations.
> > > > > 1: . (11795234/11861603) 0.9944047191597966
> > > > > 2: . (11820243/11861603) 0.9965131188423689
> > > > > 3: . (11829329/11861603) 0.9972791198626357
> > > > > 4: . (11834935/11861603) 0.9977517372651908
> > > > > 5: . (11838996/11861603) 0.9980941024581584
> > > > > 6: . (11841501/11861603) 0.9983052880795286
> > > > > 7: . (11843704/11861603) 0.998491013398442
> > > > > 8: . (11845304/11861603) 0.9986259024180796
> > > > > 9: . (11846421/11861603) 0.9987200718149141
> > > > > 10: . (11847181/11861603) 0.9987841440992419
> > > > > 20: . (11852226/11861603) 0.9992094660392866
> > > > > 30: . (11853947/11861603) 0.9993545560410343
> > > > > 40: . (11854831/11861603) 0.999429082224384
> > > > > 50: . (11855471/11861603) 0.999483037832239
> > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > Stats: (11846242/11861603) 0.998704981105842
> > > > > ...done.
> > > > > Compressed 6623489 parameters to 554312
> > > > > 6892 outcome patterns
> > > > > Indexing events using cutoff of 0
> > > > >
> > > > > Computing event counts... done. 6370206 events
> > > > > Indexing... done.
> > > > > Collecting events... Done indexing.
> > > > > Incorporating indexed data for training...
> > > > > done.
> > > > > Number of Event Tokens: 6370206
> > > > > Number of Outcomes: 23
> > > > > Number of Predicates: 3737425
> > > > > Computing model parameters...
> > > > > Performing 300 iterations.
> > > > > 1: . (6330365/6370206) 0.9937457281601254
> > > > > 2: . (6345859/6370206) 0.9961779885925196
> > > > > 3: . (6351552/6370206) 0.9970716802564941
> > > > > 4: . (6354847/6370206) 0.9975889319748843
> > > > > 5: . (6356872/6370206) 0.997906818084062
> > > > > 6: . (6358350/6370206) 0.998138835698563
> > > > > 7: . (6359611/6370206) 0.9983367884806237
> > > > > 8: . (6360473/6370206) 0.9984721059256169
> > > > > 9: . (6361138/6370206) 0.9985764981540628
> > > > > 10: . (6361532/6370206) 0.9986383485871572
> > > > > 20: . (6364161/6370206) 0.9990510510963068
> > > > > 30: . (6365106/6370206) 0.9991993979472563
> > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > Stats: (6360617/6370206) 0.9984947111600473
> > > > > ...done.
> > > > > Indexing events using cutoff of 0
> > > > >
> > > > > Computing event counts... done. 6370114 events
> > > > > Indexing... done.
> > > > > Collecting events... Done indexing.
> > > > > Incorporating indexed data for training...
> > > > > done.
> > > > > Number of Event Tokens: 6370114
> > > > > Number of Outcomes: 23
> > > > > Number of Predicates: 3737390
> > > > > Computing model parameters...
> > > > > Performing 300 iterations.
> > > > > 1: . (6330266/6370114) 0.9937445389517362
> > > > > 2: . (6345810/6370114) 0.9961846836650019
> > > > > 3: . (6351374/6370114) 0.9970581374210885
> > > > > 4: . (6354747/6370114) 0.9975876412886803
> > > > > 5: . (6356872/6370114) 0.9979212302950936
> > > > > 6: . (6358429/6370114) 0.998165652922381
> > > > > 7: . (6359417/6370114) 0.9983207521874805
> > > > > 8: . (6360292/6370114) 0.9984581123665919
> > > > > 9: . (6361076/6370114) 0.9985811870870757
> > > > > 10: . (6361693/6370114) 0.998678045636232
> > > > > 20: . (6364109/6370114) 0.9990573167136413
> > > > > 30: . (6365008/6370114) 0.9991984444862368
> > > > > 40: . (6365478/6370114) 0.9992722265253023
> > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > Stats: (6359985/6370114) 0.9984099185666065
> > > > > ...done.
> > > > > Indexing events using cutoff of 0
> > > > >
> > > > > Computing event counts... done. 6370480 events
> > > > > Indexing... done.
> > > > > Collecting events... Done indexing.
> > > > > Incorporating indexed data for training...
> > > > > done.
> > > > > Number of Event Tokens: 6370480
> > > > > Number of Outcomes: 23
> > > > > Number of Predicates: 3737798
> > > > > Computing model parameters...
> > > > > Performing 300 iterations.
> > > > > 1: . (6330685/6370480) 0.9937532179678769
> > > > > 2: . (6346153/6370480) 0.9961812924614786
> > > > > 3: . (6351726/6370480) 0.9970561088018485
> > > > > 4: . (6355089/6370480) 0.9975840125076917
> > > > > 5: . (6357173/6370480) 0.9979111464128292
> > > > > 6: . (6358780/6370480) 0.9981634036995642
> > > > > 7: . (6359845/6370480) 0.9983305810551167
> > > > > 8: . (6360827/6370480) 0.9984847295651191
> > > > > 9: . (6361316/6370480) 0.9985614898720347
> > > > > 10: . (6362076/6370480) 0.9986807901445417
> > > > > 20: . (6364506/6370480) 0.9990622370684784
> > > > > 30: . (6365415/6370480) 0.9992049264733583
> > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > Stats: (6362594/6370480) 0.9987621026986977
> > > > > ...done.
> > > > > Indexing events using cutoff of 0
> > > > >
> > > > > Computing event counts... done. 6370008 events
> > > > > Indexing... done.
> > > > > Collecting events... Done indexing.
> > > > > Incorporating indexed data for training...
> > > > > done.
> > > > > Number of Event Tokens: 6370008
> > > > > Number of Outcomes: 23
> > > > > Number of Predicates: 3737824
> > > > > Computing model parameters...
> > > > > Performing 300 iterations.
> > > > > 1: . (6330200/6370008) 0.9937507142848172
> > > > > 2: . (6345643/6370008) 0.9961750440501802
> > > > > 3: . (6351415/6370008) 0.9970811653611737
> > > > > 4: . (6354522/6370008) 0.9975689198506501
> > > > > 5: . (6356723/6370008) 0.9979144453193779
> > > > > 6: . (6358164/6370008) 0.9981406616757781
> > > > > 7: . (6359399/6370008) 0.9983345389833106
> > > > > 8: . (6360274/6370008) 0.9984719014481614
> > > > > 9: . (6360694/6370008) 0.9985378354312899
> > > > > 10: . (6361531/6370008) 0.9986692324405244
> > > > > ....
> > > > > ....
> > > > > ....
> > > > >
> > > > > etc etc is that normal ? The parameters are; *0 cutoff* and *300
> > > > > iterators*.
> > > > >
> > > > > The corpus is relative small, it has 20k sentences.
> > > > >
> > > > > I do not remember an output like that using MAXENT classifier.
> > > > >
> > > > > Damiano