Saurabh,

Are there document boundaries (new lines) in your training data?
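
A rough sketch of why this matters, under two assumptions that are worth checking against the OpenNLP documentation: that an empty line in name finder training data marks a document boundary, and that the cross validator splits the data by document rather than by sentence. If both hold, 22351 sentences with no empty lines between them would count as a single document, and a single document cannot be divided into 3 folds. The class and method names below are hypothetical and use no OpenNLP API; the code only counts blank-line-delimited groups in a list of lines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: counts blank-line-delimited
// "documents" in training data that holds one sentence per line.
class DocumentCounter {

    // A document is a run of consecutive non-blank lines.
    // Assumption: an empty (or whitespace-only) line ends the current document.
    static int countDocuments(List<String> lines) {
        int documents = 0;
        boolean inDocument = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                inDocument = false;      // boundary: the next sentence starts a new document
            } else if (!inDocument) {
                inDocument = true;       // first sentence of a new document
                documents++;
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        List<String> noBoundaries = Arrays.asList(
                "sentence one", "sentence two", "sentence three");
        List<String> withBoundaries = Arrays.asList(
                "sentence one", "", "sentence two", "", "sentence three");

        System.out.println(countDocuments(noBoundaries));   // prints 1
        System.out.println(countDocuments(withBoundaries)); // prints 3
    }
}
```

If a count like this comes out lower than the number of folds, adding empty lines between documents in the training file would be the first thing to try.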

Jeff

On Tue, Apr 11, 2017 at 6:07 AM, Saurabh Jain <saurabh4768j...@gmail.com> wrote:

> Hi All
>
> I am cross validating NameFinder training data using
> TokenNameFinderCrossValidator. Training parameters are as follows:
>
> Train algorithm name: MAXENT
> Trainer Type name: EventModel
> Iteration value: 100
> Cut off value: 5
> Beam size: 5
> No of folds: 3
> Total training instances: 22351
>
> Code snippet:
>
> try {
>     evaluate = new TokenNameFinderCrossValidator("en", entity,
>         trainingParameters, TokenNameFinderFactory.create(null,
>             entityExtractionProcessor.getFeatureGenMap().get(entity),
>             Collections.emptyMap(), new BioCodec()));
> } catch (InvalidFormatException e) {
>     e.printStackTrace();
> }
>
> evaluate.evaluate(sampleStream, 3);
>
> evaluate method is giving InsufficientTrainingDataException. Can anyone
> suggest me why it is happening as I have passed 22351 training instances
> and if it is 3 folds, then each fold will get around 7000 instances.
>
> --
> *Thanks & Regards*
>
> *Saurabh Jain*
> *AI Developer*
> *Active Intelligence*
>
> *"To do a thing yesterday was the best time. Second best time is today."*