Hi,
This is essentially a machine learning problem rather than anything specific
to OpenNLP. With a corpus that large, training a model will take a
substantial amount of time. You could try smaller training sets and check
whether the models deteriorate substantially. Another strategy is to
introduce training sets incrementally, each containing a specific class of
Token Names; that would provide a quicker turnaround.
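As a rough sketch of both ideas, assuming the training data is in the usual
name finder format (one sentence per line, names marked as
<START:type> ... <END>): the class and method names below are hypothetical
helpers, not part of OpenNLP, and sampling line by line ignores any
document boundaries marked by empty lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class: builds smaller or
// single-class training sets from name finder training lines.
final class TrainingSubsets {

    // Matches one annotated span, e.g. "<START:person> Pierre Vinken <END>".
    private static final Pattern SPAN =
            Pattern.compile("<START:([^>\\s]+)> (.*?) <END>");

    // Keeps each sentence with probability `fraction`.
    // A fixed seed makes the subset repeatable between runs.
    static List<String> sample(List<String> sentences, double fraction, long seed) {
        Random random = new Random(seed);
        List<String> kept = new ArrayList<>();
        for (String sentence : sentences) {
            if (random.nextDouble() < fraction) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    // Strips the markup of every span whose type differs from `type`,
    // leaving the plain tokens in place.
    static String keepOnlyType(String sentence, String type) {
        Matcher m = SPAN.matcher(sentence);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1).equals(type) ? m.group(0) : m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "<START:person> Pierre Vinken <END> works at <START:organization> Elsevier <END> .",
                "No names in this sentence .");
        // Roughly 10% of the corpus, reduced to the "person" class only.
        for (String sentence : sample(corpus, 0.1, 42L)) {
            System.out.println(keepOnlyType(sentence, "person"));
        }
    }
}
```

Training on, say, 10%, 25% and 50% subsets and comparing the evaluation
scores should show fairly quickly how small the training set can get before
the model degrades.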
Hope this helps.
Best,
-Samik
On 18/11/2014 8:46 AM, nikhil jain wrote:
Hi,
I asked the question below yesterday. Did anyone get a chance to look at it?
I am new to OpenNLP and really need some help. Please provide a clue, a link,
or an example.
Thanks,
Nikhil
From: nikhil jain <[email protected]>
To: "[email protected]" <[email protected]>; Dev at Opennlp Apache
<[email protected]>
Sent: Tuesday, November 18, 2014 12:02 AM
Subject: Need to speed up the model creation process of OpenNLP
Hi,
I am using the OpenNLP Token Name Finder to parse unstructured data. I have
created a corpus of about 4 million records. When I create a model from the
training set using the OpenNLP APIs in Eclipse with the default settings
(cut-off 5 and 100 iterations), the process takes a long time, around 2-3
hours.
Can someone suggest how I can reduce this time? I want to experiment with
different numbers of iterations, but because model creation takes so long,
I am not able to do so.
Please provide some feedback.
Thanks in advance.
Nikhil Jain