Hi,
This is essentially a machine learning problem rather than anything specific to OpenNLP. With a corpus that large, training a model will take a substantial amount of time. You could train on smaller subsets and check whether the models deteriorate substantially. Another strategy is to introduce training sets incrementally, each containing a specific class of token names; that would give you a quicker turnaround.
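To make the two strategies concrete, here is a minimal sketch of how the training file could be cut down before it is handed to the trainer. It assumes the corpus is in the usual OpenNLP name finder format, one sentence per line with inline <START:type> ... <END> markers. The function names (subsample, keep_types, write_subset) and the file paths are made up for illustration and are not part of OpenNLP. The sketch also treats blank lines like any other line, so document boundaries are not preserved exactly.

```python
import random
import re

# OpenNLP name finder training data marks entities inline, for example:
#   <START:person> Pierre Vinken <END> joined <START:organization> Elsevier <END> .
# Group 1 is the (optional) entity type, group 2 the annotated tokens.
SPAN = re.compile(r"<START(?::([^>\s]+))?>\s+(.*?)\s+<END>")


def subsample(lines, fraction, seed=0):
    """Keep each line with probability `fraction`; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def keep_types(line, wanted):
    """Keep annotations whose type is in `wanted`; unwrap all others to plain tokens."""
    def unwrap_unwanted(match):
        # Untyped spans (<START> ... <END>) have type None and are unwrapped too.
        return match.group(0) if match.group(1) in wanted else match.group(2)
    return SPAN.sub(unwrap_unwanted, line)


def write_subset(src_path, dst_path, fraction, wanted, seed=0):
    """Stream a large corpus to a smaller file without loading it into memory."""
    rng = random.Random(seed)
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if rng.random() < fraction:
                dst.write(keep_types(line, wanted))
```

For example, write_subset("train.txt", "train-person-10pct.txt", 0.1, {"person"}) would write roughly a tenth of the sentences with only the person annotations kept. Training on a few such files of increasing size and comparing the evaluation scores shows where the model stops improving, which is much cheaper than repeating the full 2-3 hour run for every parameter change.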
Hope this helps.
Best,
-Samik


On 18/11/2014 8:46 AM, nikhil jain wrote:
Hi,
I asked the question below yesterday. Did anyone get a chance to look at it?
I am new to OpenNLP and really need some help. Please provide a clue, a link, 
or an example.
Thanks,
Nikhil
       From: nikhil jain <[email protected]>
  To: "[email protected]" <[email protected]>; Dev at Opennlp Apache 
<[email protected]>
  Sent: Tuesday, November 18, 2014 12:02 AM
  Subject: Need to speed up the model creation process of OpenNLP
Hi,
I am using the OpenNLP Token Name Finder to parse unstructured data. I have 
created a corpus of about 4 million records. When I create a model from the 
training set using the OpenNLP APIs in Eclipse with the default settings 
(cut-off 5 and iterations 100), the process takes a long time, around 2-3 
hours.
Can someone suggest how I can reduce the training time? I want to experiment 
with different iteration counts, but because model creation is so 
time-consuming, I am not able to do so.
Please provide some feedback.
Thanks in advance.
Nikhil Jain

