Hi Naaman,

Basically, KIM is strongest at analyzing news articles and other small documents. That is because some of the analysis resources cannot handle large amounts of text. For example, the patterns in the JAPE rules may perform greedy matches, which are very expensive over large content.
The maximum size is not an exact figure: the bigger the document, the slower the extraction. Generally, a document of several pages is standard. If possible, we advise splitting the document into several smaller parts and analyzing them independently. This will not have a big impact on the quality of the information extraction, and a further benefit is that you can process the different parts in parallel. If the documents are news articles, the most important information is usually contained in the beginning of the article, so trimming them is also a solution. You can experiment with this to see what fits your needs.

HTH,
Philip

On 25 Feb 2011, at 7:11 PM, Naaman Musawwir wrote:

> Hello,
>
> We have data in the form of documents. We extract text from these and add
> it into the KIM repository for semantic analysis. Sometimes when we try to add a
> document it takes forever. Also, if the text size is more than 20 KB it
> kind of hangs the KIM server, which then stops responding to any further requests.
>
> Does the size of the document affect this, and if so, what is the maximum content size
> that KIM can process easily?
>
> Regards,
> Naaman Musawwir.
>
> _______________________________________________
> Kim-discussion mailing list
> [email protected]
> http://ontotext.com/mailman/listinfo/kim-discussion
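P.S. The splitting advice above can be sketched roughly as follows. This is a hypothetical illustration only: `analyze_with_kim` is a placeholder standing in for whatever call you use to submit text to the KIM server, not part of the real KIM API, and the 20 KB limit is just the size Naaman reported as problematic.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CHUNK = 20 * 1024  # ~20 KB, the size reported to hang the server

def split_document(text, limit=MAX_CHUNK):
    """Greedily pack paragraphs into chunks of at most `limit` bytes.

    A single paragraph larger than `limit` stays whole in its own chunk.
    """
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        para_size = len(para.encode("utf-8")) + 2  # +2 for the separator
        if current and size + para_size > limit:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += para_size
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def analyze_with_kim(chunk):
    # Placeholder: replace with the actual request to the KIM server.
    return len(chunk)

def analyze_in_parallel(text, workers=4):
    # Independent chunks can be analyzed concurrently.
    chunks = split_document(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_with_kim, chunks))
```

Splitting on blank lines keeps paragraph boundaries intact, which matters for extraction quality; splitting mid-sentence would cut entity mentions in half.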
