Hi Naaman,

Basically, KIM is strongest at analyzing news articles and other small documents. 
Some of the analysis resources do not handle large amounts of data well. For 
example, the patterns in the JAPE rules may perform greedy matches, which become 
very expensive over large content.

There is no exact maximum size: the bigger the document, the slower the 
extraction. As a rule of thumb, a document of several pages is standard.

If possible, we advise splitting the document into several smaller parts and 
analyzing them independently. This should not have a big impact on the quality 
of the information extraction, and it has the added benefit that you can 
process the different parts in parallel.
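To make the idea concrete, here is a rough sketch in plain Python. The `analyze_with_kim` function is just a placeholder for whatever call you use to submit text to the KIM server, and the 20,000-character chunk size is only illustrative, not a KIM constant:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CHUNK = 20_000  # characters per chunk; tune this for your setup

def split_text(text, max_size=MAX_CHUNK):
    """Split text into chunks of at most max_size characters,
    preferring paragraph boundaries so entities are not cut in half."""
    chunks = []
    while len(text) > max_size:
        cut = text.rfind("\n\n", 0, max_size)  # nearest paragraph break
        if cut <= 0:
            cut = max_size  # no break found: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks

def analyze_with_kim(chunk):
    # Placeholder: submit one chunk to KIM and return its annotations.
    raise NotImplementedError

def analyze_document(text, workers=4):
    """Analyze the chunks independently and in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_with_kim, split_text(text)))
```

Splitting at paragraph breaks keeps most entity mentions intact, so only entities spanning a chunk boundary can be lost.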

If the documents are news articles, the most important information is usually 
in the beginning of the article, so trimming them is also an option. You can 
experiment with this to see what fits your needs.
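A minimal version of the trimming approach could look like this (again plain Python; the limit is just an example value, and cutting back to the last sentence end is my own suggestion to avoid feeding KIM a garbled tail):

```python
def trim_article(text, limit=20_000):
    """Keep only the first `limit` characters of an article,
    cutting back to the last full sentence when possible."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    # Back up to the last sentence end inside the head, if there is one.
    end = max(head.rfind(". "), head.rfind(".\n"))
    return head[:end + 1] if end > 0 else head
```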

Hope this helps,
Philip


On 25 Feb 2011, at 7:11 PM, Naaman Musawwir wrote:

> Hello,
>  
> We have data in the form of documents. We extract text from these and add 
> into KIM repository for semantic analysis. Sometime when we try to add a 
> document it takes forever. Also, if the text size is more than 20 KB it also 
> kind of hangs the KIM server and it stops responding for any further requests.
>  
> Does the size of document affects and if so, what is the maximum content size 
> that KIM can process easily?
>  
> Regards,
> Naaman Musawwir.
>  
> _______________________________________________
> Kim-discussion mailing list
> [email protected]
> http://ontotext.com/mailman/listinfo/kim-discussion
