Hi all,

We are setting up traffic analytics for our websites, and I am planning to use MarkLogic (ML). I am going to store Apache logs, converted to XML, in ML. We will have roughly 100 million log entries each day, so we will be doing about 100 million inserts a day. With this much data we will need a huge amount of disk if we want to keep 4 to 5 years of data. However, range indexes are all we need to answer every query we plan to run. So my questions are:
1. I know I can turn off all the indexing options (word searches, stemming, etc.) in the database configuration, but is there any other way to reduce the storage requirement? In fact, all of our queries can be answered from the 4 range indexes using cts:values, cts:element-attribute-values, cts:element-values and cts:value-tuples, so I do not need the actual documents at all. Any suggestions for reducing the size of the database?

2. I am going to use range indexes (because the queries will be of the "less than" and "greater than" type), and these indexes are going to be big. What settings do you suggest to keep large range indexes in memory? I was thinking of giving more memory to the list cache and less to the expanded and compressed tree caches, but the documentation says the list cache is for term lists. Can someone please let me know which setting applies to range indexes?

3. To make updates fast I am going to switch off journaling and do batch inserts (a system shutdown is not a problem, because we can reload the data from the log files if needed). Any other suggestions to make inserts fast?

Thanks,
Rav
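P.S. To give a sense of scale, here is a rough back-of-the-envelope calculation in Python. The 100 million entries per day and the 5-year retention come from the estimate above; the 500 bytes per entry is only an assumed, hypothetical figure (I have not measured our converted XML yet), and the function names are just for illustration. It ignores index overhead and compression, so treat the result as an order of magnitude only.

```python
# Back-of-the-envelope sizing for the log store.
# Only entries_per_day and retention_years come from the post;
# assumed_bytes_per_entry is a HYPOTHETICAL placeholder, not a measurement.

SECONDS_PER_DAY = 24 * 60 * 60

entries_per_day = 100_000_000    # roughly 100 million log entries per day
retention_years = 5              # upper end of the 4 to 5 year range
assumed_bytes_per_entry = 500    # ASSUMPTION: average stored size of one entry


def sustained_inserts_per_second(per_day: int) -> float:
    """Average insert rate needed to keep up with one day's entries."""
    return per_day / SECONDS_PER_DAY


def total_entries(per_day: int, years: int) -> int:
    """Number of entries held at the end of the retention period (ignores leap days)."""
    return per_day * 365 * years


def raw_size_terabytes(entries: int, bytes_per_entry: int) -> float:
    """Raw document size in decimal terabytes, before indexes or compression."""
    return entries * bytes_per_entry / 1e12


if __name__ == "__main__":
    rate = sustained_inserts_per_second(entries_per_day)
    count = total_entries(entries_per_day, retention_years)
    size_tb = raw_size_terabytes(count, assumed_bytes_per_entry)
    print(f"sustained insert rate: {rate:,.0f} per second")
    print(f"entries after {retention_years} years: {count:,}")
    print(f"raw size at {assumed_bytes_per_entry} bytes/entry: {size_tb:,.2f} TB")
```

With these inputs that is about 1,157 inserts per second sustained and 182.5 billion entries after 5 years, which is why I am hoping to avoid keeping the documents themselves.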
