Hi all

We are setting up website traffic analytics, and I am planning to use ML.
I am just going to store Apache logs converted to XML in ML. We will have
roughly 100 million entries in the log file each day, so we are going to
do 100 million inserts a day. With this much data we will need a huge
amount of disk if we want to keep 4 to 5 years of it (roughly 146 to 183
billion entries in total). But to answer all the queries we are going to
run, all we need is range indexes. So my questions are:

1. I know I can turn off all the indexing options (like word searches,
stemming, etc.) in the database configuration, but is there any other way
to reduce the storage requirements? In fact, all of our queries can be
answered from the 4 range indexes using cts:values,
cts:element-attribute-values, cts:element-values and cts:value-tuples, so
I do not need the actual documents at all. Any suggestions for reducing
the size of the database?

2. As I am going to use range indexes (because the queries are going to be
of the "less than" and "greater than" type) and these indexes are going to
be big, what settings do you suggest for keeping the big range indexes in
memory? I was thinking of giving more memory to the list cache and less to
the expanded and compressed tree caches, but the documentation says the
list cache is for term lists. Can someone please let me know which setting
applies to range indexes?

3. To make the updates fast, I am going to switch off journaling and do
batch inserts (a system shutdown is not a problem, because we can reload
the data from the log files if we need to). Any other suggestions for
making inserts fast?
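
To make the "logs converted to XML" and "batch inserts" part concrete, here is a minimal sketch of what the convert-and-batch step could look like before anything is sent to the database. It is written in Python only for illustration. It assumes the logs are in Apache combined log format; the element names (log-entry, host, status and so on), the helper names and the batch size of 1000 are all hypothetical placeholders, and the actual insert call is deliberately left out.

```python
import re
from itertools import islice
from xml.etree import ElementTree as ET

# Assumption: Apache "combined" log format. Adjust if your LogFormat differs.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def line_to_xml(line):
    """Convert one log line to an XML string, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = ET.Element("log-entry")  # hypothetical element name
    for name, value in match.groupdict().items():
        if value is not None:  # referer/agent are absent in "common" format
            ET.SubElement(entry, name).text = value
    return ET.tostring(entry, encoding="unicode")


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def xml_batches(lines, size=1000):
    """Lazily convert log lines to XML and group them into batches.

    Lines that do not parse are skipped. Each yielded batch is meant to be
    loaded in a single transaction by whatever loader is used.
    """
    docs = (doc for doc in map(line_to_xml, lines) if doc is not None)
    return batches(docs, size)
```

Because everything is a generator, a whole day's log file can be streamed through xml_batches(open(path)) without holding it in memory. Only the fields that will back the range indexes really need to be kept as elements, which may also help with question 1.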

Thanks
Rav
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
