Re: [MarkLogic Dev General] Optimize for index lookups (General Digest, Vol 135, Issue 45)

Geert Josten Thu, 01 Oct 2015 00:25:12 -0700

Hi Rav,

You could compare your database settings with those of the Modules and Schemas 
databases. Those have most index options disabled as well.


You could try making the documents that you insert as small as possible. The 
values you index must be present in the documents, and having them in the same 
fragment is essential for tuples, but getting rid of everything else might save 
a little. Element name length doesn’t really matter, by the way, and storing the 
data as JSON likely won’t make much difference either.
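To make the "as small as possible" advice concrete, here is a minimal Python sketch that turns one Apache access-log line in Common Log Format into a tiny XML document carrying only the values to be indexed. The element names (hit, ip, ts, path, status) and the choice of fields are hypothetical; the timestamp is rewritten to ISO 8601 so it could back an xs:dateTime range index. Because all values land in one small document, they share a fragment, which is what tuples need.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Common Log Format: host ident user [timestamp] "METHOD path PROTOCOL" status size
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3})')

def log_line_to_xml(line: str) -> str:
    """Build a minimal XML doc holding only the (hypothetical) indexed fields."""
    m = CLF.match(line)
    if m is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    ip, ts, path, status = m.groups()
    # Convert "10/Oct/2000:13:55:36 -0700" to ISO 8601 for a dateTime index.
    when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
    hit = ET.Element("hit")
    ET.SubElement(hit, "ip").text = ip
    ET.SubElement(hit, "ts").text = when.isoformat()
    ET.SubElement(hit, "path").text = path
    ET.SubElement(hit, "status").text = status
    return ET.tostring(hit, encoding="unicode")

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(log_line_to_xml(sample))
```

Everything not needed for the range indexes (user agent, referrer, byte count) is simply dropped here; whether that is acceptable depends on whether you ever need the raw entries back.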

The tree caches are only necessary for retrieving documents, so you could try 
making them small. Not sure what happens if you set them to zero.

I'm not entirely sure about the list cache (I keep forgetting the specifics), but 
if you make it small and don’t notice much difference in performance, then you 
are good.

Range indexes are always loaded into memory. There is no separate setting for 
them. The Forest status page gives you the memory consumption (per forest).
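Why memory residency matters for the "less than" / "greater than" queries Rav describes can be seen in a toy model. The sketch below treats a range index as (value, fragment id) pairs kept sorted by value, so each range query becomes binary searches plus a contiguous slice. This is a conceptual illustration only, not MarkLogic's actual in-memory or on-disk format; the class name and sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date

class RangeIndexModel:
    """Toy model of a range index: (value, fragment_id) pairs sorted by value."""

    def __init__(self, pairs):
        self.entries = sorted(pairs)                # sort by value, then fragment id
        self.values = [v for v, _ in self.entries]  # parallel list for binary search

    def less_than(self, x):
        """Fragment ids whose value is < x."""
        return [f for _, f in self.entries[:bisect_left(self.values, x)]]

    def greater_than(self, x):
        """Fragment ids whose value is > x."""
        return [f for _, f in self.entries[bisect_right(self.values, x):]]

    def between(self, low, high):
        """Fragment ids whose value v satisfies low <= v < high."""
        lo = bisect_left(self.values, low)
        hi = bisect_left(self.values, high)
        return [f for _, f in self.entries[lo:hi]]

idx = RangeIndexModel([
    (date(2015, 9, 30), 3),
    (date(2015, 9, 28), 1),
    (date(2015, 10, 1), 4),
    (date(2015, 9, 29), 2),
])
print(idx.less_than(date(2015, 9, 30)))     # [1, 2]
print(idx.greater_than(date(2015, 9, 29)))  # [3, 4]
```

The searches themselves are cheap; the cost of such a query is dominated by reading the sorted values, which is why having them in memory matters.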

And last but not least, use multiple forests. Maybe 1 forest per physical core? 
You don’t want to let the number of docs grow beyond 100 million, so don’t 
disable merges. And consider using the -fastload option of MLCP.
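A quick back-of-the-envelope calculation shows what these numbers mean at Rav's scale. The sketch below assumes 100 million documents per day (from the question) and reads the 100 million guideline as a per-forest cap; that reading is an assumption, as are the function and constant names.

```python
import math

DOCS_PER_DAY = 100_000_000          # from the question: ~100 million log entries a day
MAX_DOCS_PER_FOREST = 100_000_000   # assumption: 100 million guideline read as per forest

def forests_needed(years: int, docs_per_day: int = DOCS_PER_DAY,
                   cap: int = MAX_DOCS_PER_FOREST) -> tuple:
    """Return (total docs retained, minimum number of forests) for a retention period."""
    total_docs = docs_per_day * 365 * years   # ignores leap days
    return total_docs, math.ceil(total_docs / cap)

for years in (4, 5):
    total, forests = forests_needed(years)
    print(f"{years} years: {total:,} docs -> at least {forests:,} forests")
```

Under those assumptions this gives 146,000,000,000 docs and 1,460 forests for 4 years, and 182,500,000,000 docs and 1,825 forests for 5 years. Combined with "1 forest per physical core", that is well over a thousand cores, which is worth knowing before sizing the cluster.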

Kind regards,
Geert

From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Wednesday, September 30, 2015 at 3:45 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 135, Issue 45

Hi all,

We are setting up website traffic analytics, and I am planning to use ML for it. 
I am just going to store Apache logs, converted to XML, in ML. We will have 
roughly 100 million entries in the log file each day, so we are going to do 100 
million inserts a day. For this much data we will need a huge disk if we want to 
store it for 4 to 5 years. But to answer all the queries we are going to 
perform, all we need is range indexes. So my questions are:

1. I know I can turn off all the indexing options (like word searches, stemming, 
etc.) in the database configuration, but is there any other way to reduce the 
storage requirements? In fact, all of our queries can be answered from 4 range 
indexes using cts:values, cts:element-attribute-values, cts:element-values and 
cts:value-tuples. So I do not need the actual documents at all. Any suggestions 
to reduce the size of the database?

2. As I am going to use range indexes (because the queries are going to be of the 
"less than" and "greater than" type) and these indexes are going to be big, what 
index settings do you suggest to keep the big range indexes in memory? I was 
thinking of giving more memory to the list cache and less to the expanded and 
compressed tree caches. But the documentation says the list cache is for term 
lists. Can someone please let me know which setting is for range indexes?

3. To make the updates fast I am going to switch off journaling and do batch 
inserts (a system shutdown is not a problem, because we can reload the data from 
the log files if we need to). Any other suggestions to make inserts fast?
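For the batch inserts in question 3, the client-side part is just grouping documents into fixed-size chunks before each request. A minimal Python sketch follows; the function name and the batch size are illustrative assumptions. If you load through MLCP instead, it does this batching itself and exposes options such as -batch_size and -transaction_size.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))  # take the next `size` items (fewer at the end)
        if not chunk:
            return
        yield chunk

# Illustrative only: group 2,500 (hypothetical) documents into batches of 1,000.
docs = [f"<hit><n>{i}</n></hit>" for i in range(2500)]
print([len(b) for b in batches(docs, 1000)])
```

The batch size of 1,000 is not a tuned value; the right size depends on document size and the cluster, and is worth measuring.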

Thanks
Rav

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
