Hi Rav,

You could compare your database settings with those of the Modules and Schemas databases. Those have most options disabled as well.
You could try making the documents that you insert as small as possible. The values you index must be present in the docs, and having them in the same fragment is essential for tuples, but getting rid of anything else might save a little. Element name length doesn’t really matter, by the way. Storing as JSON likely doesn’t matter either.

The tree caches are only necessary for retrieving documents, so you could try making them small. Not sure what happens if you set them to zero. Not entirely sure about the list cache (I keep forgetting the specifics), but if you make it small and you don’t notice much difference in performance, then you are good.

Range indexes are always loaded into memory. There is no separate setting for them. The Forest status page gives you the memory consumption (per forest).

And last but not least, use multiple forests. Maybe 1 forest per physical core? You don’t want to let the number of docs grow beyond 100 million, so don’t disable merges. And consider using the -fastload option of MLCP.

Kind regards,
Geert

From: [email protected] on behalf of RAVINDER MAAN
Reply-To: MarkLogic Developer Discussion
Date: Wednesday, September 30, 2015 at 3:45 PM
To: [email protected]
Subject: Re: [MarkLogic Dev General] General Digest, Vol 135, Issue 45

Hi all,

We are setting up website traffic analytics, and I am planning to use ML. I am just going to store Apache logs converted to XML in ML. Roughly, we will have around 100 million entries in the log file each day, so we are going to do 100 million inserts a day. For this much data we will need a huge disk if we want to store data for 4 to 5 years. But to answer all the queries we are going to perform, all we need is range indexes. So my questions are:

1. I know I can turn off all the indexing options (like word searches, stemming, etc.) in the db configuration, but is there any other way to reduce the storage requirements? In fact, all of our queries can be answered from the 4 range indexes using cts:values, cts:element-attribute-values, cts:element-values and cts:value-tuples, so I do not need the actual documents at all. Any suggestions to reduce the size of the db?

2. As I am going to use range indexes (because the queries are going to be of the "less than" and "greater than" type) and these indexes are going to be big, what index settings do you suggest to keep the big range indexes in memory? I was thinking of giving more memory to the list cache and less to the expanded and compressed tree caches. But the documentation says the list cache is for term lists. Can someone please let me know which setting is for range indexes?

3. To make the updates fast, I am going to switch off journaling and do batch inserts (a system shutdown is not a problem, because we can reload the data from the log file if we want). Any other suggestions to make inserts fast?

Thanks,
Rav
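
The "make the documents as small as possible" suggestion can be sketched roughly as follows, assuming the logs are in Apache Common Log Format. The element names (`hit`, `time`, `path`, `status`, `bytes`) and the choice of which four values to keep are hypothetical, since the thread does not say which values carry the range indexes. Each log entry becomes one small document, so the indexed values stay together in a single fragment.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Common Log Format: host ident user [timestamp] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

# Hypothetical pick of the four range-indexed values; everything else is dropped.
KEEP = ("time", "path", "status", "bytes")


def to_minimal_xml(line):
    """Return a small XML document for one log line, or None if it does not parse."""
    m = CLF.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Rewrite the Apache timestamp as ISO 8601 so it could back a dateTime index.
    parsed = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["time"] = parsed.isoformat()
    hit = ET.Element("hit")
    for name in KEEP:
        ET.SubElement(hit, name).text = fields[name]
    return ET.tostring(hit, encoding="unicode")


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(to_minimal_xml(sample))
```

Lines that do not match the pattern return None, so a loader can count or log them instead of inserting partial documents.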
