Golden. Thanks, Mike.

What about thousands of writes per second? Any differences?

Sent from my iPhone

On Feb 6, 2012, at 4:56 PM, "Michael Blakeley" <m...@blakeley.com> wrote:

> That doesn't sound too challenging. The points you've already raised are 
> good, but you will still need whatever indexing your queries require. You 
> might try to avoid using property fragments, if possible (disable 
> maintain-last-modified, for example). Depending on your queries, you may be 
> able to disable some or all of the default full-text indexing and rely on a 
> combination of the built-in XPath indexes and application-specific range 
> indexes.
> 
> Think hard about your document URIs. You will want to choose URIs such that 
> lock contention simply can't happen. For example, you could use xdmp:random 
> to generate URIs, or some combination of IDs and timestamps that guarantees 
> uniqueness. Let's say you receive an update for each ticker symbol once per 
> second. You might structure your URIs as SYMBOL/TIMESTAMP, or as 
> TIMESTAMP/SYMBOL. Put some thought into which of those might be more useful 
> at query time.
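As an illustration of the uniqueness idea above, here is a sketch in Python. The path scheme and helper name are made up for this example; in MarkLogic itself you would build the URI in XQuery (e.g. with xdmp:random or a timestamp) at insert time.

```python
import random
import time


def tick_uri(symbol: str) -> str:
    """Build a document URI that is effectively unique per update,
    so two concurrent inserts never contend for the same URI lock."""
    # A millisecond timestamp plus a random suffix keeps URIs unique
    # even when two ticks for one symbol arrive in the same millisecond.
    ts = int(time.time() * 1000)
    suffix = random.getrandbits(32)
    return f"/ticks/{ts}/{symbol}/{suffix}.xml"
```

Leading with the timestamp makes range scans over time windows natural; leading with the symbol would instead group each ticker's history together. Which is better depends on your query patterns, as noted above.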
> 
> You may want to reduce the size of your in-memory stands. This may sound 
> backward. Folks often try to optimize ingestion by using really large 
> in-memory stands, but with small documents this can be counter-productive. 
> With high-frequency updates and small documents, you may be better off 
> limiting each in-memory stand to less than 32k fragments, and reducing the 
> in-memory limits accordingly so that you can use that memory elsewhere.
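To see why a small stand cap is affordable, a back-of-the-envelope estimate (illustrative only: the 2 KB average fragment size is an assumption, and real stands also carry index overhead):

```python
def stand_memory_mb(fragments: int, avg_fragment_kb: float) -> float:
    """Rough data footprint of an in-memory stand; ignores index
    overhead, so treat it as a lower bound."""
    return fragments * avg_fragment_kb / 1024.0


# 32k small fragments of ~2 KB each is only about 64 MB of document data,
# so capping stands at 32k fragments frees memory for use elsewhere.
print(stand_memory_mb(32 * 1024, 2.0))
```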
> 
> After that it will mostly be a question of keeping up with the demands on 
> CPU, memory, and disk. Given modern Xeon CPUs and memory sizes, the disk is 
> probably the hardest part. You want fast sequential writes for journaling and 
> for saving in-memory stands as they fill up. You'll also need fairly good 
> read performance for merges. As a rule of thumb, try to have 10 MB/s of 
> read-write capacity per 1 MB/s of incoming XML.
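That rule of thumb is a trivial capacity calculation; a sketch (the function name and the example ingest rate are made up for illustration):

```python
def disk_bandwidth_needed_mb_s(ingest_mb_s: float, ratio: float = 10.0) -> float:
    """Disk read-write bandwidth needed to keep up with journaling,
    stand saves, and merges, per the 10:1 rule of thumb."""
    return ingest_mb_s * ratio


# e.g. 5 MB/s of incoming XML suggests about 50 MB/s of disk bandwidth
print(disk_bandwidth_needed_mb_s(5.0))
```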
> 
> You might also benefit from a little SSD storage configured as a fast data 
> directory for your forests (requires MarkLogic 5). But I think you can hit 
> your targets with spinning disks, as long as you configure them properly.
> 
> You'll probably want to have 1-2 forests per filesystem, spread out across 
> multiple block devices, rather than putting everything on one giant 
> filesystem. Consider avoiding RAID entirely and using forest replication 
> instead. If you do use RAID, prefer RAID-1 or RAID-10; avoid RAID-5 and 
> RAID-6, because their write performance is likely to be a problem.
> 
> -- Mike
> 
> On 6 Feb 2012, at 14:11 , seme...@hotmail.com wrote:
> 
>> Not sure, but let's say hundreds a second.
>> 
>>> From: m...@blakeley.com
>>> Date: Mon, 6 Feb 2012 14:10:42 -0800
>>> To: general@developer.marklogic.com
>>> Subject: Re: [MarkLogic Dev General] Optimizing for several writes
>>> 
>>> How many inserts/sec do you think the database will need to sustain?
>>> 
>>> -- Mike
>>> 
>>> On 6 Feb 2012, at 13:57 , seme...@hotmail.com wrote:
>>> 
>>>> So I've normally dealt with optimizing MarkLogic for few writes but many 
>>>> reads. In a situation with many writes and fewer reads (as with reports 
>>>> on stock ticks, for example), are there any pointers or tips for speeding 
>>>> up writes? I can imagine that reducing the number of indexes helps, as 
>>>> does always writing new files rather than updating existing ones, and 
>>>> keeping the files small. Anything else? I may need some indexes for 
>>>> reporting purposes. And I realize it may be better to let another system 
>>>> write the data while MarkLogic ingests soon thereafter, but I am 
>>>> interested in truly real-time data views, not next-day or next-hour views 
>>>> into the data.
>>>> 
>>>> thanks,
>>>> Ryan
>>>> _______________________________________________
>>>> General mailing list
>>>> General@developer.marklogic.com
>>>> http://developer.marklogic.com/mailman/listinfo/general
>>> 