Hi Ryan,

It might be worth turning off automatic directory creation as well. The more slashes in your URIs, the bigger the impact. Need to analyze the URIs? Use the URI lexicon instead.
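[Editor's illustration — a minimal client-side sketch of Geert's point, not code from the thread. The `tick_uri` helper and the `/ticks/` prefix are hypothetical; the idea is that a flat URI with a single directory level keeps the cost of slashes near zero once automatic directory creation is off, while the symbol and timestamp stay recoverable from the URI text for lexicon-based analysis.]

```python
from datetime import datetime, timezone

def tick_uri(symbol: str, ts: datetime) -> str:
    """Build a flat URI with one directory level, so turning automatic
    directory creation off costs nothing: there are no intermediate
    directory levels to create or lock on each insert."""
    # Encode the timestamp so it sorts correctly as text, and keep
    # symbol and timestamp together in a single path segment.
    stamp = ts.strftime("%Y%m%dT%H%M%S%fZ")
    return f"/ticks/{symbol}-{stamp}.xml"

uri = tick_uri("IBM", datetime(2012, 2, 6, 14, 10, 42, tzinfo=timezone.utc))
print(uri)  # /ticks/IBM-20120206T141042000000Z.xml
```

Because the symbol and timestamp are still embedded in the URI string, a URI-lexicon query (for example, pattern-matching on a prefix like `/ticks/IBM-*`) can analyze the data without relying on directory fragments.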
Kind regards,
Geert

> -----Original Message-----
> From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
> Sent: Tuesday, February 7, 2012 3:11
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Optimizing for several writes
>
> More I/O, of course. At some point it will become difficult to get the forests to coordinate transactions quickly enough. The limit will depend on the network, CPU, and disk speeds. Using more than one ingestion host (i.e., HTTP client or XCC client) can help to push that limit out, too.
>
> -- Mike
>
> On 6 Feb 2012, at 17:36, seme...@hotmail.com wrote:
>
> > Golden. Thanks, Mike.
> >
> > What about thousands of writes per second? Any differences?
> >
> > Sent from my iPhone
> >
> > On Feb 6, 2012, at 4:56 PM, "Michael Blakeley" <m...@blakeley.com> wrote:
> >
> >> That doesn't sound too challenging. The points you've already raised are good, but you will need whatever indexing you need. You might try to avoid using property fragments, if possible (disable maintain-last-modified, for example). Depending on your queries, you may be able to disable some or all of the default full-text indexing, and rely on a combination of the built-in XPath indexes and application-specific range indexes.
> >>
> >> Think hard about your document URIs. You will want the URIs to be such that lock contention simply won't happen. For example, you could use xdmp:random to generate URIs, or some combination of ids and timestamps that will guarantee uniqueness. Let's say you receive an update for each ticker symbol once per second, for example. You might structure your URIs as SYMBOL/TIMESTAMP, or as TIMESTAMP/SYMBOL. Put some thought into which of those might be more useful at query time.
> >>
> >> You may want to reduce the size of your in-memory stands. This may sound backward. Folks often try to optimize ingestion by using really large in-memory stands, but with small documents this can be counter-productive. With high-frequency updates and small documents, you may be better off limiting each in-memory stand to less than 32k fragments, and reducing the in-memory limits accordingly so that you can use that memory elsewhere.
> >>
> >> After that it will mostly be a question of keeping up with the demands on CPU, memory, and disk. Given modern Xeon CPUs and memory sizes, the disk is probably the hardest part. You want fast sequential writes for journaling and for saving in-memory stands as they fill up. You'll also need fairly good read performance for merges. As a rule of thumb, try to have 10 MB/sec of read-write capacity per 1 MB/sec of incoming XML.
> >>
> >> You might also benefit from a little SSD storage configured as a fast data directory for your forests (requires MarkLogic 5). But I think you can hit your targets with spinning disks, as long as you configure them properly.
> >>
> >> You'll probably want to have 1-2 forests per filesystem, spread out across multiple block devices, rather than putting everything on one giant filesystem. Consider avoiding RAID entirely, and using forest replication instead. If you do use RAID, use RAID-1 or RAID-10. Avoid RAID-5 and RAID-6, because their write performance is likely to be a problem.
> >>
> >> -- Mike
> >>
> >> On 6 Feb 2012, at 14:11, seme...@hotmail.com wrote:
> >>
> >>> Not sure, but let's say hundreds a second.
> >>>
> >>>> From: m...@blakeley.com
> >>>> Date: Mon, 6 Feb 2012 14:10:42 -0800
> >>>> To: general@developer.marklogic.com
> >>>> Subject: Re: [MarkLogic Dev General] Optimizing for several writes
> >>>>
> >>>> How many inserts/sec do you think the database will need to sustain?
> >>>>
> >>>> -- Mike
> >>>>
> >>>> On 6 Feb 2012, at 13:57, seme...@hotmail.com wrote:
> >>>>
> >>>>> So I've normally dealt with optimizing MarkLogic for few writes but many reads. In a situation where there are several writes and fewer reads (as with reports on stock ticks, for example), are there any pointers or tips for speeding up writes? I can imagine that reducing the number of indexes helps, as does always writing new files rather than updating existing ones, and keeping the files small. Anything else? I may need some indexes for reporting purposes. And I realize that it may be better to let another system write the data while MarkLogic ingests soon thereafter, but I am interested in truly realtime data views, not next-day or next-hour views into the data.
> >>>>>
> >>>>> thanks,
> >>>>> Ryan

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
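[Editor's illustration — a back-of-envelope sketch of the 10:1 disk rule of thumb Mike gives above. The workload numbers (500 ticks/sec at roughly 2 KB per document) are assumptions for illustration only, not figures from the thread.]

```python
def required_disk_mb_per_sec(docs_per_sec: float, avg_doc_kb: float,
                             ratio: float = 10.0) -> float:
    """Apply the rule of thumb: ~10 MB/s of read-write disk capacity
    for every 1 MB/s of incoming XML, to cover journal writes, stand
    saves, and merge reads."""
    incoming_mb = docs_per_sec * avg_doc_kb / 1024.0
    return incoming_mb * ratio

# Hypothetical workload: 500 ticks/sec at ~2 KB per document,
# i.e. just under 1 MB/s of incoming XML.
needed = required_disk_mb_per_sec(500, 2)
print(f"{needed:.1f} MB/s")  # 9.8 MB/s
```

Running the numbers this way makes it easy to see why Mike flags the disk as the hardest part: even a modest tick feed calls for roughly 10 MB/s of sustained mixed read-write capacity per forest host.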