Hi Ryan,

It might be worth turning off automatic directory creation as well. The more slashes in your URIs, the bigger the impact. Need to analyze the URIs? Use the URI lexicon instead.
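[Editor's illustration — a minimal client-side sketch of Geert's point, not code from the thread. The `tick_uri` helper and the `/ticks/` prefix are hypothetical; the idea is that a flat URI with a single directory level keeps the cost of slashes near zero once automatic directory creation is off, while the symbol and timestamp stay recoverable from the URI text for lexicon-based analysis.]

```python
from datetime import datetime, timezone

def tick_uri(symbol: str, ts: datetime) -> str:
    """Build a flat URI with one directory level, so turning automatic
    directory creation off costs nothing: there are no intermediate
    directory levels to create or lock on each insert."""
    # Encode the timestamp so it sorts correctly as text, and keep
    # symbol and timestamp together in a single path segment.
    stamp = ts.strftime("%Y%m%dT%H%M%S%fZ")
    return f"/ticks/{symbol}-{stamp}.xml"

uri = tick_uri("IBM", datetime(2012, 2, 6, 14, 10, 42, tzinfo=timezone.utc))
print(uri)  # /ticks/IBM-20120206T141042000000Z.xml
```

Because the symbol and timestamp are still embedded in the URI string, a URI-lexicon query (for example, pattern-matching on a prefix like `/ticks/IBM-*`) can analyze the data without relying on directory fragments.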
Kind regards,
Geert

> -----Original Message-----
> From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
> Sent: Tuesday, February 7, 2012 3:11
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Optimizing for several writes
>
> More I/O, of course. At some point it will become difficult to get the forests to coordinate transactions quickly enough. The limit will depend on the network, CPU, and disk speeds. Using more than one ingestion host (i.e., HTTP client or XCC client) can help to push that limit out, too.
>
> -- Mike
>
> On 6 Feb 2012, at 17:36, seme...@hotmail.com wrote:
>
> > Golden. Thanks, Mike.
> >
> > What about thousands of writes per second? Any differences?
> >
> > Sent from my iPhone
> >
> > On Feb 6, 2012, at 4:56 PM, "Michael Blakeley" <m...@blakeley.com> wrote:
> >
> >> That doesn't sound too challenging. The points you've already raised are good, but you will need whatever indexing you need. You might try to avoid using property fragments, if possible (disable maintain-last-modified, for example). Depending on your queries, you may be able to disable some or all of the default full-text indexing, and rely on a combination of the built-in XPath indexes and application-specific range indexes.
> >>
> >> Think hard about your document URIs. You will want the URIs to be such that lock contention simply won't happen. For example, you could use xdmp:random to generate URIs, or some combination of ids and timestamps that will guarantee uniqueness. Let's say you receive an update for each ticker symbol once per second, for example. You might structure your URIs as SYMBOL/TIMESTAMP, or as TIMESTAMP/SYMBOL. Put some thought into which of those might be more useful at query time.
> >>
> >> You may want to reduce the size of your in-memory stands. This may sound backward. Folks often try to optimize ingestion by using really large in-memory stands, but with small documents this can be counter-productive. With high-frequency updates and small documents, you may be better off limiting each in-memory stand to less than 32k fragments, and reducing the in-memory limits accordingly so that you can use that memory elsewhere.
> >>
> >> After that it will mostly be a question of keeping up with the demands on CPU, memory, and disk. Given modern Xeon CPUs and memory sizes, the disk is probably the hardest part. You want fast sequential writes for journaling and for saving in-memory stands as they fill up. You'll also need fairly good read performance for merges. As a rule of thumb, try to have 10 MB/sec of read-write capacity per 1 MB/sec of incoming XML.
> >>
> >> You might also benefit from a little SSD storage configured as a fast data directory for your forests (requires MarkLogic 5). But I think you can hit your targets with spinning disks, as long as you configure them properly.
> >>
> >> You'll probably want to have 1-2 forests per filesystem, spread out across multiple block devices, rather than putting everything on one giant filesystem. Consider avoiding RAID entirely, and using forest replication instead. If you do use RAID, use RAID-1 or RAID-10. Avoid RAID-5 and RAID-6, because their write performance is likely to be a problem.
> >>
> >> -- Mike
> >>
> >> On 6 Feb 2012, at 14:11, seme...@hotmail.com wrote:
> >>
> >>> Not sure, but let's say hundreds a second.
> >>>
> >>>> From: m...@blakeley.com
> >>>> Date: Mon, 6 Feb 2012 14:10:42 -0800
> >>>> To: general@developer.marklogic.com
> >>>> Subject: Re: [MarkLogic Dev General] Optimizing for several writes
> >>>>
> >>>> How many inserts/sec do you think the database will need to sustain?
> >>>>
> >>>> -- Mike
> >>>>
> >>>> On 6 Feb 2012, at 13:57, seme...@hotmail.com wrote:
> >>>>
> >>>>> So I've normally dealt with optimizing MarkLogic for few writes but many reads. In a situation where there are several writes and fewer reads (as with reports on stock ticks, for example), are there any pointers or tips for speeding up writes? I can imagine that reducing the number of indexes helps, as does always writing new files rather than updating existing ones, and keeping the files small. Anything else? I may need some indexes for reporting purposes. And I realize that it may be better to let another system write the data while MarkLogic ingests soon thereafter, but I am interested in truly realtime data views, not next-day or next-hour views into the data.
> >>>>>
> >>>>> thanks,
> >>>>> Ryan

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
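[Editor's illustration — a back-of-envelope sketch of the 10:1 disk rule of thumb Mike gives above. The workload numbers (500 ticks/sec at roughly 2 KB per document) are assumptions for illustration only, not figures from the thread.]

```python
def required_disk_mb_per_sec(docs_per_sec: float, avg_doc_kb: float,
                             ratio: float = 10.0) -> float:
    """Apply the rule of thumb: ~10 MB/s of read-write disk capacity
    for every 1 MB/s of incoming XML, to cover journal writes, stand
    saves, and merge reads."""
    incoming_mb = docs_per_sec * avg_doc_kb / 1024.0
    return incoming_mb * ratio

# Hypothetical workload: 500 ticks/sec at ~2 KB per document,
# i.e. just under 1 MB/s of incoming XML.
needed = required_disk_mb_per_sec(500, 2)
print(f"{needed:.1f} MB/s")  # 9.8 MB/s
```

Running the numbers this way makes it easy to see why Mike flags the disk as the hardest part: even a modest tick feed calls for roughly 10 MB/s of sustained mixed read-write capacity per forest host.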