We usually look at sizing questions from a timing and load perspective first. 
How many queries per sec on average and peak, and how many inserts per sec on 
average and peak?

With a given sample set, you can often get estimates of read/write IO, which is 
one of the biggest bottlenecks in most cases, particularly for inserts. The 
expected IO bandwidth versus the available IO bandwidth per host typically 
gives an indication of how many hosts you need to reach the ingest speed you 
are after.
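
To make that comparison concrete, here is a minimal back-of-the-envelope sketch in Python. It is not a MarkLogic API. The function name, the write amplification factor (journal, stand and merge writes on top of the raw document bytes) and the headroom fraction are all assumptions for illustration, and you would replace them with numbers measured on your own sample set.

```python
import math


def hosts_for_ingest(peak_docs_per_sec, avg_doc_bytes,
                     write_amplification, host_io_bytes_per_sec,
                     headroom=0.5):
    """Estimate how many hosts are needed to sustain a peak ingest rate.

    All parameters are illustrative assumptions, not MarkLogic settings:
    - write_amplification: bytes written to disk per byte ingested
    - headroom: fraction of a host's IO bandwidth reserved for ingest
    """
    # Expected IO bandwidth at peak, in bytes per second.
    required = peak_docs_per_sec * avg_doc_bytes * write_amplification
    # IO bandwidth per host that we are willing to spend on ingest.
    usable = host_io_bytes_per_sec * headroom
    return math.ceil(required / usable)


# Hypothetical numbers: 2000 docs/sec at peak, 50 KB each, 3x amplification,
# 400 MB/sec of disk bandwidth per host, half of it reserved for ingest.
print(hosts_for_ingest(2000, 50_000, 3.0, 400_000_000))
```

The point is only that measured IO per document and available IO per host give you a first host count. Real clusters also need capacity for queries, merges and failover.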

Querying, however, should be less IO bound, because ideally you try to run from 
indexes as much as possible. More forests help speed up querying because index 
lookups can be parallelized. The number of forests is linked to the number of 
cores, as you suggest, but it is not a one-to-one relation. A rough rule of 
thumb is 1 or 2 cores per forest: 1 if it is mostly querying only or inserting 
only, 2 if both often happen at the same time.

That is for bigger forests though. You can probably push it a bit if the 
forests are tiny and/or only lightly used during the day. I think I currently 
have almost 150 forests on my 16-core laptop, 3 to 5 for each demo that I 
happen to have installed. That only works because I rarely use more than one 
demo at the same time.
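
The rule of thumb of 1 or 2 cores per forest can be written down as a small Python sketch. The function name and the boolean flag are made up for illustration. It only applies to larger, actively used forests, so it deliberately says nothing about tiny or mostly idle forests like the ones on my laptop.

```python
def forests_per_host(cores, mixed_workload):
    """Rough upper bound on actively used forests for one host.

    Rule of thumb: 1 core per forest when the host is mostly querying
    only or inserting only, 2 cores per forest when queries and inserts
    often run at the same time. Treat the result as a starting point.
    """
    cores_per_forest = 2 if mixed_workload else 1
    # Always allow at least one forest, even on a single-core host.
    return max(1, cores // cores_per_forest)


print(forests_per_host(16, mixed_workload=True))   # heavy mixed read/write
print(forests_per_host(16, mixed_workload=False))  # mostly one kind of load
```

Measuring under realistic load remains the real check, since IO bandwidth will usually be the limit before the core count is.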

In the end I think IO bandwidth is more important than the number of forests. 
Also keep in mind that scaling up and down is relatively easy with MarkLogic. 
If you start collecting performance metrics, you should get a good feel for how 
your system would hold up as you increase the load.

Cheers,
Geert

From: <[email protected]> on behalf of Andreas Hubmer <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Wednesday, November 29, 2017 at 1:19 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Multi-Database Architecture

Actually, it is the other way around. MarkLogic prefers multiple forests above 
a single forest...
Don’t put too many forests on a single host though, or they will just compete 
for resources.
Where would you draw the line between preferring many small forests and not 
creating too many forests on a host?
Would you use the expected forest size as an indicator (e.g. no forest < 1 GB)?
Or would you try to create no more forests than CPU cores / 2 per host?

Thanks,
Andreas


2017-11-28 12:38 GMT+01:00 Geert Josten <[email protected]>:
Actually, it is the other way around. MarkLogic prefers multiple forests over 
a single forest. Each forest has its own in-memory stand, and MarkLogic prefers 
multiple smaller ones over one big one. The idea is that this allows 
parallelizing the work of resolving queries from indexes, and also pulling 
content from disk in parallel (particularly if multiple hosts or 
disks/controllers are involved).

Don’t put too many forests on a single host though, or they will just compete 
for resources.

Also note that a forest is not the same as a database. Each database will have 
at least one forest, but could have many more, potentially spread out over 
multiple hosts. So, one big database, or multiple small ones could end up 
resulting in the same in-memory stand sizes. It all depends on how many forests 
each database has, and how much data is inside them.

Whether it makes more sense to use one shared db or multiple small ones is 
primarily a functional/business question. I’d add, though, that I’d 
personally prefer the built-in backup over MLCP for backups.

Cheers,
Geert

From: <[email protected]> on behalf of Andreas Hubmer <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Tuesday, November 28, 2017 at 10:59 AM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Multi-Database Architecture

Hi,

The clients are different services in a larger micro-service landscape. Some of 
them will store small amounts of data (less than 1 GB, maybe even less than 
100 MB), others large amounts.
The services with small amounts of data make me worry about efficient usage of 
memory and in-memory stands. If they share a database, the shared database 
could have larger in-memory stands (in contrast to the many small in-memory 
stands of individual databases). I assume that larger in-memory stands perform 
much better in peak moments? Additionally, it is easier to tune the 
configuration of one database than to tune the configuration of many databases.

On the other hand, we want to have an easy backup & restore process. Do you 
have any suggestions or experience on how this could be done in a shared 
database at the directory level?
The backup could be done with MLCP (export, point-in-time). The restore 
with MLCP would be a two-step process: remove all content from the directory, 
then import the backup. This is not as straightforward as the built-in backup 
features.

Security, SLAs and data sharing are relevant topics which I feel comfortable 
with.
Maybe we'll go with a mix of shared and individual databases, even though this 
means a more complex architecture.

Thanks,
Andreas


2017-11-23 21:18 GMT+01:00 David Gorbet <[email protected]>:
If these are completely separate use cases please consider completely separate 
clusters. You can use virtualization to make the hardware work out.

On Nov 23, 2017, at 12:04 PM, Geert Josten <[email protected]> wrote:

Hi Andreas,

I think each forest has its own in-memory stand, so if each client has a 
reasonable amount of data, you’ll need several forests per client anyhow. One 
or multiple databases wouldn’t matter much in that case. I wouldn’t worry too 
much about in-memory stands though. Memory is much faster than disk, so it is 
worth using. And you’ll want spare resources anyhow to handle peak moments, so 
not fully utilizing resources all the time isn’t necessarily bad. An average 
use of 30% of CPU and memory is pretty typical, I’d say.

I would suggest looking at it more from a business or functional perspective. 
For instance:

  *   Do you need to guarantee clients won’t be able to see each other’s data? 
That would without doubt be a strong argument for keeping things separate.
  *   Could different clients have different SLA terms? Another vote for 
keeping things separate.
  *   What if one client wants to step out, and you need to purge its data? 
Dead simple with separate databases.
  *   Is there any chance one of the clients would like to run it on-site, 
rather than hosted?
  *   Or the opposite: would there be any need to mix datasets from 
different clients? Any kind of sharing for instance, even if just of 
statistics, or some anonymous cross-validation?

And you can probably think of many more yourself.

Cheers,
Geert

From: <[email protected]> on behalf of Andreas Hubmer <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, November 23, 2017 at 4:53 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: [MarkLogic Dev General] Multi-Database Architecture

Hi,

I am planning the architecture of an application with dozens of individual 
clients. I am thinking of using either one database for all data or a separate 
database per client.

The main pros and cons for me are efficient memory usage and the possibility of 
individual backup & restore. I tend to prefer the first and accept more 
complicated restore scenarios.

These are my considerations.

one-db:
* each client would use a different base directory (security: uri-privileges)
* 1 in-memory-stand -> more efficient memory usage. Do you agree that this is 
relevant?
* individual backup & restore of data of one client => complicated (MLCP?)

many-dbs (one db per client):
* many in-memory-stands -> less efficient memory usage / more, smaller stands / 
more merging. Do you agree?
* built-in backup & restore of data of one client is possible
* very flexible configuration (individual indexes, ...)
* deployment more complex

For configuration we will use Roxy.

Thanks,
Andreas

--
Andreas Hubmer
Senior IT Consultant

EBCONT enterprise technologies GmbH
Millennium Tower
Handelskai 94-96
A-1200 Vienna

Mobile: +43 664 60651861
Fax: +43 2772 512 69-9
Email: [email protected]
Web: http://www.ebcont.com

OUR TEAM IS YOUR SUCCESS

UID-Nr. ATU68135644
HG St.Pölten - FN 399978 d

CONFIDENTIALITY/DISCLAIMER:
This email and any files transmitted with it are confidential, are legally 
protected before publication and are intended solely for the use of the 
individual or entity to whom they are addressed. If you have received this 
email in error, please notify the sender immediately and destroy this e-mail 
together with all attachments. The content of this e-mail may not be 
disseminated, published, copied or stored on third parties. We assume no 
liability for any damage that may result from this e-mail or its annexes.
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
