Re: [MarkLogic Dev General] Guides on database design for multi-tenancy?

Michael Blakeley Mon, 07 Jul 2014 12:27:46 -0700

That suggests a raw tree size of about 5 GB for a large customer (500k 
documents at around 10 KB each). With a high level of text indexing it might 
approach 20 GB or even 40 GB. That's a medium size for a forest. It's best to 
limit forests to about 200 GB, but short of that a larger forest is more 
efficient than a smaller one. Since those are your larger customers, that 
suggests you could combine quite a few smaller customers. To me this points to 
a shared database.
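
For anyone who wants to redo the arithmetic with their own numbers, here is a 
rough sketch in Python. The document count, document size, and 200 GB ceiling 
come from this thread. The 4x and 8x expansion factors are only 
back-calculated from the 20 GB and 40 GB estimates above and are not an 
official MarkLogic ratio. All function and constant names are made up for 
illustration.

```python
# Back-of-the-envelope forest sizing using the figures from this thread.
DOCS_PER_LARGE_TENANT = 500_000   # "500k documents"
AVG_DOC_BYTES = 10_000            # "around 10 KB each"
FOREST_LIMIT_GB = 200             # suggested ceiling for one forest


def raw_size_gb(doc_count, avg_doc_bytes):
    """Raw (unindexed) size in decimal gigabytes."""
    return doc_count * avg_doc_bytes / 1e9


def indexed_size_range_gb(raw_gb, low_factor=4, high_factor=8):
    """Estimated on-disk range with heavy text indexing.

    The factors are assumptions derived from 5 GB -> 20..40 GB above.
    """
    return raw_gb * low_factor, raw_gb * high_factor


raw_gb = raw_size_gb(DOCS_PER_LARGE_TENANT, AVG_DOC_BYTES)
low_gb, high_gb = indexed_size_range_gb(raw_gb)

# Worst case: how many large tenants fit under the forest ceiling?
large_tenants_per_forest = int(FOREST_LIMIT_GB // high_gb)

print(f"raw: {raw_gb:.0f} GB, indexed: {low_gb:.0f}-{high_gb:.0f} GB")
print(f"large tenants per {FOREST_LIMIT_GB} GB forest: {large_tenants_per_forest}")
```

Even with the pessimistic factor, several large customers fit in a single 
forest, which is the reasoning behind suggesting a shared database.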


Forest storage is basically schemaless. Simply ingesting XML doesn't validate 
it against a schema. You do that explicitly using a validate { ... } 
expression. It's possible to make that happen using a trigger, if you want 
automatic validation. But usually it's better to accept documents even when 
they don't validate, so that fixing them is a database operation.

Your next question may be: should I map specific customers to specific forests? 
Usually no. Usually it's better to let the database spread documents around. 
Think of the forests as disks in a RAID volume, rather than sub-databases.
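
To make the RAID analogy concrete, here is a toy Python sketch that spreads 
document URIs across forests by hashing, regardless of which customer owns 
them. This is only an illustration of the idea. The hashing scheme, URI 
layout, and function names are assumptions for this example and do not 
describe how MarkLogic actually assigns documents to forests.

```python
import hashlib
from collections import Counter


def assign_forest(uri, forest_count):
    """Pick a forest index from a stable hash of the URI (illustrative only)."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % forest_count


def distribution(uris, forest_count):
    """Count how many documents land in each forest."""
    return Counter(assign_forest(uri, forest_count) for uri in uris)


# Three hypothetical tenants with 1000 documents each, four forests.
uris = [f"/tenant-{t}/doc-{d}.xml" for t in range(3) for d in range(1000)]
counts = distribution(uris, 4)

for forest, count in sorted(counts.items()):
    print(f"forest {forest}: {count} documents")
```

Each tenant's documents end up striped across all forests, so no single 
forest becomes one customer's hot spot.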

-- Mike

On 7 Jul 2014, at 10:58 , Casey Jordan <[email protected]> wrote:

> Thanks, I figured that fewer resources would be shared when using multiple 
> databases. That being said, I am not sure it would have a big impact in my 
> case. I would say that a big client might have 500k documents of around 
> 10 KB each. 
> 
> Another consideration is that each client needs separate schemas for their 
> content. This might force me into the multi-database design, unless I made 
> the default content store forest schemaless. 
> 
> Is it even possible to have a schemaless forest?
> 
> 
> On Mon, Jul 7, 2014 at 1:37 PM, Gene Thomas <[email protected]> wrote:
> I think the overall performance would be best with your content in separate 
> databases.
>  
> Gene
> 
> 
> On Monday, July 7, 2014 10:33 AM, Casey Jordan <[email protected]> 
> wrote:
> 
> 
> Thanks guys, that is really helpful information.
> 
> Are there any significant performance or resource tradeoffs when choosing 
> between putting everything in one big database and splitting it into one 
> database per "client"? Personally I like the idea of keeping everything as 
> separate as possible, but if that comes with a major tradeoff, it would be 
> good to know.
> 
> 
> On Mon, Jul 7, 2014 at 1:28 PM, Justin Makeig <[email protected]> 
> wrote:
> Casey,
> There are two ways in MarkLogic 7 to query a specific database. The first is 
> to create a separate app server (HTTP or XDBC) for each database. An app 
> server has a default database that you set in its configuration, and each 
> query or update evaluated by that app server runs against that database. 
> Many app servers can point to one database, but an app server can be 
> associated with only one database. The second, lower-level way is to use 
> xdmp:eval <http://docs.marklogic.com/xdmp:eval?q=xdmp:eval> or xdmp:invoke. 
> These let you specify a database at runtime and evaluate specific code 
> against it. I wouldn't recommend this as a general approach, though. It will 
> make your code less readable and, in certain scenarios, will prevent 
> MarkLogic from applying some of the performance optimizations it does under 
> the covers.
> 
> Another approach might be to create protected collections for each "tenant" 
> within the same database. With MarkLogic's role-based security, you can be 
> assured that you can completely restrict viewing and editing to very specific 
> roles. You can take a similar approach to running privileged code with amps. 
> Take a look at the Security Guide for more details 
> <http://docs.marklogic.com/guide/admin/security#chapter>.
> 
> Justin
> 
> 
> 
> Justin Makeig
> Director, Product Management
> MarkLogic Corporation
> [email protected]
> www.marklogic.com
> 
> 
> 
> On Jul 7, 2014, at 10:14 AM, Casey Jordan <[email protected]> wrote:
> 
>> Hi all,
>> 
>> I am checking out MarkLogic for the first time and I was wondering whether 
>> there is any information on designing a cluster for multi-tenancy.
>> 
>> I assumed that I could create a separate database for each "client" that 
>> would be using the application, and then segment data that way. However, 
>> it quickly became unclear to me how to query a specific database (I 
>> couldn't find an example of this in the docs), or how to manage users, 
>> triggers, schemas, etc. for a specific database. 
>> 
>> I know this is a fairly general question, but any advice would be helpful.
>> 
>> Thanks
>> 
>> -- 
>> --
>> Casey Jordan
>> easyDITA a product of Jorsek LLC
>> "CaseyDJordan" on LinkedIn, Twitter & Facebook
>> (585) 348 7399
>> easydita.com
>> 
>> 
>> This message is intended only for the use of the Addressee(s) and may
>> contain information that is privileged, confidential, and/or exempt from
>> disclosure under applicable law.  If you are not the intended recipient,
>> please be advised that any disclosure  copying, distribution, or use of
>> the information contained herein is prohibited.  If you have received
>> this communication in error, please destroy all copies of the message,
>> whether in electronic or hard copy format, as well as attachments, and
>> immediately contact the sender by replying to this e-mail or by phone.
>> Thank you.
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general

