At 500k documents of about 10 KB each, that suggests a raw tree size of about
5 GB for a large customer. With a high level of text indexing, the size might
approach 20 GB or even 40 GB. That's a medium size for a forest. It's best to
limit forests to about 200 GB each, but short of that limit a larger forest is
more efficient than a smaller one. Since those figures are for your larger
customers, you could combine quite a few smaller customers. To me this points
to a shared database.
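
Here is the arithmetic behind those numbers as a rough Python sketch. The
document count and size come from your estimate, and the 4x and 8x multipliers
are just the ratios implied by 20 GB and 40 GB over 5 GB. They are not measured
values, and the function names are made up for illustration.

```python
# Back-of-the-envelope forest sizing with the figures from this thread.
DOCS_PER_LARGE_CUSTOMER = 500_000  # "500k documents"
AVG_DOC_BYTES = 10 * 1000          # "around 10kb each", taken as decimal kilobytes
FOREST_LIMIT_GB = 200              # rough per-forest ceiling suggested above


def raw_size_gb(docs, avg_bytes):
    """Raw content size in GB, before any index overhead."""
    return docs * avg_bytes / 1e9


def customers_per_forest(size_gb, limit_gb=FOREST_LIMIT_GB):
    """How many customers of this size fit under the forest ceiling."""
    return int(limit_gb // size_gb)


raw = raw_size_gb(DOCS_PER_LARGE_CUSTOMER, AVG_DOC_BYTES)

# Assumed multipliers: 1x is raw content, 4x and 8x stand in for heavy indexing.
for multiplier in (1, 4, 8):
    size = raw * multiplier
    print(f"{multiplier}x: {size:.0f} GB per customer, "
          f"{customers_per_forest(size)} customers per forest")
```

With those assumptions, a single forest at the 200 GB ceiling has room for
roughly 5 to 10 heavily indexed large customers, and more of the smaller ones.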

Forest storage is basically schemaless. Simply ingesting XML doesn't validate
it against a schema. To validate, you call a validate { ... } expression
explicitly. If you want automatic validation, you can make that happen with a
trigger. But usually it's better to accept documents even when they don't
validate, so that fixing them is a database operation.

Your next question may be: should I map specific customers to specific forests?
Usually not. It's generally better to let the database spread documents across
the forests. Think of the forests as disks in a RAID volume rather than as
sub-databases.
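
To make the RAID analogy a bit more concrete, here is a toy Python sketch. It
is not MarkLogic's actual assignment policy: the forest names, tenant names,
URI layout, and the md5-modulo rule are all invented for illustration. The
point is only that when placement follows the document URI rather than the
customer, every customer ends up striped across every forest.

```python
import hashlib
from collections import Counter

# Hypothetical forest names, for illustration only.
FORESTS = ["forest-1", "forest-2", "forest-3", "forest-4"]


def forest_for(uri, forests=FORESTS):
    """Pick a forest from a stable hash of the document URI (toy rule)."""
    digest = hashlib.md5(uri.encode("utf-8")).digest()
    return forests[int.from_bytes(digest[:8], "big") % len(forests)]


# Hypothetical tenants and URI layout: /<tenant>/doc-<n>.xml
uris = [f"/{tenant}/doc-{n}.xml"
        for tenant in ("acme", "globex")
        for n in range(500)]

# Documents per forest, and documents per (tenant, forest) pair.
per_forest = Counter(forest_for(u) for u in uris)
per_tenant_forest = Counter((u.split("/")[1], forest_for(u)) for u in uris)

for forest in FORESTS:
    print(forest, per_forest[forest],
          {t: per_tenant_forest[(t, forest)] for t in ("acme", "globex")})
```

Each forest holds a share of each tenant's documents, so no single forest
becomes "the big customer's forest" and the load stays spread out.
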
-- Mike
On 7 Jul 2014, at 10:58, Casey Jordan <[email protected]> wrote:
> Thanks, I figured that more resources would go unshared with multiple
> databases. That said, I am not sure it would have a big impact in my case.
> I would say that a big client might have 500k documents of around 10 KB
> each.
>
> Another consideration is that each client needs separate schemas for their
> content. This might force me into the multi-database design, unless I made
> the default content store forest schemaless.
>
> Is it even possible to have a schemaless forest?
>
>
> On Mon, Jul 7, 2014 at 1:37 PM, Gene Thomas <[email protected]> wrote:
> I think the overall performance would be best with your content in separate
> databases.
>
> Gene
>
>
> On Monday, July 7, 2014 10:33 AM, Casey Jordan <[email protected]>
> wrote:
>
>
> Thanks guys that is really helpful information.
>
> Are there any significant performance or resource tradeoffs when choosing
> between putting everything in one big database and splitting it into one
> database for each "client"? Personally I like the idea of keeping everything
> as separate as possible, but if that comes with a major tradeoff, that would
> be good to know.
>
>
> On Mon, Jul 7, 2014 at 1:28 PM, Justin Makeig <[email protected]>
> wrote:
> Casey,
> There are two ways in MarkLogic 7 to query a specific database. The first is
> to create a separate app server (HTTP or XDBC) for each database. An app
> server has a default database that you set in its configuration, and each
> query or update evaluated by that app server runs against that database. Many
> app servers can point to one database, but each app server is associated with
> only one database. The second, lower-level way is to use xdmp:eval
> <http://docs.marklogic.com/xdmp:eval?q=xdmp:eval> or xdmp:invoke. These let
> you specify a database at runtime and evaluate specific code against it. I
> wouldn't recommend this as a general approach, though. It will make your code
> less readable and, in certain scenarios, will prevent MarkLogic from applying
> some of the performance optimizations it does under the covers.
>
> Another approach might be to create protected collections for each "tenant"
> within the same database. With MarkLogic's role-based security, you can
> completely restrict viewing and editing to very specific roles. You can take
> a similar approach to running privileged code with amps. Take a look at the
> Security Guide for more details
> <http://docs.marklogic.com/guide/admin/security#chapter>.
>
> Justin
>
>
>
> Justin Makeig
> Director, Product Management
> MarkLogic Corporation
> [email protected]
> www.marklogic.com
>
>
>
> On Jul 7, 2014, at 10:14 AM, Casey Jordan <[email protected]> wrote:
>
>> Hi all,
>>
>> I am checking out MarkLogic for the first time, and I would like to know
>> whether there is any information on designing a cluster for multi-tenancy.
>>
>> I assumed that I could create a separate database for each "client" that
>> would be using the application, and then segment data that way. However,
>> right away it became a little unclear to me how I would query a specific
>> database (I couldn't find an example of this in the docs), or manage users,
>> triggers, schemas, etc. for a specific database.
>>
>> I know this is a fairly general question, but any advice would be helpful.
>>
>> Thanks
>>
>> --
>> --
>> Casey Jordan
>> easyDITA a product of Jorsek LLC
>> "CaseyDJordan" on LinkedIn, Twitter & Facebook
>> (585) 348 7399
>> easydita.com
>>
>>
>> This message is intended only for the use of the Addressee(s) and may
>> contain information that is privileged, confidential, and/or exempt from
>> disclosure under applicable law. If you are not the intended recipient,
>> please be advised that any disclosure copying, distribution, or use of
>> the information contained herein is prohibited. If you have received
>> this communication in error, please destroy all copies of the message,
>> whether in electronic or hard copy format, as well as attachments, and
>> immediately contact the sender by replying to this e-mail or by phone.
>> Thank you.
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general