Thanks again, Mike, for the detailed thoughts. So are you suggesting that big clients get their own database and smaller clients share one? One big concern for me is the development overhead of a shared database; there are just a lot more things to consider, for instance dictionaries, language settings, and other configuration. The separate-database model keeps this very clean. On the other hand, it brings up issues like the need to create app servers for each database, which probably means 2-3 app servers per client. Could this be a major issue?
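A rough sizing sketch may help frame this question. The Python below is only a back-of-the-envelope calculation using figures from this thread: about 500k documents of roughly 10 KB for a large client, Mike's estimate that heavy text indexing could take about 5 GB of raw content to 20-40 GB (modeled here, as an assumption, as 4x and 8x multipliers), his suggested ceiling of about 200 GB per forest, and the 2-3 app servers per client mentioned above. The client counts (10 and 100) and all function names are hypothetical.

```python
# Back-of-the-envelope sizing for the multi-tenant question in this thread.
# Inputs come from the thread; the 4x/8x index multipliers are an assumed
# reading of Mike's "5 GB raw -> 20-40 GB indexed" estimate, not measurements.

DOCS_PER_LARGE_CLIENT = 500_000   # Casey's estimate for a big client
AVG_DOC_KB = 10                   # roughly 10 KB per document
INDEX_MULTIPLIERS = (4, 8)        # assumed: 5 GB raw -> 20 GB or 40 GB indexed
FOREST_CEILING_GB = 200           # Mike's suggested upper bound per forest


def raw_size_gb(docs: int, avg_kb: float) -> float:
    """Raw content size in GB, using 1 GB = 1,000,000 KB for round numbers."""
    return docs * avg_kb / 1_000_000


def large_clients_per_forest(indexed_gb: float, ceiling_gb: float = FOREST_CEILING_GB) -> int:
    """How many large clients of this indexed size fit under the forest ceiling."""
    return int(ceiling_gb // indexed_gb)


def app_servers_needed(clients: int, servers_per_client: int) -> int:
    """Database-per-client model: each app server has exactly one default database."""
    return clients * servers_per_client


if __name__ == "__main__":
    raw = raw_size_gb(DOCS_PER_LARGE_CLIENT, AVG_DOC_KB)
    print(f"raw size per large client: {raw:.0f} GB")
    for m in INDEX_MULTIPLIERS:
        indexed = raw * m
        print(f"x{m} indexing: {indexed:.0f} GB -> "
              f"{large_clients_per_forest(indexed)} large clients per "
              f"{FOREST_CEILING_GB} GB forest")
    for n_clients in (10, 100):  # hypothetical client counts
        print(f"{n_clients} clients x 3 app servers = "
              f"{app_servers_needed(n_clients, 3)} app servers")
```

Under these assumptions even a large client fills at most about a fifth of one forest, which is consistent with Mike's reading that a shared database is viable, while the database-per-client model grows the app-server count linearly with the number of clients.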
Also, regarding your RAID analogy: I don't understand why we would create multiple forests at all if the database distributes data across them automatically. Given that, why wouldn't the database manage all of this internally?

On Mon, Jul 7, 2014 at 3:27 PM, Michael Blakeley <[email protected]> wrote:

> That suggests a raw tree size of about 5 GB for a large customer. With a high level of text indexing it might approach 20 GB or even 40 GB. That's a medium size for a forest. It's best to limit them to about 200 GB, but short of that a larger forest is more efficient than a smaller one. Since those are your larger customers, that suggests you could combine quite a few smaller customers. To me this points to a shared database.
>
> Forest storage is basically schemaless. Simply ingesting XML doesn't validate it against a schema. You do that explicitly using a validate { ... } expression. It's possible to make that happen using a trigger, if you want automatic validation. But usually it's better to accept documents even when they don't validate, so that fixing them is a database operation.
>
> Your next question may be: should I map specific customers to specific forests? Usually no. Usually it's better to let the database spread documents around. Think of the forests as disks in a RAID volume, rather than sub-databases.
>
> -- Mike
>
> On 7 Jul 2014, at 10:58, Casey Jordan <[email protected]> wrote:
>
> > Thanks, I figured that there would be more resources that were not shared when having multiple dbs. That being said, I am not sure it would be a big impact in my case. I would say that a big client might have 500k documents that are around 10 KB each.
> >
> > Also, another consideration is that each client needs to have separate schemas for their content. So this might force me into the multi-db design, unless I make the default content store forest schemaless.
> >
> > Is it even possible to have a schemaless forest?
> >
> > On Mon, Jul 7, 2014 at 1:37 PM, Gene Thomas <[email protected]> wrote:
> > I think the overall performance would be best with your content in separate databases.
> >
> > Gene
> >
> > On Monday, July 7, 2014 10:33 AM, Casey Jordan <[email protected]> wrote:
> >
> > Thanks guys, that is really helpful information.
> >
> > Are there any significant performance or resource tradeoffs when choosing between putting everything in one big database vs. splitting it into one for each "client"? Personally I like the idea of keeping everything as separate as possible, but if this means a major tradeoff, that would be good to know.
> >
> > On Mon, Jul 7, 2014 at 1:28 PM, Justin Makeig <[email protected]> wrote:
> > Casey,
> >
> > There are two ways in MarkLogic 7 to query a specific database:
> >
> > 1. Create a separate app server (HTTP or XDBC) for each database. An app server has a default database that you can set in configuration. Each query or update evaluated on that app server runs against that database. Many app servers can point to one database, but an app server can only be associated with one database.
> > 2. Use xdmp:eval <http://docs.marklogic.com/xdmp:eval?q=xdmp:eval> or xdmp:invoke, a lower-level means. These allow you to specify a database at runtime and evaluate specific code against it. I wouldn't recommend this as a general approach, though. It will make your code less readable and, in certain scenarios, will prevent MarkLogic from maximizing some performance optimizations it does under the covers.
> >
> > Another approach might be to create protected collections for each "tenant" within the same database. With MarkLogic's role-based security, you can completely restrict viewing and editing to very specific roles.
> > You can take a similar approach to running privileged code with amps. Take a look at the Security Guide for more details <http://docs.marklogic.com/guide/admin/security#chapter>.
> >
> > Justin
> >
> > Justin Makeig
> > Director, Product Management
> > MarkLogic Corporation
> > [email protected]
> > www.marklogic.com
> >
> > On Jul 7, 2014, at 10:14 AM, Casey Jordan <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I am checking out MarkLogic for the first time and was wondering whether there is any information on designing a cluster for multi-tenancy.
> >>
> >> I assumed that I could create a separate database for each "client" that would be using the application, and then segment data that way. However, right away it became a little unclear to me how I query a specific database (I couldn't find an example of this in the docs), or how I manage users, triggers, schemas, etc. for a specific database.
> >>
> >> I know this is a fairly general question, but any advice would be helpful.
> >>
> >> Thanks
> >>
> >> --
> >> Casey Jordan
> >> easyDITA a product of Jorsek LLC
> >> "CaseyDJordan" on LinkedIn, Twitter & Facebook
> >> (585) 348 7399
> >> easydita.com
> >>
> >> This message is intended only for the use of the Addressee(s) and may contain information that is privileged, confidential, and/or exempt from disclosure under applicable law. If you are not the intended recipient, please be advised that any disclosure copying, distribution, or use of the information contained herein is prohibited. If you have received this communication in error, please destroy all copies of the message, whether in electronic or hard copy format, as well as attachments, and immediately contact the sender by replying to this e-mail or by phone. Thank you.
> >> _______________________________________________ > >> General mailing list > >> [email protected] > >> http://developer.marklogic.com/mailman/listinfo/general > > > > > > _______________________________________________ > > General mailing list > > [email protected] > > http://developer.marklogic.com/mailman/listinfo/general > > > > > > > > > > -- > > -- > > Casey Jordan > > easyDITA a product of Jorsek LLC > > "CaseyDJordan" on LinkedIn, Twitter & Facebook > > (585) 348 7399 > > easydita.com > > > > > > This message is intended only for the use of the Addressee(s) and may > > contain information that is privileged, confidential, and/or exempt from > > disclosure under applicable law. If you are not the intended recipient, > > please be advised that any disclosure copying, distribution, or use of > > the information contained herein is prohibited. If you have received > > this communication in error, please destroy all copies of the message, > > whether in electronic or hard copy format, as well as attachments, and > > immediately contact the sender by replying to this e-mail or by phone. > > Thank you. > > > > _______________________________________________ > > General mailing list > > [email protected] > > http://developer.marklogic.com/mailman/listinfo/general > > > > > > > > _______________________________________________ > > General mailing list > > [email protected] > > http://developer.marklogic.com/mailman/listinfo/general > > > > > > > > > > -- > > -- > > Casey Jordan > > easyDITA a product of Jorsek LLC > > "CaseyDJordan" on LinkedIn, Twitter & Facebook > > (585) 348 7399 > > easydita.com > > > > > > This message is intended only for the use of the Addressee(s) and may > > contain information that is privileged, confidential, and/or exempt from > > disclosure under applicable law. If you are not the intended recipient, > > please be advised that any disclosure copying, distribution, or use of > > the information contained herein is prohibited. 
If you have received > > this communication in error, please destroy all copies of the message, > > whether in electronic or hard copy format, as well as attachments, and > > immediately contact the sender by replying to this e-mail or by phone. > > Thank you. > > _______________________________________________ > > General mailing list > > [email protected] > > http://developer.marklogic.com/mailman/listinfo/general > > _______________________________________________ > General mailing list > [email protected] > http://developer.marklogic.com/mailman/listinfo/general > -- -- Casey Jordan easyDITA a product of Jorsek LLC "CaseyDJordan" on LinkedIn, Twitter & Facebook (585) 348 7399 easydita.com This message is intended only for the use of the Addressee(s) and may contain information that is privileged, confidential, and/or exempt from disclosure under applicable law. If you are not the intended recipient, please be advised that any disclosure copying, distribution, or use of the information contained herein is prohibited. If you have received this communication in error, please destroy all copies of the message, whether in electronic or hard copy format, as well as attachments, and immediately contact the sender by replying to this e-mail or by phone. Thank you.
_______________________________________________ General mailing list [email protected] http://developer.marklogic.com/mailman/listinfo/general
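Mike's RAID analogy, and Casey's question about why forests exist if the database places documents itself, can be illustrated with a toy model. The sketch below is not MarkLogic's actual document-assignment policy: it is a hash-based stand-in that picks one of a fixed number of forests from each document URI, so every tenant's documents end up striped across all forests instead of being pinned to one. The tenant names, URI pattern, and forest count are hypothetical.

```python
# Toy model of the "forests as disks in a RAID volume" analogy.
# NOT MarkLogic's real assignment policy: a simple stand-in that places each
# document by hashing its URI, so no tenant is pinned to a specific forest.
# Tenant names, URI pattern, and forest count are hypothetical.
import hashlib
from collections import Counter


def forest_for(uri: str, n_forests: int) -> int:
    """Pick a forest index from a stable hash of the document URI."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_forests


def spread(tenant: str, n_docs: int, n_forests: int) -> Counter:
    """Count how many of one tenant's documents land in each forest."""
    return Counter(
        forest_for(f"/{tenant}/doc-{i}.xml", n_forests) for i in range(n_docs)
    )


if __name__ == "__main__":
    N_FORESTS = 4
    for tenant in ("client-a", "client-b"):
        counts = spread(tenant, 10_000, N_FORESTS)
        # Each tenant's documents are spread roughly evenly over all forests.
        print(tenant, [counts[f] for f in range(N_FORESTS)])
```

Because placement here depends only on the URI, isolating tenants in a shared database would have to come from something other than forest placement, such as the protected collections and role-based security Justin describes. One limitation of this toy: with plain modulo hashing, changing the number of forests would reassign most URIs.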
