As Aaron pointed out, MarkLogic forest replication is the right way to do this. 
A filesystem-based replication service won't know how to integrate with the 
MarkLogic cluster failover mechanism, and could leave you with a corrupt 
replica just when you need to fail over to it.

What do you do when your local storage is full? Managing local storage isn't 
fundamentally different from adding NAS storage. Add more storage. When the 
chassis fills up, add another chassis. Be sure to pay attention to the 
controller too. I often see systems where the disks themselves are not very 
busy, but the controller is overloaded.

All this underlines the need to do sizing up-front. You should be able to 
create a reasonable sizing model that balances your CPU, memory, disk, and 
network needs in a way that will last for the useful lifetime of the CPUs. Once 
the CPUs are obsolete, it will probably be time to rebuild the application 
anyway. You can migrate to newer hardware at the same time.
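
As a rough illustration of the disk side of such a sizing model, here is a 
minimal Python sketch. It assumes the rule-of-thumb figures from Aaron's 
message quoted below (20 MB/sec per forest, 16 forests per node, 4k blocks) 
and the three servers Kashif describes. The function name and parameters are 
hypothetical, and the numbers should be replaced with measurements from your 
own workload.

```python
def storage_demand(forests_per_node, nodes=1, mb_per_sec_per_forest=20, block_kb=4):
    """Return (throughput in MB/sec, IOPS at the given block size).

    Uses decimal units (1 MB = 1000 KB), matching the round numbers
    used in the thread. All defaults are assumptions, not measurements.
    """
    throughput_mb = forests_per_node * nodes * mb_per_sec_per_forest
    iops = throughput_mb * 1000 // block_kb
    return throughput_mb, iops


per_node = storage_demand(16)
cluster = storage_demand(16, nodes=3)
print("per node: %d MB/sec, %d IOPS" % per_node)
print("3 nodes:  %d MB/sec, %d IOPS" % cluster)
```

This only covers disk throughput. CPU, memory, and network would each need 
their own terms in a fuller model.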

-- Mike

On 12 Feb 2013, at 14:13 , "Khan, Kashif" <[email protected]> wrote:

> Thanks Aaron and Mike for the detailed email. Here is what we are trying to 
> do. 
> 
>       • We are trying to set up a MarkLogic cluster of 3 servers for 
> failover.
>       • We do not have GFS or another clustered file system.
>       • We are trying to find out what our best options are.
> From the email chain, this is what I understand our options to be. 
> 
> Scenario 1:
>       • Use a dedicated NAS for each server. A total of 3 dedicated NAS 
> units will be needed.
>       • Configure a file replication service to replicate forests among all 3 
> NAS instances.
>     Question: Is there any documentation on how to configure the replication 
> service for MarkLogic forest replication?
> 
> Scenario 2:
>       • Use local storage on all three servers.
>       • Configure a file replication service to replicate forests among all 3 
> instances of local storage.
>     Question: In this scenario, if the local storage reaches capacity we 
> cannot increase it. What are the options if local storage 
> gets maxed out?
> 
> 
> Suggestions are most welcome.
> 
> __________________________
> Kashif Khan
> 
> 
> 
> 
> On 2/9/13 6:41 PM, "Aaron Rosenbaum" <[email protected]> wrote:
> 
>> Yes, you can use NAS. Like SAN, the key is adequate performance. This is the 
>> tricky part because getting that performance is very difficult and very 
>> expensive. When internal policies and infrastructure dictate SAN or NAS, 
>> dedicated high-quality NAS can often be preferable to shared, 
>> under-provisioned SAN (while being cheaper).
>> 
>> As Mike pointed out, can you maintain HA with your NAS setup? This is 
>> particular to the unit.
>> 
>> Without a clustered file system, you won't have multiple nodes pointing at 
>> the same volume. Each node should receive dedicated pools and bandwidth.
>> 
>> You should not stripe across all volumes then thin provision out of a single 
>> pool.
>> 
>> No CIFS, Windows shares, or SMB. NFS has performance limitations even with 
>> 10 GbE under Linux. Test, test, test.
>> 
>> It is often the overlay services of "fancy" NAS that kill performance: dedup, 
>> compression, site-to-site replication, etc.
>> 
>> Is this a shared resource? If so, how do you ensure enough bandwidth for the 
>> MarkLogic nodes? How do you ensure you don't destroy the performance of 
>> other nodes? You should have explicit visibility and control of each 
>> volume.
>> 
>> An example of a successful SLA can be found in Amazon's Provisioned IOPS 
>> storage. While neither SAN nor NAS, it sets a standard for what you should 
>> expect/demand from shared storage:
>> - explicit bandwidth guarantee to the storage pool (110 MB/sec for most 
>> high-end instances, coincidentally the practical throughput limit for many 
>> NFS implementations.)
>> - guaranteed IOPS at large block sizes for each volume. You need 20 MB/sec 
>> per forest. 16 forests a node, not unreasonable for a nice system with local 
>> storage, comes to 320 MB/sec, or 80,000 IOPS at 4k blocks, per node. Three 
>> such nodes would need 240,000 IOPS at 4k blocks from your NAS. I think you'll 
>> find local storage much more cost effective.
>> - sustained SLA compliance even if maxing out all guarantees. A typical 
>> pattern is that a MarkLogic user will ask for that much bandwidth 
>> (80K 4k IOPS per node) and then get laughed at by the storage admins. It's 
>> out of line with everything they have experience with. MarkLogic can end up 
>> looking more like a video streaming load than like Oracle. It really uses 
>> that much bandwidth, and if the total provided is less, performance can drop 
>> off a cliff.
>> 
>> We are developing guidelines now for AWS storage, but one rule of thumb is 
>> probably useful for NAS also. If you can, provision one volume per forest so 
>> you can track and allocate performance by volume/forest with less effort. It 
>> will also make reallocation of load easier.
>> 
>> Local disk replication will move the copies of forests around for HA. Don't 
>> try to do that with the disk subsystem.
>> 
>> If you pass along more details as to planned configurations, I may be of 
>> more help.
>> 
>> Aaron Rosenbaum
>> Director, Product Management
>> [email protected]
>> 
>> 
> 
> 
>> 
>>> Message: 2
>>> Date: Fri, 8 Feb 2013 14:51:30 -0800
>>> From: Michael Blakeley <[email protected]>
>>> Subject: Re: [MarkLogic Dev General] Marklogic Cluster Setup
>>> To: MarkLogic Developer Discussion <[email protected]>
>>> Message-ID: <[email protected]>
>>> Content-Type: text/plain; charset=windows-1252
>>> 
>>> The question "which is faster?" is impossible to answer generically. It's 
>>> possible to design local storage so that it is slower or faster than a 
>>> given NAS. It's possible to design NAS so that it is slower or faster than 
>>> given local storage. But in most cases it is cheaper to build out similar 
>>> levels of performance from local disk than from NAS (or SAN).
>>> 
>>> Performance aside, I would not use a NAS as part of a failover solution. 
>>> The whole point of failover is high availability, and relying on a NAS 
>>> simply introduces another system that can fail. Using a NAS also implies 
>>> shared filesystems, which are cantankerous and require their own fencing 
>>> mechanisms. This pulls in yet more systems that can fail, and probably will.
>>> 
>>> I prefer to use local storage, with local replication of forests. This also 
>>> avoids the strong probability that the I/O demands of the cluster will 
>>> swamp the network link to the NAS, or the NAS controller.
>>> 
>>> So I would size the number of forests needed, then the storage capacity and 
>>> I/O performance needed, and finally specify local disk and network to meet 
>>> those needs.
>>> 
>>> -- Mike
> 
> 
>>> On 8 Feb 2013, at 14:26 , "Khan, Kashif" <[email protected]> wrote:
>>>> Hello Everyone, We are creating a MarkLogic cluster for failover. I have a 
>>>> couple of questions.
>>>>     • We are planning to use NAS for data storage. Is there any 
>>>> performance hit if we use NAS over SAN?
>>>>     • We do not have GFS set up.
>>>>         • Is it possible to attach one NAS file store to all 3 MarkLogic 
>>>> servers in the cluster?
>>>>         • Or do we have to attach an independent NAS to each MarkLogic 
>>>> instance and set up a cloning job to transfer data to each of the other 2 
>>>> NAS instances?
>>>> From the documentation it seems like we cannot attach one NAS file store 
>>>> to all three MarkLogic servers unless we have GFS. Any info will be 
>>>> greatly appreciated.
>>>> 
>>>> Kashif Khan
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://developer.marklogic.com/mailman/listinfo/general
