On Thu, 2007-03-08 at 11:54 -0800, Ryan Ordway wrote:
> On 3/8/07 4:54 AM, "Cory Snavely" <[EMAIL PROTECTED]> spake:
> 
> > At any rate, re: the assetstore, if you want a load-balanced
> > environment, I am quite sure that real-time synchronization is
> > necessary. Even with an hourly rsync--problematic at best with a large
> > repository, BTW--a deposit on one instance and a subsequent attempted
> > retrieval of it on the other would cause issues. There are a number of
> > ways to share a file system among several servers but I would think that
> > the most accessible would be any reasonable NAS storage backend
> > depending on your existing storage infrastructure.
> 
> I am also trying to avoid single points of failure. These hosts are both
> connected to a SAN, but I want both hosts to have a copy of the data.
> 
> I'm considering some form of on-demand synchronization, in addition to
> scheduled synchronization. For instance, adding a new item would trigger
> a synchronization that pushes the new data to the other node.
> 
> Rsync is quite speedy. :-)

Well, whether your storage backend is a single point of failure depends
largely on its architecture. If you use dual pathing, dual active-active
controllers, etc., and some reasonable RAID level, I would not consider
it a single point of failure at all.

If you still favor the idea of two separate storage systems, I think you
are heading down the road of bidirectional, real-time replication in
order to really do it right. I am of the opinion that almost any system
reliant on crawling across large filesystems on a regular basis is
unacceptable at a large scale. I have also seen rsync require huge
amounts of memory at large scale. Lastly, bidirectionality is itself an
issue, particularly if you allow objects to be removed from your
repository (consider whether you would use the --delete flag or not,
and how a new submission on one system can look like a deletion to the
other).
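
To make the --delete problem concrete, here is a toy model of a one-way
sync between two nodes. It is only a sketch: the mirror() function and
the assetstore paths are made up for illustration, and it models the
file lists as sets rather than calling rsync itself.

```python
def mirror(source: set[str], dest: set[str], delete: bool) -> set[str]:
    """Model a one-way sync from source to dest, with or without delete semantics."""
    result = dest | source      # copy over everything the source has
    if delete:
        result &= source        # delete mode: drop anything the source lacks
    return result


# Both nodes start in sync, then node B accepts a new deposit.
node_a = {"assetstore/11/22/33/aaa"}
node_b = {"assetstore/11/22/33/aaa", "assetstore/44/55/66/new-deposit"}

# A pushes to B with delete semantics: B's new deposit is removed.
print(mirror(node_a, node_b, delete=True))

# A pushes to B without delete: the deposit survives, but so would any
# object that had been deliberately removed on A.
print(mirror(node_a, node_b, delete=False))
```

Neither setting is right in both directions, which is why I would expect
a scheduled two-way rsync to need extra bookkeeping about which side
changed.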

That said, if you rig up something to trigger a push to the other site,
you'll probably be able to get it to work...but that is really a job
that could be handled at the file system layer.
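
If you do go the trigger route, a push of a single new file might look
something like the sketch below. The assetstore path and peer hostname
are placeholders I made up, and how you would hook push() into the
deposit process is left open; the rsync options used (-a, --relative,
and the "/./" marker in the source path) are standard ones.

```python
import subprocess
from pathlib import PurePosixPath

# Hypothetical values; neither appears in this thread.
ASSETSTORE = PurePosixPath("/dspace/assetstore")
PEER = "peer.example.org"


def build_push_command(new_file, root=ASSETSTORE, peer=PEER):
    """Build an rsync command that copies one new file to the peer node."""
    # relative_to() raises ValueError if the file is outside the assetstore.
    rel = PurePosixPath(new_file).relative_to(root)
    # With --relative, "/./" marks where the preserved path begins, so the
    # file lands under the same subdirectories on the peer.
    src = f"{root}/./{rel}"
    return ["rsync", "-a", "--relative", src, f"{peer}:{root}/"]


def push(new_file):
    """Run the push; raises CalledProcessError if rsync fails."""
    subprocess.run(build_push_command(new_file), check=True)
```

Pushing only the new file avoids crawling the whole tree, but it does
nothing for deletions or for pushes that fail while the peer is down.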
 
> > Make sure you run the indexer on only one instance.
> 
> Good to know!
>  
> > I run two regular handle servers redundantly, not against DSpace, but
> > against MySQL with bidirectional MySQL replication. The folks at CNRI
> > helped me work through the issues involved, which mainly involved having
> > a shared private key between the two and making sure that the two
> > servers were configured as masters so they did not try to use handle
> > replication. I would think that redundant handle servers operating
> > against DSpace (that is, DSpace methods for Postgres or MySQL access)
> > would be about the same thing--just making sure that the handle server
> > configurations are identical on each server.
> 
> What is the benefit of using the handle server with MySQL? What needs to be
> done to DSpace to get it to use the MySQL data rather than using the DSpace
> methods?

It won't apply here. To resolve handles in DSpace, you have to configure
the handle server to run against the DSpace metadata store through Java
methods.

My point with that was simply to say that handle servers can run in an
active-active load-balancing mode, but both need to believe they are
masters, and both need to use the same private key.
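
One cheap sanity check for the shared-key requirement is to compare a
hash of the key file on each server. This is just a sketch: it assumes
you have fetched a copy of the peer's key file to compare against, and
the function names are mine.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def keys_match(local_key: Path, peer_key_copy: Path) -> bool:
    """True if both key files are byte-for-byte identical."""
    return fingerprint(local_key) == fingerprint(peer_key_copy)
```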

c

> > On Wed, 2007-03-07 at 11:52 -0800, Ryan Ordway wrote:
> >> I have been digging around to find information about sites using load
> >> balancing and/or clustering with their DSpace installations. All I could
> >> find was mention of load balancing web requests to multiple Tomcat
> >> instances using mod_jk.
> >> 
> >> First some background, and then my question:
> >> 
> >> What I am looking to do is put my DSpace web servers behind my load
> >> balancer to balance the HTTP requests. The web servers then both load
> >> balance their Tomcat connections via mod_jk to each other, with their own
> >> instances being weighted more heavily so that they will prefer localhost.
> >> 
> >> For the database, for now I'm just using a single Postgres instance. I'm
> >> hoping to get DSpace ported to MySQL to take advantage of my existing MySQL
> >> cluster. 
> >> 
> >> My question is, are there any issues to watch for? Will just rsync'ing the
> >> assetstore between the two web/app servers suffice? Are there any issues
> >> with running multiple handle servers?
> 
> 


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
