I built something similar using NTFS hard links and by re-using existing
local snapshot files. It has been running in production for 3+ years with more
than 100 million docs, and it distributes new snapshots from the master servers
every minute. It does not use rsync; it relies only on the fact that file names
in Lucene are unique. It copies only the files that do not already exist on the
slaves, and uses NTFS hard links to "copy" existing local files into the new
snapshot directory. On the masters, it likewise uses NTFS hard links to create
a new "snapshot" of the master index, and the slaves simply look for new
snapshot directories on the master servers. When a new directory shows up, a
slave compares it with its existing local snapshot to see which files are new
on the master (or have been deleted by the master), and then copies only the
new files. It does not need to send any explicit commit operations, and there
is no explicit communication between masters and slaves (slaves just look in a
remote directory for new snapshot sub-directories).
This has worked great with no problems at all. All of this was built before
Solr was available on Windows. Going forward we are transitioning to Java
and Solr on Linux (it is just too hard to keep up with improvements otherwise,
IMO).
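
To make the scheme above concrete, here is a minimal sketch in Python (not the
production code, which is .NET). The function names and directory layout are
hypothetical, chosen only for illustration. It assumes that a file with the
same name has the same content, as described above; any file that is rewritten
under an existing name would have to be copied unconditionally.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def create_master_snapshot(index_dir: Path, snapshot_dir: Path) -> None:
    """Snapshot the master index by hard-linking every index file.

    No file data is copied; each link is just another name for the same file.
    """
    snapshot_dir.mkdir(parents=True)
    for src in index_dir.iterdir():
        if src.is_file():
            os.link(src, snapshot_dir / src.name)


def pull_snapshot(master_snapshot: Path,
                  local_prev: Optional[Path],
                  local_new: Path) -> List[str]:
    """Build a new local snapshot from a master snapshot directory.

    Files already present in the previous local snapshot are hard-linked;
    only files missing locally are copied from the master. Files the master
    has deleted are simply not carried over. Returns the copied file names.
    """
    local_new.mkdir(parents=True)
    copied = []
    for src in sorted(master_snapshot.iterdir()):
        if not src.is_file():
            continue
        prev = local_prev / src.name if local_prev is not None else None
        if prev is not None and prev.is_file():
            # Assumption: same name means same content, so reuse the local file.
            os.link(prev, local_new / src.name)
        else:
            shutil.copy2(src, local_new / src.name)
            copied.append(src.name)
    return copied
```

A slave would call something like pull_snapshot() whenever it notices a new
snapshot sub-directory on the master, passing its most recent local snapshot
as local_prev, and then reopen its searcher on the new directory. Hard links
only work within one volume, so the local snapshots must share a volume.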



On Jul 6, 2011, at 8:22 PM, Guilherme Balena Versiani wrote:

> Hi,
> 
> I am working on a derived work of Solr for .NET. The purpose is to obtain a 
> similar solution of Lucene replication available at Solr, but without the 
> need to port all Solr code.
> 
> There is a SnapShooter, SnapPuller and a SnapInstaller. The SnapShooter does 
> similar work as in Solr script. The SnapPuller uses cwRsync to replicate the 
> database between machines, but without storing the 
> snapshot.current.MACHINENAME files on master, as cwRsync does not support sync 
> with the server. The SnapInstaller tries to substitute the Lucene database 
> files "in-place" -- the Lucene application should use a "SteroidsFSDirectory" 
> that creates a special "SteroidsFSIndexInput" that permits to rename files in 
> use; after that, SnapInstaller sends a "commit" operation through a Windows 
> named pipe to the application to reset its current IndexSearcher instance.
> 
> This solution has the "suggestive" name of Lucene Steroids, and was hosted in 
> BitBucket.org. What is the best way to continue to distribute it? Should I 
> continue to maintain it on BitBucket.org or should I apply to Lucene.NET 
> project (I don't know how) to include it on Contrib modules?
> 
> The current code is available at http://bitbucket.org/guibv/lucene.steroids. 
> The work is incomplete; the first stable version should be available in the 
> next few days.
> 
> Best regards,
> Guilherme Balena Versiani.
