Jeevananthan-23 commented on issue #763: URL: https://github.com/apache/lucenenet/issues/763#issuecomment-1368691729
> For the record, there are already existing Lucene directory implementations for various cloud platforms and products. Some would need to be ported to .NET, others might already exist on .NET (some of which don't yet support 4.8.0).
>
> [#631 (comment)](https://github.com/apache/lucenenet/issues/631#issuecomment-1086764962)
> https://github.com/azure-contrib/AzureDirectory
> https://github.com/tomlm/Lucene.Net.Store.Azure
> https://stackoverflow.com/a/59381272
> https://github.com/albogdano/lucene-s3directory
> https://docs.jboss.org/author/display/ISPN50/Infinispan%20as%20a%20Directory%20for%20Lucene.html
> https://cwiki.apache.org/confluence/display/lucene/AvailableLockFactories
>
> Lucene is highly optimized to use a local file system. Swapping the directory implementation is one way to extend Lucene, but it will come at a pretty significant performance penalty to write to a blob storage provider (as you can see by the notes on each implementation). There are existing cloud products that emulate a local file system that may or may not do the job better of moving your index off of the server it runs on.
>
> However, rolling your own `Directory` implementation is a major job that will take a significant amount of effort to perform well and be stable enough to use.
>
> Note that there is a [Lucene.Net.Replicator](https://lucenenet.apache.org/docs/4.8.0-beta00016/api/replicator/Lucene.Net.Replicator.html) module that is designed to synchronize an index across multiple servers by hosting a listener on a server and having each client request a copy periodically. I haven't tried, but I suspect there is a way to utilize it to safely copy the index to cloud storage at periodic intervals. I wouldn't recommend copying the index outside of a locking context, though, because Lucene.NET may lock the files at unpredictable times and it may not be practical to copy them without getting copy errors.
This has been an insightful conversation, so I would like to add a point of my own. Following on from what @NightOwl888 mentioned, it would be great to implement **[nrt-replication](https://github.com/Yelp/nrtsearch#near-real-time-replication)**.

Yelp engineers implemented this kind of project for their **NRT-Search engine** to meet the needs of large-cluster management. NRT replication seems a good alternative to document-based replication when it comes to the costs of maintaining large clusters: scaling a document-based cluster up or down promptly can be slow because data has to migrate between nodes, on top of the cost of reindexing on every node.

Hope this helps. Thanks!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org