[GitHub] [lucenenet] Jeevananthan-23 commented on issue #763: Support async task

GitBox Sun, 01 Jan 2023 22:43:27 -0800


Jeevananthan-23 commented on issue #763:
URL: https://github.com/apache/lucenenet/issues/763#issuecomment-1368691729


   > For the record, there are already existing Lucene directory 
implementations for various cloud platforms and products. Some would need to be 
ported to .NET; others might already exist on .NET (some of which don't yet 
support 4.8.0).
   > 
   > [#631 (comment)](https://github.com/apache/lucenenet/issues/631#issuecomment-1086764962)
   > - https://github.com/azure-contrib/AzureDirectory
   > - https://github.com/tomlm/Lucene.Net.Store.Azure
   > - https://stackoverflow.com/a/59381272
   > - https://github.com/albogdano/lucene-s3directory
   > - https://docs.jboss.org/author/display/ISPN50/Infinispan%20as%20a%20Directory%20for%20Lucene.html
   > - https://cwiki.apache.org/confluence/display/lucene/AvailableLockFactories
   > 
   > Lucene is highly optimized to use a local file system. Swapping the 
directory implementation is one way to extend Lucene, but writing to a blob 
storage provider will come at a pretty significant performance penalty (as you 
can see from the notes on each implementation). There are existing cloud 
products that emulate a local file system and may or may not do a better job 
of moving your index off the server it runs on.
   > 
   > However, rolling your own `Directory` implementation is a major job that 
will take a significant amount of effort to perform well and be stable enough 
to use.
   > 
   > Note that there is a 
[Lucene.Net.Replicator](https://lucenenet.apache.org/docs/4.8.0-beta00016/api/replicator/Lucene.Net.Replicator.html)
 module that is designed to synchronize an index across multiple servers by 
hosting a listener on a server and having each client request a copy 
periodically. I haven't tried, but I suspect there is a way to utilize it to 
safely copy the index to cloud storage at periodic intervals. I wouldn't 
recommend copying the index outside of a locking context, though, because 
Lucene.NET may lock the files at unpredictable times and it may not be 
practical to copy them without getting copy errors.
   
   This has been an insightful conversation throughout this topic, so I'd like 
to add my points here as well. As @NightOwl888 mentioned, it would be great to 
implement 
**[nrt-replication](https://github.com/Yelp/nrtsearch#near-real-time-replication)**.
 Yelp engineers implemented this kind of replication for their **NRT-Search 
engine** to meet the needs of large-cluster management. NRT replication looks 
like a good alternative to document-based replication when it comes to the cost 
of maintaining large clusters: besides paying the cost of reindexing on all 
nodes, scaling document-based clusters up or down promptly can be slower 
because data has to be migrated between nodes.
   
   Hope this helps,
   Thanks!


