hey folks, we already have transaction logging on the Solr side, so I probably should have started this discussion earlier. However, I want to bring this up on the list because I think this is a very valuable feature for plain Lucene users too, and eventually it should be available to them as well. I don't think this needs to be a core feature at all, but I do think we need to provide the necessary hooks in Lucene core to make it reliable and consistent. My concern is that with the extension mechanisms we currently provide on the IndexWriter side, this feature can only be implemented sub-optimally in Solr (or anywhere else on top of Lucene). Let me elaborate a little.
IndexWriter doesn't provide any transactional guarantees, nor does it guarantee any ordering. So if you concurrently index two versions of a document with the same delete key, you can't tell which one wins unless you prevent IW from seeing both documents at the same time, i.e. by locking before you call into IW. This is basically what other implementations like ElasticSearch do: they keep an array of locks and pick the lock for a document based on the hash of its delete term. However, this gets more complex with delete queries, where you can't tell in advance which documents are affected, so a delete query might end up in the wrong position in the transaction log if the log's order doesn't match the order IW sees. Under the hood, IW does maintain such an order inside DocumentsWriterDeleteQueue, which could be used to provide a total ordering that IMO should be reflected in the transaction log. Before I propose how this could be implemented, I want to check whether others think we should provide more reliable options for users who need durability and consistent recovery.

simon
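A rough sketch of the lock-striping approach described above, to make the ordering problem concrete. This is a hypothetical, minimal illustration, not Lucene, Solr or ElasticSearch code: `StripedUpdateLog`, `updateDocument` and `deleteByQuery` are made-up names, a `ConcurrentHashMap` stands in for IndexWriter and a `List` stands in for the transaction log. Updates sharing a delete key always hash to the same lock, so the log and the "index" agree on their order. A delete query can't be mapped to a stripe, so in this sketch it has to take every lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Predicate;

/** Hypothetical sketch: stripe locks by delete-key hash so log order matches apply order. */
public class StripedUpdateLog {

    /** One transaction-log entry (hypothetical format). */
    public static final class Entry {
        public final long seqNo;
        public final String op;
        public final String key;   // delete key, or null for a delete query
        public final String value; // document body, or null for deletes

        Entry(long seqNo, String op, String key, String value) {
            this.seqNo = seqNo;
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    private final ReentrantLock[] locks;
    private final AtomicLong nextSeqNo = new AtomicLong();
    private final List<Entry> txLog = new ArrayList<>();                  // stand-in for the tx log
    private final Map<String, String> index = new ConcurrentHashMap<>(); // stand-in for IndexWriter

    public StripedUpdateLog(int numStripes) {
        locks = new ReentrantLock[numStripes];
        for (int i = 0; i < numStripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // Same delete key -> same hash -> same stripe, so updates to one key are serialized.
    private ReentrantLock lockFor(String deleteKey) {
        return locks[Math.floorMod(deleteKey.hashCode(), locks.length)];
    }

    // Entries for different keys may be appended out of seqNo order;
    // a replay would sort by seqNo first.
    private void append(Entry e) {
        synchronized (txLog) {
            txLog.add(e);
        }
    }

    /** Assigns a seqNo, logs and applies the update while holding the key's stripe. */
    public long updateDocument(String deleteKey, String doc) {
        ReentrantLock lock = lockFor(deleteKey);
        lock.lock();
        try {
            long seq = nextSeqNo.incrementAndGet();
            append(new Entry(seq, "update", deleteKey, doc));
            index.put(deleteKey, doc);
            return seq;
        } finally {
            lock.unlock();
        }
    }

    /** We can't know which keys match, so take every stripe (in ascending order). */
    public long deleteByQuery(Predicate<String> matchesKey) {
        for (ReentrantLock l : locks) {
            l.lock();
        }
        try {
            long seq = nextSeqNo.incrementAndGet();
            append(new Entry(seq, "deleteByQuery", null, null));
            index.keySet().removeIf(matchesKey);
            return seq;
        } finally {
            for (int i = locks.length - 1; i >= 0; i--) {
                locks[i].unlock();
            }
        }
    }

    public List<Entry> log() {
        synchronized (txLog) {
            return new ArrayList<>(txLog);
        }
    }

    public String get(String key) {
        return index.get(key);
    }
}
```

Taking all stripes for every delete query serializes it against all in-flight updates. That is the sort of overhead that exposing the ordering IW already keeps in DocumentsWriterDeleteQueue might avoid.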