[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845712#action_12845712 ]
Michael Busch commented on LUCENE-2312:
---------------------------------------

{quote} Hmm... what does JMM say about byte arrays? If one thread is writing to the byte array, can any other thread see those changes? {quote}

This is exactly the right question to ask here. Thread-safety is by far the most complicated aspect of this feature. Jason, I'm not sure if you have already figured out how to ensure that changes made by the writer thread are visible to the reader threads?

Thread-safety in our case boils down to safe publication. We don't need locking to coordinate multiple writing threads, because of LUCENE-2324. But we do need to make sure that the reader threads see all the changes they need to see, at the right time and in the right order. This is IMO very hard, but we all like challenges :)

The JMM gives no guarantee whatsoever about which changes made by one thread another thread will see - or whether it will ever see them - unless proper publication is ensured either by synchronization or by volatile/atomic variables. So e.g. if a writer thread executes the following statements:

{code}
public static int a, b;
...
a = 1;
b = 2;
a = 5;
b = 6;
{code}

and a reader thread does:

{code}
System.out.println(a + "," + b);
{code}

the thing to remember is that the output might be: 1,6! Another reader thread with the following code:

{code}
while (b != 6) {
  // .. do something
}
{code}

might furthermore NEVER terminate without synchronization/volatile/atomic variables. The reason is that, unless forced otherwise, the JVM is allowed to perform any reorderings it likes to better utilize modern CPUs, memory, caches, etc.

To ensure safe publication of data written by a thread we could use synchronization, but my goal here is to implement a non-blocking, lock-free algorithm. So my idea was to make use of a very subtle behavior of volatile variables. I will take a simple explanation of the JMM from Brian Goetz's awesome book "Java Concurrency in Practice", in which he describes the JMM in terms of simple happens-before rules.
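To make the visibility problem above concrete, here is a minimal runnable sketch of the fixed variant: declaring {{b}} volatile both guarantees that the reader's spin loop terminates and, by the happens-before rules discussed below, publishes the earlier plain write to {{a}} as well. (The class name and the simplified write sequence are illustrative, not from the patch.)

```java
public class VisibilityDemo {
    static int a = 0;             // plain field: only published via the volatile write below
    static volatile int b = 0;    // volatile: every write is visible to subsequent reads

    public static void main(String[] args) {
        Thread reader = new Thread(() -> {
            while (b != 6) {
                // spin: each iteration performs a fresh volatile read of b,
                // so the writer's store cannot be hoisted out of the loop
            }
            // the volatile read of b == 6 happens-after the volatile write
            // of b = 6, which happens-after a = 5, so a is guaranteed visible
            System.out.println(a + "," + b);  // prints "5,6"
        });
        reader.start();

        a = 5;  // plain write, ordered before the volatile write by program order
        b = 6;  // volatile write: publishes a = 5 together with b = 6

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Without the volatile modifier on {{b}}, both problems from the text return: the JIT may hoist the read of {{b}} out of the loop (so the reader never terminates), and even a terminating reader could observe a stale {{a}}.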
I will mention only three of those rules, because they are enough to describe the volatile behavior I'd like to point out here (p. 341):

*Program order rule:* Each action in a thread _happens-before_ every action in that thread that comes later in the program order.

*Volatile variable rule:* A write to a volatile field _happens-before_ every subsequent read of that same field.

*Transitivity:* If A _happens-before_ B, and B _happens-before_ C, then A _happens-before_ C.

Based on these three rules you can see that a write to a volatile variable v by one thread t1, followed by a read of the same volatile variable v by another thread t2, publishes ALL changes of t1 that happened-before the write to v, together with the change of v itself. So this write/read of v means crossing a memory barrier, forcing everything t1 may have written to caches to be flushed to RAM. That's why a volatile write can actually be pretty expensive. Note that this behavior has only worked the way I just described since Java 1.5 - the behavior of volatile variables was a very, very subtle change from 1.4 to 1.5!

The way I'm trying to make use of this behavior is actually similar to how we lazily sync Lucene's files with the filesystem: I want to delay the cache->RAM write-through as much as possible, which increases the probability of getting the sync for free!

I'm still fleshing out the details, but I wanted to share this with you guys already, because it might invalidate a lot of assumptions you have when developing the code. Some of this stuff was actually new to me; maybe you all know it already. And if anything I wrote here is incorrect, please let me know!

Btw: IMO, if there's only one Java book you ever read, make it Goetz's book! It's great. Somewhere in it he also says about lock-free algorithms: "Don't try this at home!" - so, let's do it!
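The "piggybacking" that these three rules enable can be sketched as follows. This is only an illustration of the pattern under a single-writer assumption, with invented names (it is not Lucene's actual RAM-buffer code): the writer performs many cheap plain writes and then ONE volatile write; any reader that first reads the volatile field is guaranteed, by transitivity, to see all of the writer's earlier plain writes.

```java
public class SafePublication {
    static final int[] buffer = new int[1024]; // plain (non-volatile) data
    static volatile int publishedCount = 0;    // the single volatile field

    // Writer side (one writer thread only): plain writes to the buffer,
    // then one volatile store that publishes them all at once.
    static void append(int value) {
        int n = publishedCount;
        buffer[n] = value;       // plain write; happens-before the store below
        publishedCount = n + 1;  // volatile write: the memory barrier
    }

    // Reader side: the volatile read comes FIRST; every buffer slot
    // below the count it returns is then guaranteed to be visible.
    static int sum() {
        int n = publishedCount;  // volatile read crosses the barrier
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += buffer[i];      // safe: published by the volatile write
        }
        return s;
    }
}
```

Note how this matches the "delay the write-through" idea: the writer pays for only one barrier per published batch, no matter how many plain writes precede it, which is exactly why postponing the volatile write can make publication nearly free.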
:)

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
> In order to offer users near-realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable.
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing.
> Michael Busch has good suggestions regarding how to handle deletes using max
> doc ids:
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here:
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915