Mike,

>> "But, I believe you (system locks up with MMapDirectory for your use case), so there is a bug somewhere! And I wish we could get to the bottom of that, and fix it."
Yes, that's true for Windows for sure. I haven't tested it on Unix-like systems at that scale, so I don't have any observations to report there.

>> "Also, this (system locks up when using MMapDirectory) sounds different from the "Lucene fsyncs files that it doesn't need to" bug, right?"

That's correct, they are separate issues. I just brought up the system-freezing-up-on-Windows point in response to Uwe's explanation earlier. I know I had taken it upon myself to open a Jira for the fsync issue, but it got delayed on my side as I got occupied with other things in my day job. I will open one later today.

Thanks,
Rahul

On Wed, Mar 24, 2021 at 12:58 PM Michael McCandless <[email protected]> wrote:

> MMapDirectory really should be (is supposed to be) better than SimpleFSDirectory for your usage case.

> Memory mapped pages do not have to fit into your 64 GB physical space, but the "hot" pages (parts of the index that you are actively querying) ideally would fit mostly in free RAM on your box to have OK search performance. Run with as small a JVM heap as possible so the OS has the most RAM to keep such pages hot. Since you are getting OK performance with SimpleFSDirectory it sounds like you do have enough free RAM for the parts of the index you are searching...

> But, I believe you (system locks up with MMapDirectory for your use case), so there is a bug somewhere! And I wish we could get to the bottom of that, and fix it.

> Also, this (system locks up when using MMapDirectory) sounds different from the "Lucene fsyncs files that it doesn't need to" bug, right?

> Mike McCandless
> http://blog.mikemccandless.com

> On Mon, Mar 15, 2021 at 4:28 PM Rahul Goswami <[email protected]> wrote:

>> Uwe,
>> I understand that mmap would only map *a part* of the index from virtual address space to physical memory as and when the pages are requested. However, the limitation on our side is that in most cases we cannot ask for more than 128 GB RAM (and unfortunately even that would be a stretch) for the Solr machine.

>> I have read and re-read the article you referenced in the past :) It's brilliantly written and did help clarify quite a few things for me, I must say. However, at the end of the day, there is only so much the OS (at least Windows) can do before it starts to swap different pages of a 2-3 TB index into 64 GB of physical space, isn't that right? The CPU usage spikes to 100% at such times and the machine becomes totally unresponsive. Turning on SimpleFSDirectory at such times does rid us of this issue. I understand that we are losing out on performance by an order of magnitude compared to mmap, but I don't know of any alternate solution. Also, since most of our use cases are more write-heavy than read-heavy, we can afford to compromise on the search performance due to SimpleFS.

>> Please let me know still, if there is anything about my explanation that doesn't sound right to you.

>> Thanks,
>> Rahul

>> On Mon, Mar 15, 2021 at 3:54 PM Uwe Schindler <[email protected]> wrote:

>>> This is not true. Memory mapping does not need to load the index into RAM, so you don't need so much physical memory. Paging is done only between the index files and RAM; that's what memory mapping is about.
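As a concrete illustration of the memory-mapping point above, here is a minimal sketch of searching an index through MMapDirectory; the index path, field name, and query term are placeholders, not values from this thread. The mapped index lives outside the JVM heap, so a small -Xmx leaves physical RAM free for the OS page cache that backs the mapping, and pages are faulted in lazily as queries touch them.

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.MMapDirectory;

public class MMapSearchSketch {
  public static void main(String[] args) throws Exception {
    // The index files are memory-mapped; nothing is copied onto the Java heap up front.
    try (MMapDirectory dir = new MMapDirectory(Paths.get("/path/to/index"));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      TopDocs hits = searcher.search(new TermQuery(new Term("field", "value")), 10);
      System.out.println("hits: " + hits.totalHits);
    }
  }
}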
>>> Please read the blog post:
>>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

>>> Uwe

>>> Am March 15, 2021 7:43:29 PM UTC schrieb Rahul Goswami <[email protected]>:

>>>> Mike,
>>>> Yes, I am using a 64-bit JVM on Windows. I haven't tried reproducing the issue on Linux yet. In the past we have had problems with mmap on Windows with the machine freezing. The rationale I gave to myself is that the amount of disk and CPU activity for paging in and out must be intense for the OS while trying to map an index that large into 64 GB of heap. Also, since it's an on-premise deployment, we can't expect the customers of the product to provide nodes with > 400 GB RAM, which is what *I think* would be required to get decent performance with mmap. Hence we had to switch to SimpleFSDirectory.

>>>> As for the fsync behavior, you are right. I tried with NRTCachingDirectoryFactory as well, which defaults to using mmap underneath, and it still makes fsync calls for already existing index files.

>>>> Thanks,
>>>> Rahul

>>>> On Mon, Mar 15, 2021 at 3:15 PM Michael McCandless <[email protected]> wrote:

>>>>> Thanks Rahul.

>>>>> > primary reason being that memory mapping multi-terabyte indexes is not feasible through mmap

>>>>> Hmm, that is interesting -- are you using a 64 bit JVM? If so, what goes wrong with such large maps? Lucene's MMapDirectory should chunk the mapping to deal with ByteBuffer's int-only address space.

>>>>> SimpleFSDirectory usually has substantially worse performance than MMapDirectory.

>>>>> Still, I suspect you would hit the same issue if you used other FSDirectory implementations -- the fsync behavior should be the same.

>>>>> Mike McCandless
>>>>> http://blog.mikemccandless.com

>>>>> On Fri, Mar 12, 2021 at 1:46 PM Rahul Goswami <[email protected]> wrote:

>>>>>> Thanks Michael. For your question... yes, I am running Solr on Windows and running it with SimpleFSDirectoryFactory (the primary reason being that memory mapping multi-terabyte indexes is not feasible through mmap). I will create a Jira later today with the details in this thread and assign it to myself. Will take a shot at the fix.

>>>>>> Thanks,
>>>>>> Rahul

>>>>>> On Fri, Mar 12, 2021 at 10:00 AM Michael McCandless <[email protected]> wrote:

>>>>>>> I think long ago we used to track which files were actually dirty (we had written bytes to) and only fsync those ones. But something went wrong with that, and at some point we "simplified" this logic, I think on the assumption that asking the OS to fsync a file that does in fact exist yet indeed has not changed would be harmless? But somehow it is not in your case? Are you on Windows?

>>>>>>> I tried to do a bit of digital archaeology and remember what happened here, and I came across this relevant-looking issue: https://issues.apache.org/jira/browse/LUCENE-2328. That issue moved tracking of which files have been written but not yet fsync'd down from IndexWriter into FSDirectory.

>>>>>>> But there was another change that then removed staleFiles from FSDirectory entirely... still trying to find that. Aha, found it! https://issues.apache.org/jira/browse/LUCENE-6150.
>>>>>>> Phew, Uwe was really quite upset in that issue ;)

>>>>>>> I also came across this delightful related issue, showing how a massive hurricane (Irene) can lead to finding and fixing a bug in Lucene! https://issues.apache.org/jira/browse/LUCENE-3418

>>>>>>> > The assumption is that while the commit point is saved, no changes happen to the segment files in the saved generation.

>>>>>>> This assumption should really be true. Lucene writes the files, append only, once, and then never changes them once they are closed. Pulling a commit point from Solr should further ensure that, even as indexing continues and new segments are written, the old segments referenced in that commit point will not be deleted. But apparently this "harmless fsync" Lucene is doing is not so harmless in your use case. Maybe open an issue and pull out the details from this discussion onto it?

>>>>>>> Mike McCandless
>>>>>>> http://blog.mikemccandless.com

>>>>>>> On Fri, Mar 12, 2021 at 9:03 AM Michael Sokolov <[email protected]> wrote:

>>>>>>>> Also - I should have said - I think the first step here is to write a focused unit test that demonstrates the existence of the extra fsyncs that we want to eliminate. It would be awesome if you were able to create such a thing.

>>>>>>>> On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > Yes, please go ahead and open an issue. TBH I'm not sure why this is happening - there may be a good reason?? But let's explore it using an issue, thanks.
>>>>>>>> >
>>>>>>>> > On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami <[email protected]> wrote:
>>>>>>>> > >
>>>>>>>> > > I can create a Jira and assign it to myself if that's ok (?). I think this can help improve commit performance.
>>>>>>>> > > Also, to answer your question, we have indexes sometimes going into multiple terabytes. Using the replication handler for backup would mean requiring a disk capacity of more than 2x the index size on the machine at all times, which might not be feasible. So we directly back the index up from the Solr node to a remote repository.
>>>>>>>> > >
>>>>>>>> > > Thanks,
>>>>>>>> > > Rahul
>>>>>>>> > >
>>>>>>>> > > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <[email protected]> wrote:
>>>>>>>> > >>
>>>>>>>> > >> Well, it certainly doesn't seem necessary to fsync files that are unchanged and have already been fsync'ed. Maybe there's an opportunity to improve it? On the other hand, support for external processes reading Lucene index files isn't likely to become a feature of Lucene. You might want to consider using Solr replication to power your backup?
>>>>>>>> > >>
>>>>>>>> > >> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <[email protected]> wrote:
>>>>>>>> > >> >
>>>>>>>> > >> > Thanks Michael. I thought since this discussion is closer to the code than most discussions on the solr-users list, it seemed like a more appropriate forum. Will be mindful going forward.
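The focused unit test Michael Sokolov asks for above could start from something along these lines: a FilterDirectory wrapper records every file name Lucene asks to sync, and comparing the sets from two consecutive commits shows whether files that were already fsync'd once, and have not changed since, get fsync'd again. This is only a sketch; the class and field names are illustrative and not existing test code.

import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.FilterDirectory;

public class FsyncCountingSketch {

  /** Wraps a real directory and records every file name passed to sync(). */
  static class SyncRecordingDirectory extends FilterDirectory {
    final List<String> synced = new ArrayList<>();

    SyncRecordingDirectory(Directory in) {
      super(in);
    }

    @Override
    public void sync(Collection<String> names) throws IOException {
      synced.addAll(names);
      super.sync(names);
    }
  }

  public static void main(String[] args) throws Exception {
    try (SyncRecordingDirectory dir = new SyncRecordingDirectory(
            FSDirectory.open(Files.createTempDirectory("fsync-sketch")));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

      Document doc = new Document();
      doc.add(new TextField("body", "first segment", Field.Store.NO));
      writer.addDocument(doc);
      writer.commit();
      List<String> firstCommit = new ArrayList<>(dir.synced);

      dir.synced.clear();
      doc = new Document();
      doc.add(new TextField("body", "second segment", Field.Store.NO));
      writer.addDocument(doc);
      writer.commit();

      // Files synced on the second commit that were already synced on the first commit
      // (and have not changed since) would be the "extra" fsyncs discussed in this thread.
      List<String> resynced = new ArrayList<>(dir.synced);
      resynced.retainAll(firstCommit);
      System.out.println("re-synced files: " + resynced);
    }
  }
}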
>>>>>>>> > >> > On your point about new segments, I attached a debugger and tried to do a new commit (just a pure Solr commit, no backup process running), and the code indeed does fsync on a pre-existing segment file. Hence I was a bit baffled, since it challenged my fundamental understanding that segment files once written are immutable, no matter what (unless picked up for a merge, of course). Hence I thought of reaching out, in case there are scenarios where this might happen which I might be unaware of.
>>>>>>>> > >> >
>>>>>>>> > >> > Thanks,
>>>>>>>> > >> > Rahul
>>>>>>>> > >> >
>>>>>>>> > >> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <[email protected]> wrote:
>>>>>>>> > >> >>
>>>>>>>> > >> >> This isn't a support forum; solr-users@ might be more appropriate. On that list someone might have a better idea about how the replication handler gets its list of files. This would be a good list to try if you wanted to propose a fix for the problem you're having. But since you're here -- it looks to me as if IndexWriter indeed syncs all "new" files in the current segments being committed; look in IndexWriter.startCommit and SegmentInfos.files. Caveat: (1) I'm looking at this code for the first time, and (2) things may have been different in 7.7.2? Sorry I don't know for sure, but are you sure that your backup process is not attempting to copy one of the new files?
>>>>>>>> > >> >>
>>>>>>>> > >> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <[email protected]> wrote:
>>>>>>>> > >> >> >
>>>>>>>> > >> >> > Hello,
>>>>>>>> > >> >> > Just wanted to follow up one more time to see if this is the right forum for my question? Or is this suitable for some other mailing list?
>>>>>>>> > >> >> >
>>>>>>>> > >> >> > Best,
>>>>>>>> > >> >> > Rahul
>>>>>>>> > >> >> >
>>>>>>>> > >> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <[email protected]> wrote:
>>>>>>>> > >> >> >>
>>>>>>>> > >> >> >> Hello everyone,
>>>>>>>> > >> >> >> Following up on my question in case anyone has any idea. The reason it's important to know this is that I am thinking of allowing the backup process to not hold any lock on the index files, which should allow the fsync during parallel commits. BUT, in case doing an fsync on existing segment files in a saved commit point DOES have an effect, it might render the backed-up index in a corrupt state.
>>>>>>>> > >> >> >>
>>>>>>>> > >> >> >> Thanks,
>>>>>>>> > >> >> >> Rahul
>>>>>>>> > >> >> >>
>>>>>>>> > >> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <[email protected]> wrote:
>>>>>>>> > >> >> >>>
>>>>>>>> > >> >> >>> Hello,
>>>>>>>> > >> >> >>> We have a process which backs up the index (Solr 7.7.2) on a schedule. The way we do it is we first save a commit point on the index and then, using Solr's /replication handler, get the list of files in that generation.
>>>>>>>> > >> >> >>> After the backup completes, we release the commit point. (Please note that this is a separate backup process outside of Solr and not the backup command of the /replication handler.)
>>>>>>>> > >> >> >>> The assumption is that while the commit point is saved, no changes happen to the segment files in the saved generation.
>>>>>>>> > >> >> >>>
>>>>>>>> > >> >> >>> Now the issue... The backup process opens the index files in a shared READ mode, preventing writes. This is causing any parallel commits to fail, as they seem to be complaining about the index files being locked by another process (the backup process). Upon debugging, I see that fsync is being called during commit on already existing segment files, which is not expected. So, my question is: is there any reason for Lucene to call fsync on already existing segment files?
>>>>>>>> > >> >> >>>
>>>>>>>> > >> >> >>> The line of code I am referring to is as below:
>>>>>>>> > >> >> >>> try (final FileChannel file = FileChannel.open(fileToSync, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE))
>>>>>>>> > >> >> >>>
>>>>>>>> > >> >> >>> in method fsync(Path fileToSync, boolean isDir) of the class file
>>>>>>>> > >> >> >>>
>>>>>>>> > >> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
>>>>>>>> > >> >> >>>
>>>>>>>> > >> >> >>> Thanks,
>>>>>>>> > >> >> >>> Rahul

>>> --
>>> Uwe Schindler
>>> Achterdiek 19, 28357 Bremen
>>> https://www.thetaphi.de
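For reference, the IOUtils line quoted above boils down to the following standalone sketch (simplified, with no retry or error handling): regular files are opened for WRITE and force(true) is called on the channel, which is consistent with the failures described in the thread when another process holds those same files open in a shared read-only mode on Windows.

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class FsyncSketch {

  /** Mirrors the fsync pattern quoted above (simplified sketch, not the Lucene source). */
  static void fsync(Path fileToSync, boolean isDir) throws IOException {
    // Directories are opened READ (a directory cannot be opened for WRITE);
    // regular files are opened WRITE so force() can flush their contents.
    try (FileChannel file = FileChannel.open(
            fileToSync, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
      file.force(true); // ask the OS to flush file data and metadata to stable storage
    }
  }
}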
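The "save a commit point, copy its files, then release it" workflow described in the thread goes through Solr's /replication handler; at the Lucene level the same idea is usually expressed with SnapshotDeletionPolicy, roughly as in this sketch. The index path is a placeholder, and the sketch assumes the index already has at least one commit when the snapshot is taken.

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.FSDirectory;

public class CommitPointBackupSketch {
  public static void main(String[] args) throws Exception {
    SnapshotDeletionPolicy snapshotter =
        new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
    IndexWriterConfig cfg =
        new IndexWriterConfig(new StandardAnalyzer()).setIndexDeletionPolicy(snapshotter);

    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
         IndexWriter writer = new IndexWriter(dir, cfg)) {
      // Save a commit point: the files of this generation will not be deleted
      // while the snapshot is held, even if indexing and new commits continue.
      IndexCommit commit = snapshotter.snapshot();
      try {
        for (String fileName : commit.getFileNames()) {
          System.out.println("copy to backup: " + fileName);
        }
      } finally {
        snapshotter.release(commit); // allow the deletion policy to reclaim old files again
      }
    }
  }
}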
