I can create a Jira issue and assign it to myself, if that's OK. I think this
could help improve commit performance.
Also, to answer your question: some of our indexes run to multiple
terabytes. Using the replication handler for backups would require disk
capacity of more than 2x the index size on the machine at all times, which
might not be feasible. So instead we back the index up directly from the
Solr node to a remote repository.
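The disk-space constraint can be made concrete with a small preflight check. This is a minimal sketch that assumes a replication-handler backup lands on the same volume as the live index, so the volume must have free space at least equal to the index size (about 2x the index in total). `BackupPreflight` and its methods are hypothetical names; only the `java.nio.file` calls are real.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Solr or Lucene.
final class BackupPreflight {
    private BackupPreflight() {}

    /** Sums the sizes of the regular files directly inside the index directory. */
    static long indexSizeBytes(Path indexDir) throws IOException {
        long total = 0;
        try (Stream<Path> files = Files.list(indexDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }

    /**
     * A same-volume backup needs free space for a second full copy of the index,
     * so the volume must hold roughly 2x the index size overall.
     */
    static boolean hasRoomForLocalCopy(Path indexDir) throws IOException {
        long indexBytes = indexSizeBytes(indexDir);
        long usable = Files.getFileStore(indexDir).getUsableSpace();
        return usable >= indexBytes;
    }
}
```

For a multi-terabyte index this check fails on most nodes, which is the argument for streaming the files straight to a remote repository instead.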

Thanks,
Rahul

On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <msoko...@gmail.com> wrote:

> Well, it certainly doesn't seem necessary to fsync files that are
> unchanged and have already been fsync'ed. Maybe there's an opportunity
> to improve it? On the other hand, support for external processes
> reading Lucene index files isn't likely to become a feature of Lucene.
> You might want to consider using Solr replication to power your
> backup?
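One form the improvement Michael mentions could take, sketched under stated assumptions: remember which file names have already been fsynced and skip them on later commits. This leans on Lucene index files being write-once (a name is not rewritten with different content while it is in use). `SyncedFileTracker` is a hypothetical class for illustration, not Lucene's `FSDirectory`, and a real fix may look quite different.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: fsync each file name at most once.
final class SyncedFileTracker {
    private final Path dir;
    private final Set<String> alreadySynced = new HashSet<>();

    SyncedFileTracker(Path dir) {
        this.dir = dir;
    }

    /** Fsyncs the names not synced before and returns the ones it actually synced. */
    synchronized List<String> sync(Collection<String> names) throws IOException {
        List<String> synced = new ArrayList<>();
        for (String name : names) {
            if (alreadySynced.contains(name)) {
                continue; // write-once file already made durable by an earlier commit
            }
            try (FileChannel ch = FileChannel.open(dir.resolve(name), StandardOpenOption.WRITE)) {
                ch.force(true);
            }
            alreadySynced.add(name);
            synced.add(name);
        }
        return synced;
    }

    /** Forgets a deleted file, in case its name is ever reused. */
    synchronized void onDelete(String name) {
        alreadySynced.remove(name);
    }
}
```

With this shape, a second commit that still references `_0.cfs` would open only the newly written files for writing, so a reader holding an unchanged file would not get in the way.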
>
> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > Thanks Michael. I thought that since this discussion is closer to the code
> > than most discussions on the solr-users list, this was a more appropriate
> > forum. I will be mindful going forward.
> > On your point about new segments: I attached a debugger and ran a new
> > commit (a plain Solr commit, with no backup process running), and the code
> > does indeed fsync a pre-existing segment file. That baffled me, since it
> > challenged my basic understanding that segment files, once written, are
> > immutable no matter what (unless they are picked up for a merge, of
> > course). So I thought I'd reach out in case there are scenarios I'm unaware
> > of where this can happen.
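The immutability assumption can be checked directly: checksum a file, fsync it the way the quoted `IOUtils` line does (open with `StandardOpenOption.WRITE`, then `FileChannel.force(true)`), and checksum it again. A minimal sketch; `FsyncContentCheck` is a hypothetical name. `force` only flushes bytes that were already written, so the checksums are expected to match. What the quoted line does need is a handle opened for writing, and that is the part that can collide with a concurrent reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical diagnostic: does an fsync change a file's bytes?
final class FsyncContentCheck {
    private FsyncContentCheck() {}

    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Opens the file for writing (no CREATE, no TRUNCATE) and forces it to disk. */
    static void fsync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /** Returns true if the file's CRC32 is identical before and after an fsync. */
    static boolean unchangedByFsync(Path file) throws IOException {
        long before = crc32(file);
        fsync(file);
        return before == crc32(file);
    }
}
```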
> >
> > Thanks,
> > Rahul
> >
> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <msoko...@gmail.com> wrote:
> >>
> >> This isn't a support forum; solr-users@ might be more appropriate. On
> >> that list someone might have a better idea about how the replication
> >> handler gets its list of files. This would be a good list to try if
> >> you wanted to propose a fix for the problem you're having. But since
> >> you're here -- it looks to me as if IndexWriter indeed syncs all "new"
> >> files in the current segments being committed; look in
> >> IndexWriter.startCommit and SegmentInfos.files. Caveat: (1) I'm
> >> looking at this code for the first time, and (2) things may have been
> >> different in 7.7.2? Sorry, I don't know for sure, but are you sure that
> >> your backup process is not attempting to copy one of the new files?
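Michael's reading (IndexWriter syncs the "new" files of the commit being written) comes down to set arithmetic over file names. A minimal sketch, assuming you can obtain the file-name lists of the saved commit point, of the commit in progress, and of what the backup is copying; `CommitFileDiff` is a hypothetical helper, not `SegmentInfos.files`. Since the backup should only copy files of the saved commit, a non-empty `conflicts` result would point to a backup list taken from the wrong generation.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for reasoning about which files a commit would fsync.
final class CommitFileDiff {
    private CommitFileDiff() {}

    /** Files referenced by the new commit that the previous commit did not reference. */
    static Set<String> newFiles(Collection<String> previousCommit, Collection<String> newCommit) {
        Set<String> result = new TreeSet<>(newCommit);
        result.removeAll(previousCommit);
        return result;
    }

    /** Files the backup is copying that the new commit would also treat as new. */
    static Set<String> conflicts(Collection<String> backupFiles,
                                 Collection<String> previousCommit,
                                 Collection<String> newCommit) {
        Set<String> result = newFiles(previousCommit, newCommit);
        result.retainAll(backupFiles);
        return result;
    }
}
```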
> >>
> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <rahul196...@gmail.com> wrote:
> >> >
> >> > Hello,
> >> > Just following up one more time: is this the right forum for my
> >> > question, or would another mailing list be more suitable?
> >> >
> >> > Best,
> >> > Rahul
> >> >
> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <rahul196...@gmail.com> wrote:
> >> >>
> >> >> Hello everyone,
> >> >> Following up on my question in case anyone has any idea. The reason this
> >> >> matters is that I am thinking of having the backup process hold no lock
> >> >> on the index files, which should let the fsync during parallel commits
> >> >> succeed. BUT, if doing an fsync on existing segment files in a saved
> >> >> commit point DOES have an effect, it might leave the backed-up index in a
> >> >> corrupt state.
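If the backup tool is Java-based, it can copy index files through a read-only `FileChannel` without ever calling `FileChannel.lock()` or `tryLock()`. A minimal sketch, assuming a Java copier; `LockFreeCopier` is a hypothetical name. Whether a concurrent commit can still open the same file for writing depends on the platform's sharing rules (on Windows, the share mode the reader's open requested), so verify this on your OS rather than treating it as a guarantee.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical copier: reads through a read-only channel and never calls lock()/tryLock().
final class LockFreeCopier {
    private LockFreeCopier() {}

    /** Copies source to a new file at target and returns the number of bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            long size = in.size();
            long position = 0;
            // transferTo may copy fewer bytes than requested, so loop until done.
            while (position < size) {
                long n = in.transferTo(position, size - position, out);
                if (n <= 0) {
                    throw new IOException("unexpected end of " + source);
                }
                position += n;
            }
            out.force(true); // make the backup copy itself durable
            return position;
        }
    }
}
```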
> >> >>
> >> >> Thanks,
> >> >> Rahul
> >> >>
> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <rahul196...@gmail.com> wrote:
> >> >>>
> >> >>> Hello,
> >> >>> We have a process which backs up the index (Solr 7.7.2) on a
> >> >>> schedule. We first save a commit point on the index and then, using
> >> >>> Solr's /replication handler, get the list of files in that generation.
> >> >>> After the backup completes, we release the commit point. (Please note
> >> >>> that this is a separate backup process outside of Solr, not the backup
> >> >>> command of the /replication handler.)
> >> >>> The assumption is that while the commit point is saved, no changes
> >> >>> happen to the segment files in the saved generation.
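The backup flow described here (save a commit point, copy exactly that generation's files, always release the commit point) can be sketched as follows. `SavedCommit` and `CommitPointBackup` are hypothetical names; `SavedCommit` stands in for however the commit point is held and is not Solr's `/replication` API (inside Lucene, `SnapshotDeletionPolicy.snapshot()` and `release()` play a similar role). Modeling release as `close()` lets try-with-resources release the commit point even when a copy fails, without masking the original exception.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;

// Hypothetical abstraction over a saved (pinned) commit point.
interface SavedCommit extends Closeable {
    Path indexDir();

    /** Names of the files belonging to the pinned generation. */
    Collection<String> fileNames();

    /** Releases the commit point so its files may be deleted again. */
    @Override
    void close() throws IOException;
}

final class CommitPointBackup {
    private CommitPointBackup() {}

    /** Copies every file of the saved commit into targetDir and returns how many were copied. */
    static int backup(SavedCommit commit, Path targetDir) throws IOException {
        try (SavedCommit pinned = commit) { // close() releases the commit point, even on failure
            Files.createDirectories(targetDir);
            int copied = 0;
            for (String name : pinned.fileNames()) {
                Files.copy(pinned.indexDir().resolve(name), targetDir.resolve(name));
                copied++;
            }
            return copied;
        }
    }
}
```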
> >> >>>
> >> >>> Now the issue: the backup process opens the index files in a shared
> >> >>> READ mode, preventing writes. This causes any parallel commit to fail,
> >> >>> complaining that the index files are locked by another process (the
> >> >>> backup process). Upon debugging, I see that fsync is being called during
> >> >>> commit on already existing segment files, which I did not expect. So my
> >> >>> question is: is there any reason for Lucene to call fsync on already
> >> >>> existing segment files?
> >> >>>
> >> >>> The line of code I am referring to is as below:
> >> >>> try (final FileChannel file = FileChannel.open(fileToSync, isDir ?
> >> >>>     StandardOpenOption.READ : StandardOpenOption.WRITE))
> >> >>>
> >> >>> in the method fsync(Path fileToSync, boolean isDir) in the file
> >> >>>
> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
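The quoted line opens regular files with `StandardOpenOption.WRITE` (directories with `READ`) before forcing them to disk. Because a regular file needs a writable handle here, a reader that shared the file for reading only can make this open fail, which matches the symptom described above. A simplified sketch of the pattern follows; `SimpleFsync` is a hypothetical name, and this is an approximation for illustration, not a copy of Lucene's `IOUtils.fsync`, which has additional handling for directories.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Simplified approximation of the quoted pattern; not Lucene's actual code.
final class SimpleFsync {
    private SimpleFsync() {}

    static void fsync(Path path) throws IOException {
        boolean isDir = Files.isDirectory(path);
        // Regular files are opened for writing; directories cannot be, so they are opened
        // read-only. This sketch does not handle platforms where opening a directory fails.
        // WRITE without CREATE: a missing file raises NoSuchFileException instead of being created.
        try (FileChannel ch = FileChannel.open(path,
                isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }
}
```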
> >> >>>
> >> >>> Thanks,
> >> >>> Rahul
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
>
