Thank you very much, I'll try it!

On Thu, Sep 29, 2016 at 11:22 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> No ... I don't think Luke can recreate the segments file.
>
> I dug around and found the thread I was thinking of:
> http://markmail.org/thread/ayl5q6rgtngeuoyy
>
> Just be careful! Make a backup copy of your index first!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Sep 29, 2016 at 11:31 AM, Ziming Dong <dzm1016397...@gmail.com> wrote:
> > do you mean `http://www.getopt.org/luke/`?
> >
> > On Mon, Sep 26, 2016 at 4:58 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >> It is in theory possible to reconstruct a segments file by ls-ing all
> >> other index files and manually rebuilding it, but it is not an easy
> >> task and it would have to make some guesses.
> >>
> >> I think in the past a user did manage to create such a tool and maybe
> >> posted the results here, either on this list or the dev list?
> >>
> >> The segments file is a vital file to the index. It holds all metadata
> >> about the index segments. This is why Lucene is so careful about
> >> writing a new one to a "pending" file, fsync'ing that, fsync'ing the
> >> directory, and doing an atomic rename, all before removing the older
> >> segment files.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Sun, Sep 25, 2016 at 10:37 AM, Ziming Dong <dzm1016397...@gmail.com> wrote:
> >> > sorry to resend.
> >> > I'll change IO to local. Is there any way to recover the first index? Now it
> >> > cannot be opened by checkIndex. We are building an index of 7 billion
> >> > webpages, and it costs much time to rebuild.
> >> >
> >> > On Sun, Sep 25, 2016 at 5:31 PM, Ziming Dong <dzm1016397...@gmail.com> wrote:
> >> >> I'll change IO to local. Is there any way to recover the first index? Now it
> >> >> can be opened by checkIndex. We are building an index of 7 billion
> >> >> webpages, and it costs much time to rebuild.
> >> >>
> >> >> On Sat, Sep 24, 2016 at 2:54 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >> >>> The 'sync' option for an NFS client just means that every write is
> >> >>> sent immediately across the network. And it really is useless
> >> >>> performance loss as long as your app (like Lucene) does the "right
> >> >>> thing" with fsync.
> >> >>>
> >> >>> The more important question is why fsync sent to your NFS client and
> >> >>> then to the Mac Mini's NFS server failed to actually move all written
> >> >>> bytes to durable storage.
> >> >>>
> >> >>> Can you reproduce this issue if you use a more well trodden IO system,
> >> >>> e.g. Linux with ext4 on a local IO device?
> >> >>>
> >> >>> Mike McCandless
> >> >>>
> >> >>> http://blog.mikemccandless.com
> >> >>>
> >> >>> On Fri, Sep 23, 2016 at 12:00 AM, Ziming Dong <dzm1016397...@gmail.com> wrote:
> >> >>> > I use the macmini on NFS server side. It seems mount option sync is
> >> >>> > useless, just slows down the index program.
> >> >>> >
> >> >>> > On Fri, Sep 23, 2016 at 4:43 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >> >>> >> OK sorry I meant your first index, and it seems to have only one
> >> >>> >> (broken) segments file. Can you post the "ls -l" output of that first
> >> >>> >> index? It looks like the file was (illegally) filled with 0s, or at
> >> >>> >> least the first 4 bytes were.
> >> >>> >>
> >> >>> >> Lucene writes this file, fsyncs it, does an atomic rename, and fsyncs
> >> >>> >> the directory, so this should not happen, if your IO system honors
> >> >>> >> fsync.
> >> >>> >>
> >> >>> >> What IO devices are used by the NFS server?
> >> >>> >>
> >> >>> >> NFS is not well tested and has several known problems with Lucene so
> >> >>> >> this is already risky ground...
> >> >>> >>
> >> >>> >> Mike McCandless
> >> >>> >>
> >> >>> >> http://blog.mikemccandless.com
> >> >>> >>
> >> >>> >> On Thu, Sep 22, 2016 at 11:33 AM, Ziming Dong <dzm1016397...@gmail.com> wrote:
> >> >>> >> > second index is recovered by checkIndex, I don't know what are in second
> >> >>> >> > index directory before recover.
> >> >>> >> > checkIndex can't read first index. index filenames are attached.
> >> >>> >> > I use lucene6.0.0 at the beginning, then I upgrade to lucene6.1.0 to
> >> >>> >> > continue index.
> >> >>> >> >
> >> >>> >> > On Thu, Sep 22, 2016 at 10:17 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >> >>> >> >> Do you have 2 separate segments files in that 2nd index?
> >> >>> >> >>
> >> >>> >> >> Which exact Lucene version is this?
> >> >>> >> >>
> >> >>> >> >> Mike McCandless
> >> >>> >> >>
> >> >>> >> >> http://blog.mikemccandless.com
> >> >>> >> >>
> >> >>> >> >> On Thu, Sep 22, 2016 at 7:44 AM, Ziming Dong <dzm1016397...@gmail.com> wrote:
> >> >>> >> >> > I used checkIndex to recover second index though I lost many docs in index,
> >> >>> >> >> > but first index can't be read by checkIndex, error is
> >> >>> >> >> >
> >> >>> >> >> >> java -cp lucene-core-6.1.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /Volumes/HPT8_56T/infomall-index/index0
> >> >>> >> >> >> Opening index @ /Volumes/HPT8_56T/infomall-index/index0
> >> >>> >> >> >> ERROR: could not read any segments file in directory
> >> >>> >> >> >> org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/Volumes/HPT8_56T/infomall-index/index0/segments_5t3"))): 0 (needs to be between 1071082519 and 1071082519). This version of Lucene only supports indexes created with release 5.0 and later.
> >> >>> >> >> >> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:295)
> >> >>> >> >> >> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
> >> >>> >> >> >> at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:507)
> >> >>> >> >> >> at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2595)
> >> >>> >> >> >> at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2497)
> >> >>> >> >> >> at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2423)
> >> >>> >> >> >
> >> >>> >> >> > I use NFS, but I set mount option as mount -t nfs -o tcp,sync,retrans=10
> >> >>> >> >> > The index program has run 1 month without any problem before power failure.
> >> >>> >> >> >
> >> >>> >> >> > On Thu, Sep 22, 2016 at 6:06 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >> >>> >> >> >> Hmm I'm no longer so sure this is an IW bug: on commit we fsync the
> >> >>> >> >> >> pending_segments_N and then do an atomic rename to segments_N.
> >> >>> >> >> >>
> >> >>> >> >> >> Can you describe your IO system? Is it possible it does not implement
> >> >>> >> >> >> fsync or atomic renames correctly?
> >> >>> >> >> >>
> >> >>> >> >> >> Also, your 2nd exception indicates the segments_N file was intact but
> >> >>> >> >> >> the .cfs file was corrupt, which is also hard to explain unless fsync
> >> >>> >> >> >> isn't working on your IO system.
> >> >>> >> >> >>
> >> >>> >> >> >> Mike McCandless
> >> >>> >> >> >>
> >> >>> >> >> >> http://blog.mikemccandless.com
> >> >>> >> >> >>
> >> >>> >> >> >> On Thu, Sep 22, 2016 at 5:10 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >> >>> >> >> >> > Sorry for the slow reply here. Curious that both of these exceptions
> >> >>> >> >> >> > are from IW.init. I think this may be a real bug, caused by this:
> >> >>> >> >> >> >
> >> >>> >> >> >> > https://github.com/apache/lucene-solr/commit/981bfba841144d08df1d1a183d39fcd6f195ad56
> >> >>> >> >> >> >
> >> >>> >> >> >> > I'll see if I can make a standalone test case showing this.
> >> >>> >> >> >> >
> >> >>> >> >> >> > If you open those indices with an IndexReader instead, does it succeed?
> >> >>> >> >> >> >
> >> >>> >> >> >> > If you run CheckIndex, what does it report?
> >> >>> >> >> >> >
> >> >>> >> >> >> > Mike McCandless
> >> >>> >> >> >> >
> >> >>> >> >> >> > http://blog.mikemccandless.com
> >> >>> >> >> >> >
> >> >>> >> >> >> > On Wed, Sep 14, 2016 at 1:22 AM, Ziming Dong <dzm1016397...@gmail.com> wrote:
> >> >>> >> >> >> >> I have 6 machines and 6 index directories; each machine builds its index into
> >> >>> >> >> >> >> one index directory. After a power failure last night, two of those machines
> >> >>> >> >> >> >> can't start the index program.
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> one error is
> >> >>> >> >> >> >>
> >> >>> >> >> >> >>> INFO: 2016-09-14 12:31:38 [main] sewm.bdbox.search.InfomallIndexer$Builder:ignoreCollectionsFile(227): Loaded 2146 ignored collections from /mnt/HPT8_56T/infomall-index/index0/ignored_collections.txt
> >> >>> >> >> >> >>> ERROR: 2016-09-14 12:31:39 [main] sewm.bdbox.util.LogUtil:error(71): org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/HPT8_56T/infomall-index/index0/segments_5t3"))): 0 (needs to be between 1071082519 and 1071082519). This version of Lucene only supports indexes created with release 5.0 and later.
> >> >>> >> >> >> >>> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:295)
> >> >>> >> >> >> >>> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
> >> >>> >> >> >> >>> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:910)
> >> >>> >> >> >> >>> at sewm.bdbox.search.InfomallIndexer.<init>(InfomallIndexer.java:60)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:28)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:21)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer$Builder.build(ThreadedInfomallIndexer.java:72)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer.main(ThreadedInfomallIndexer.java:129)
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> another is
> >> >>> >> >> >> >>
> >> >>> >> >> >> >>> INFO: 2016-09-14 01:11:06 [main] sewm.bdbox.search.InfomallIndexer$Builder:ignoreCollectionsFile(227): Loaded 8575 ignored collections from /mnt/HPT8/infomall-index/index5/ignored_collections.txt
> >> >>> >> >> >> >>> ERROR: 2016-09-14 01:11:09 [main] sewm.bdbox.util.LogUtil:error(71): org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=MMapIndexInput(path="/mnt/HPT8/infomall-index/index5/_1kqn.cfs"))
> >> >>> >> >> >> >>> at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:448)
> >> >>> >> >> >> >>> at org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:433)
> >> >>> >> >> >> >>> at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:86)
> >> >>> >> >> >> >>> at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:71)
> >> >>> >> >> >> >>> at org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:1016)
> >> >>> >> >> >> >>> at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1033)
> >> >>> >> >> >> >>> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
> >> >>> >> >> >> >>> at sewm.bdbox.search.InfomallIndexer.<init>(InfomallIndexer.java:60)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:28)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:21)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer$Builder.build(ThreadedInfomallIndexer.java:72)
> >> >>> >> >> >> >>> at sewm.bdbox.search.ThreadedInfomallIndexer.main(ThreadedInfomallIndexer.java:129)
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> it seems 1071082519 is a special number.
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> --
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> Ziming Dong
> >> >>> >> >> >> >> http://suiyuan2009.github.io/
> >> >>> >> >> >
> >> >>> >> >> > --
> >> >>> >> >> >
> >> >>> >> >> > Ziming Dong
> >> >>> >> >> > http://suiyuan2009.github.io/
> >> >>> >> >
> >> >>> >> > --
> >> >>> >> >
> >> >>> >> > Ziming Dong
> >> >>> >> > http://suiyuan2009.github.io/
> >> >>> >
> >> >>> > --
> >> >>> >
> >> >>> > Ziming Dong
> >> >>> > http://suiyuan2009.github.io/
> >> >>
> >> >> --
> >> >>
> >> >> Ziming Dong
> >> >> http://suiyuan2009.github.io/
> >> >
> >> > --
> >> >
> >> > Ziming Dong
> >> > http://suiyuan2009.github.io/
> >
> > --
> >
> > Ziming Dong
> > http://suiyuan2009.github.io/

--
Ziming Dong
http://suiyuan2009.github.io/
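
A note on the "special number" from the first message, for anyone who finds this thread later. As far as I can tell, 1071082519 is the codec header magic (0x3fd76c17, `CodecUtil.CODEC_MAGIC` in Lucene), and the expected footer value -1071082520 in the second trace is its bitwise complement. So both errors say the same thing: Lucene read 0 where it expected a magic value, which fits Mike's remark that the first 4 bytes of the segments file were zeroed. Below is a minimal sketch of how one could check a file's leading bytes without Lucene on the classpath. The class name `SegmentsMagicCheck` and its helper methods are hypothetical, the constants are copied by hand, and the sketch assumes the header magic is stored as a big-endian int, which I believe holds for the 5.x/6.x formats discussed here.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene: inspects the first 4 bytes of an index file.
public class SegmentsMagicCheck {
    // Copied by hand from Lucene's CodecUtil.CODEC_MAGIC (assumption: unchanged in 5.x/6.x).
    public static final int CODEC_MAGIC = 0x3fd76c17;    // 1071082519
    // The footer magic is the bitwise complement of the header magic.
    public static final int FOOTER_MAGIC = ~CODEC_MAGIC; // -1071082520

    /** Reads the first four bytes of the file as a big-endian int. */
    public static int readLeadingInt(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file);
             DataInputStream data = new DataInputStream(in)) {
            return data.readInt(); // throws EOFException if the file is shorter than 4 bytes
        }
    }

    /** True if the file starts with the codec header magic. */
    public static boolean hasCodecMagic(Path file) throws IOException {
        return readLeadingInt(file) == CODEC_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("CODEC_MAGIC  = " + CODEC_MAGIC);
        System.out.println("FOOTER_MAGIC = " + FOOTER_MAGIC);
        // Each argument is a path to an index file, e.g. a segments_N file.
        for (String arg : args) {
            Path file = Paths.get(arg);
            int leading = readLeadingInt(file);
            String verdict = (leading == CODEC_MAGIC) ? "header magic present" : "header magic missing";
            System.out.println(file + ": leading int = " + leading + " (" + verdict + ")");
        }
    }
}
```

Run it against a copy of the index, never the only copy, for example `java SegmentsMagicCheck /path/to/backup/segments_5t3`. A leading int of 0 only confirms that the header was overwritten or never reached disk; it does not tell you how much of the rest of the file is intact.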