RE: Restoring a corrupt index

Honey George Thu, 19 Aug 2004 03:37:20 -0700

If I understand correctly, you have a large main index,
and you build small indexes that are finally merged into
it. If the system crashes halfway through the merge, the
index can be left corrupted. I do not think you can use
my solution in that case.


What I am trying to do is remove a corrupt segment
and its associated files from the index folder, not
fix the corrupt segment. This way I can at least add
new documents to the index. Of course, I am sure I
didn't lose anything, because my index file size was
actually 0 bytes.
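
A minimal sketch of how such a segment could be spotted: plain
Java (not Lucene code) that lists the segments owning a 0-byte
file, plus the files that belong to a given segment. It assumes
that every file of a segment is named <segmentName>.<extension>
(for example _5.fdt). The class ZeroLengthSegmentFinder and its
method names are made up for illustration. It only reads the
folder and does not delete anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: it looks only at file names and sizes.
final class ZeroLengthSegmentFinder {

    // Assumption: a segment's files are named "<segmentName>.<extension>",
    // so the segment name is everything before the first dot.
    // Files without a dot (such as "segments") yield null.
    static String segmentNameOf(String fileName) {
        int dot = fileName.indexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : null;
    }

    // Returns the names of all segments that own at least one 0-byte file.
    static Set<String> suspectSegments(Path indexDir) throws IOException {
        Set<String> suspects = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file) || Files.size(file) != 0) {
                    continue;
                }
                String segment = segmentNameOf(file.getFileName().toString());
                if (segment != null) {
                    suspects.add(segment);
                }
            }
        }
        return suspects;
    }

    // Lists, in sorted order, the files whose names start with "<segmentName>.".
    static List<Path> filesOfSegment(Path indexDir, String segmentName) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (file.getFileName().toString().startsWith(segmentName + ".")) {
                    result.add(file);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZeroLengthSegmentFinder <indexDir>");
            return;
        }
        Path indexDir = Paths.get(args[0]);
        for (String segment : suspectSegments(indexDir)) {
            System.out.println(segment + " -> " + filesOfSegment(indexDir, segment));
        }
    }
}
```

Running it against a copy of the index folder gives candidate
segment names; a 0-byte file is only a hint, so check each name
before removing anything.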


Thanks,
  George

 --- Karthik N S <[EMAIL PROTECTED]> wrote: 
> Hi
> 
>   George
> 
>    Do you think the same would work for merged
> indexes?
>    Please can you suggest a solution.
> 
> 
>   Karthik
> 
> -----Original Message-----
> From: Honey George [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 19, 2004 2:08 PM
> To: Lucene Users List
> Subject: RE: Restoring a corrupt index
> 
> 
> This is what I did.
> 
> There are 2 classes in the Lucene source which are not
> public and therefore cannot be accessed from outside
> the package. The classes are
> 1. org.apache.lucene.index.SegmentInfos
>    - collection of segments
> 2. org.apache.lucene.index.SegmentInfo
>    - represents a single segment
> 
> I took these two files and moved them to a separate
> folder. Then I created a class with the following code
> fragment.
> 
>     // Needs org.apache.lucene.store.Directory and
>     // org.apache.lucene.store.FSDirectory.
>     // Prints the name of every segment listed in the segments file.
>     public void displaySegments(String indexDir)
>         throws Exception
>     {
>         Directory dir = FSDirectory.getDirectory(indexDir, false);
>         SegmentInfos segments = new SegmentInfos();
>         segments.read(dir);
> 
>         StringBuffer str = new StringBuffer();
>         int size = segments.size();
>         str.append("Index Dir = " + indexDir);
>         str.append("\nTotal Number of Segments " + size);
>         str.append("\n--------------------------------------");
>         for (int i = 0; i < size; i++)
>         {
>             str.append("\n");
>             str.append((i + 1) + ". ");
>             str.append(((SegmentInfo) segments.get(i)).name);
>         }
>         str.append("\n--------------------------------------");
> 
>         System.out.println(str.toString());
>     }
> 
> 
>     // Removes the named segment's entry from the segments file.
>     // The segment's own files in the index folder are not touched here.
>     public void deleteSegment(String indexDir, String segmentName)
>         throws Exception
>     {
>         Directory dir = FSDirectory.getDirectory(indexDir, false);
>         SegmentInfos segments = new SegmentInfos();
>         segments.read(dir);
> 
>         int size = segments.size();
>         String name = null;
>         boolean found = false;
>         for (int i = 0; i < size; i++)
>         {
>             name = ((SegmentInfo) segments.get(i)).name;
>             if (segmentName.equals(name))
>             {
>                 found = true;
>                 segments.remove(i);
>                 System.out.println("Deleted the segment with name " + name
>                     + " from the segments file");
>                 break;
>             }
>         }
>         if (found)
>         {
>             // Persist the shortened segment list.
>             segments.write(dir);
>         }
>         else
>         {
>             System.out.println("Invalid segment name: " + segmentName);
>         }
>     }
> 
> Use the displaySegments() method to display the
> segments and deleteSegment() to delete the corrupt
> segment.
> 
> Thanks,
>   George
> 
>  --- Karthik N S <[EMAIL PROTECTED]> wrote:
> > Hi guys,
> >
> >
> >    In our situation we would be indexing millions and
> > millions of documents, with many gigabytes of data
> > indexed, and finally they would be put into a merged
> > index, categorized accordingly.
> >
> >   There is a possibility of corruption, so please do
> > post the code.
> >
> >
> >  Thx
> > Karthik
> >
> >
> > -----Original Message-----
> > From: Honey George [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 18, 2004 5:51 PM
> > To: Lucene Users List
> > Subject: Re: Restoring a corrupt index
> >
> >
> > Thanks Erik, that worked. I was able to remove the
> > corrupt index and now it looks like the index is OK.
> > I was able to view the number of documents in the
> > index. Before that I was getting the error,
> > java.io.IOException: read past EOF
> >
> > I am yet to find out how my index got corrupted.
> > There is another thread going on about this topic,
> > http://www.mail-archive.com/[EMAIL PROTECTED]/msg03165.html
> >
> > If anybody is facing a similar problem and is
> > interested in the code, I can post it here.
> >
> > Thanks,
> >   George
> >
> >
> >
> >  --- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > > The details of the segments file (and all the
> > > others) are freely available here:
> > >
> > > http://jakarta.apache.org/lucene/docs/fileformats.html
> > >
> > > Also, there is Java code in Lucene, of course, that
> > > manipulates the segments file which could be
> > > leveraged (although probably package scoped and not
> > > easily usable in a standalone repair tool).
> > >
> > >   Erik
> > >
> > >
> > > On Aug 18, 2004, at 6:50 AM, Honey George wrote:
> > >
> > > > Looks like the problem is not with the hex editor;
> > > > even in UltraEdit (I had access to a Windows box) I
> > > > am seeing the same display. The problem is that I am
> > > > not able to identify where a record starts with
> > > > just 1 record in the file.
> > > >
> 
=== message truncated ===