Thank you very much for the detailed information, everyone! I will try to use it to make my code better.
I have split the optimization code out into a command-line app that runs the optimize on another box. It's messy, but effective in keeping downtime to a minimum. This will get the large number of segment files under control for now. Too bad it takes a week or more. Hopefully I will not have to reindex anytime soon.

For the future, I think the best way around this is a transaction/agent-based design. That way, I can keep a read-only copy for searching.

My app currently uses two services, one for writes and one for reads. I suspect that this may be what is causing the corruption. Does anyone have experience with this type of setup, and has anyone seen, or does anyone know, that it can corrupt a Lucene index? I have heard that having more than one service attached to the index at a time causes the problem I am seeing.

Thanks for the links to the old Luke distros, and thanks for all the quick responses!

Hugh


Andrzej Bialecki wrote:
>
> lowfreq wrote:
>> I have a Lucene index that is very large in size.
>> It was created using a pre-2.1 version of Lucene.Net, 2.0.0.4.
>>
>> The index is currently almost 20 GB, and has almost 7000 segment files.
>> The problem I am having is that I need to optimize it, and can't do this
>> without the search functionality of my app being down for a week.
>>
>> I used the Luke tool from getopt.org and it worked flawlessly, optimizing
>> the index in just over 2 hours. Problem is that my search cannot use it,
>> and the error states Unknown Format Version errors, or just plain nothing
>> found.
>
> You should be careful when using Lucene Java to modify Lucene.Net
> indexes. I know for a fact that deflated data in Lucene Java is
> incompatible with the deflater implementation in .Net, so it's easy to
> create an incompatible index even when you use a supposedly compatible
> version of Lucene Java. Perhaps versions around 2.0 still worked ok, but
> no guarantees.
>
>>
>> I understand that versions of Lucene that are newer than what the index
>> was built and is searched with can cause problems.
>>
>> What can I do to make this work? I have tried older versions of Luke; 0.7
>> was the oldest I could lay hands on, but even it uses a newer version of
>> Lucene.
>
> Here are links to older versions of Luke:
>
> http://www.getopt.org/luke/luke-0.1.zip
> http://www.getopt.org/luke/luke-0.2.zip
> http://www.getopt.org/luke/luke-0.3.zip
> http://www.getopt.org/luke/luke-0.4.zip
> http://www.getopt.org/luke/luke-0.5/luke-0.5.jar
> http://www.getopt.org/luke/luke-0.5/luke-src-0.5.zip
> http://www.getopt.org/luke/luke-0.6/lukeall-0.6.jar
> http://www.getopt.org/luke/luke-0.6/luke-src-0.6.zip
>
>>
>> My index version shows as 633103800023469045. The version the index is
>> written as after optimizing with Luke 0.7 is 633103800023469057.
>
> This is just a timestamp, so it doesn't say what version of Lucene
> created the index. If you open the index with Luke, in the Overview tab
> there is a line that tells what is the index format version.
>
> --
> Best regards,
> Andrzej Bialecki
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org

--
View this message in context: http://www.nabble.com/Optimization-and-Corruption-Issues-tp25697034p25705907.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
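
A note on the two index version numbers quoted above, following Andrzej's point that the value is just a timestamp: below is a minimal sketch of decoding it, assuming the number is a .NET tick count (100 ns units since 0001-01-01). That is only a guess based on its magnitude and on the index having been built with Lucene.Net; nothing in this thread confirms it. The class and method names are made up for illustration and are not part of Lucene or Luke.

```java
import java.time.Instant;

public class IndexVersionTicks {
    // .NET tick count at 1970-01-01T00:00:00 (ticks are 100 ns units).
    static final long TICKS_AT_UNIX_EPOCH = 621_355_968_000_000_000L;
    static final long TICKS_PER_SECOND = 10_000_000L;

    // ASSUMPTION: the index version is a .NET tick count. If it was taken
    // from a local clock, the result is wall-clock time in that zone, not UTC.
    static Instant ticksToInstant(long ticks) {
        long sinceEpoch = ticks - TICKS_AT_UNIX_EPOCH;
        long seconds = Math.floorDiv(sinceEpoch, TICKS_PER_SECOND);
        long nanos = Math.floorMod(sinceEpoch, TICKS_PER_SECOND) * 100L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        long before = 633103800023469045L; // version before optimizing
        long after = 633103800023469057L;  // version after optimizing with Luke

        System.out.println("decoded: " + ticksToInstant(before));
        System.out.println("difference: " + (after - before));
    }
}
```

Under that assumption the first value decodes to a date in March 2007, and the two values differ by only 12. Either way, the number does not identify the Lucene release or the index format; the format version shown in Luke's Overview tab is the thing to compare.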