Thank you very much for the detailed information everyone!
I will try to use the information to make my code better.

I have split the optimization code out into a command-line app that runs the
optimize on another box. It's messy, but effective in keeping downtime to a
minimum. This will get the large number of segment files under control for
now. Too bad it takes a week or more. Hopefully I will not have to reindex
it anytime soon.

I think the best way around this in the future is a transaction/agent-based
design. That way, I can keep a read-only copy for searching.
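
To make the read-only-copy idea concrete, here is a rough sketch of what I
have in mind, not working production code. The function name
publish_snapshot and the optimize callable are hypothetical placeholders;
optimize stands in for whatever actually rewrites the index (for example,
launching the optimizer app on the copy).

```python
import os
import shutil
import tempfile


def publish_snapshot(live_dir, optimize):
    """Optimize a copy of live_dir, then swap it in for the searchers.

    `optimize` is a hypothetical callable that takes a directory path and
    rewrites the index stored there.
    """
    parent = os.path.dirname(os.path.abspath(live_dir))
    # Work next to the live index so the renames stay on one filesystem.
    work_dir = tempfile.mkdtemp(prefix="index-work-", dir=parent)
    staged = os.path.join(work_dir, "index")

    shutil.copytree(live_dir, staged)   # searchers keep using live_dir
    optimize(staged)                    # the slow step touches only the copy

    retired = os.path.join(work_dir, "retired")
    os.rename(live_dir, retired)        # brief window with no live index
    os.rename(staged, live_dir)         # optimized copy becomes the live one
    shutil.rmtree(work_dir)             # also deletes the retired copy
```

Two caveats with this sketch: any writes made to the live index while the
copy is being optimized are lost unless they are paused or replayed
afterwards, and the searchers have to close and reopen the index around the
swap (on Windows the rename will fail while files are still open).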

My app currently uses two services, one for writes and one for reads.
I suspect that this may be the problem that is causing the corruption.

Does anyone have experience with this type of setup, and has anyone seen (or
does anyone know) that it can corrupt a Lucene index?

I have heard that having more than one service attached at a time causes the
problem I am seeing.

Thanks for the links to the old Luke distros, and thanks for all the quick
responses!

Hugh


Andrzej Bialecki wrote:
> 
> lowfreq wrote:
>> I have a Lucene index that is very large in size. 
>> It was created using a pre 2.1 version of Lucene.net 2.0.0.4. 
>> 
>> The index is currently almost 20 GB, and has almost 7000 segment files. 
>> The problem I am having is that I need to optimize it, and can't do this
>> without the search functionality of my app being down for a week. 
>> 
>> I used the Luke tool from getopt.org and it worked flawlessly, optimizing
>> the index in just over 2 hours. Problem is that my search cannot use it:
>> it either fails with Unknown Format Version errors or just finds nothing.
> 
> You should be careful when using Lucene Java to modify Lucene.Net 
> indexes. I know for a fact that deflated data in Lucene Java is 
> incompatible with the deflater implementation in .Net, so it's easy to 
> create an incompatible index even when you use a supposedly compatible 
> version of Lucene Java. Perhaps versions around 2.0 still worked ok, but 
> no guarantees.
> 
> 
>> 
>> I understand that versions of Lucene newer than the one the index was
>> built and is searched with can cause problems. 
>> 
>> What can I do to make this work? I have tried older versions of Luke, 0.7
>> was the oldest I could lay hands on, but even it uses a newer version of
>> Lucene. 
> 
> Here are links to older versions of Luke:
> 
>       http://www.getopt.org/luke/luke-0.1.zip
>       http://www.getopt.org/luke/luke-0.2.zip
>       http://www.getopt.org/luke/luke-0.3.zip
>       http://www.getopt.org/luke/luke-0.4.zip
>       http://www.getopt.org/luke/luke-0.5/luke-0.5.jar
>       http://www.getopt.org/luke/luke-0.5/luke-src-0.5.zip
>       http://www.getopt.org/luke/luke-0.6/lukeall-0.6.jar
>       http://www.getopt.org/luke/luke-0.6/luke-src-0.6.zip
> 
> 
>> 
>> My index version shows as 633103800023469045. The version the index is
>> written as after optimizing with Luke 0.7 is 633103800023469057. 
> 
> This is just a timestamp, so it doesn't say what version of Lucene 
> created the index. If you open the index with Luke, in the Overview tab 
> there is a line that tells what is the index format version.
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Optimization-and-Corruption-Issues-tp25697034p25705907.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

