Re: Chinese Segmentation with Phase Query

2007-11-10 Thread Uwe Goetzke
Hi Cedric, Although I have no idea how to use the Chinese language, I went a different route to overcome language-specific problems. Instead of using a language-specific segmentation, we now use statistical segmentation with bigrams, e.g. given your sentence XYZABCDEF suppose the
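
The message above is cut off, so the following is only a minimal sketch of what overlapping character bigrams look like for the sample sentence XYZABCDEF. The class and method names are hypothetical, the code does not use Lucene, and it is not necessarily the exact scheme Uwe describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: split a string into overlapping character bigrams.
public class BigramSketch {

    // Returns every pair of adjacent chars; a string shorter than 2 chars yields an empty list.
    // Works on UTF-16 code units, which is adequate for characters in the Basic Multilingual Plane.
    public static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [XY, YZ, ZA, AB, BC, CD, DE, EF]
        System.out.println(bigrams("XYZABCDEF"));
    }
}
```

Each bigram is indexed as its own token, so no dictionary or language-specific segmenter is needed.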

Re: - lock improvement suggestion

2007-11-10 Thread Michael McCandless
Nikolay Diakov [EMAIL PROTECTED] wrote: I see you do the wrapping in a RuntimeException trick. Perhaps you can introduce a special exception derived from RuntimeException that you would throw in that case. It would basically mean The underlying FS does something we cannot tolerate so we fail
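
As a rough sketch of Nikolay's suggestion, a dedicated unchecked exception could replace the bare RuntimeException wrapper. The exception name, the message text, and the releaseLock helper below are all hypothetical and are not part of Lucene.

```java
import java.io.IOException;

// Hypothetical illustration of a dedicated exception derived from RuntimeException.
public class LockFailureSketch {

    // Hypothetical name; signals that the underlying file system did something intolerable.
    public static class UnsupportedFileSystemBehaviorException extends RuntimeException {
        public UnsupportedFileSystemBehaviorException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical helper: simulates a lock release that may fail with an IOException.
    public static void releaseLock(boolean simulateFailure) {
        try {
            if (simulateFailure) {
                throw new IOException("simulated file system failure");
            }
        } catch (IOException e) {
            // Wrap in the dedicated type instead of a plain RuntimeException,
            // so callers can catch this specific condition.
            throw new UnsupportedFileSystemBehaviorException("cannot release lock", e);
        }
    }
}
```

Callers that care about this condition can then catch the specific type, while the original IOException stays available through getCause().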

Re: - lock improvement suggestion

2007-11-10 Thread Michael McCandless
OK I've opened this issue: https://issues.apache.org/jira/browse/LUCENE-1050 Mike Michael McCandless [EMAIL PROTECTED] wrote: Nikolay Diakov [EMAIL PROTECTED] wrote: I see you do the wrapping in a RuntimeException trick. Perhaps you can introduce a special exception derived from

restoring a corrupt index?

2007-11-10 Thread Ryan McKinley
Using solr, we have been running an indexing process for a while and when I checked on it today, it spits out an error: java.lang.RuntimeException: java.io.FileNotFoundException: /path/to/index/_cf9.fnm (No such file or directory) at

Re: restoring a corrupt index?

2007-11-10 Thread Yonik Seeley
On Nov 10, 2007 4:01 PM, Ryan McKinley [EMAIL PROTECTED] wrote: Using solr, we have been running an indexing process for a while and when I checked on it today, it spits out an error: java.lang.RuntimeException: java.io.FileNotFoundException: /path/to/index/_cf9.fnm (No such file or

Re: restoring a corrupt index?

2007-11-10 Thread Grant Ingersoll
Would this help: https://issues.apache.org/jira/browse/LUCENE-1020 On Nov 10, 2007, at 4:01 PM, Ryan McKinley wrote: Using solr, we have been running an indexing process for a while and when I checked on it today, it spits out an error: java.lang.RuntimeException:

Re: restoring a corrupt index?

2007-11-10 Thread Michael McCandless
Would this help: https://issues.apache.org/jira/browse/LUCENE-1020 That should help here, but please proceed with caution: this tool is very new, has only been tested on trunk indices, and is brutal in how it recovers the index (it removes the entire segment if there is any problem loading one

Re: restoring a corrupt index?

2007-11-10 Thread Michael McCandless
Yonik Seeley [EMAIL PROTECTED] wrote: On Nov 10, 2007 4:01 PM, Ryan McKinley [EMAIL PROTECTED] wrote: Using solr, we have been running an indexing process for a while and when I checked on it today, it spits out an error: java.lang.RuntimeException: java.io.FileNotFoundException:

Re: restoring a corrupt index?

2007-11-10 Thread Yonik Seeley
On Nov 10, 2007 4:34 PM, Michael McCandless [EMAIL PROTECTED] wrote: Yonik Seeley [EMAIL PROTECTED] wrote: On Nov 10, 2007 4:01 PM, Ryan McKinley [EMAIL PROTECTED] wrote: Using solr, we have been running an indexing process for a while and when I checked on it today, it spits out an

Re: restoring a corrupt index?

2007-11-10 Thread Ryan McKinley
thanks for all the replies Yonik do you understand why so many unreferenced files are being produced here? What's the root cause? This is an index that has the same documents get updated many times, that could build up old files w/o optimizing. Just guesses... but perhaps new index

Re: restoring a corrupt index?

2007-11-10 Thread Michael McCandless
Yonik Seeley [EMAIL PROTECTED] wrote: How can this lead to index corruption? The no such file or directory on loading _cf9.fnm sounds like index corruption? I don't think older versions of lucene handled these errors as well. Perhaps _cf9.fnm failed to be written, but the segments file

Re: restoring a corrupt index?

2007-11-10 Thread Ryan McKinley
Or maybe the index is not corrupt but then we are hitting the descriptor limit on opening a searcher and it's being reported as no such file or directory? Hmmm, yes that's possible too... Should be easy to tell by checking if the file _cf9.fnm already exists. Oh yeah. Ryan, does that file
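
The check suggested above can be sketched as follows. The class and method names are hypothetical, and /path/to/index is the placeholder path from the original error message. A missing file points toward index corruption; a file that is present leaves the descriptor-limit explanation open.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration: test whether a segment file is really on disk.
public class SegmentFileCheck {

    // True only if the named file exists in the index directory and is a regular file.
    public static boolean segmentFileExists(String indexDir, String fileName) {
        Path file = Paths.get(indexDir, fileName);
        return Files.isRegularFile(file);
    }

    public static void main(String[] args) {
        // Placeholder directory taken from the error message; pass the real one as an argument.
        String indexDir = args.length > 0 ? args[0] : "/path/to/index";
        boolean present = segmentFileExists(indexDir, "_cf9.fnm");
        System.out.println("_cf9.fnm present: " + present);
    }
}
```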

Re: restoring a corrupt index?

2007-11-10 Thread Yonik Seeley
On Nov 10, 2007 5:01 PM, Michael McCandless [EMAIL PROTECTED] wrote: Yonik Seeley [EMAIL PROTECTED] wrote: How can this lead to index corruption? The no such file or directory on loading _cf9.fnm sounds like index corruption? I don't think older versions of lucene handled these errors

Re: restoring a corrupt index?

2007-11-10 Thread Michael McCandless
Yonik Seeley [EMAIL PROTECTED] wrote: On Nov 10, 2007 5:01 PM, Michael McCandless [EMAIL PROTECTED] wrote: Yonik Seeley [EMAIL PROTECTED] wrote: How can this lead to index corruption? The no such file or directory on loading _cf9.fnm sounds like index corruption? I don't

Re: TermDocs.skipTo error

2007-11-10 Thread Yonik Seeley
On Nov 9, 2007 11:40 AM, Mike Streeton [EMAIL PROTECTED] wrote: I have just tried this again using the index I built with lucene 2.1 but running the test using lucene 2.2 and it works okay, so it seems to be something related to an index built using lucene 2.2. I bet you are triggering an

Re: Chinese Segmentation with Phase Query

2007-11-10 Thread Cedric Ho
Hi Uwe, I believe this is the segmentation method used by CJKAnalyzer in Lucene. The problem with this Analyzer is that many incorrect hits will be returned during search. In fact, for pure Chinese documents (not containing English words), I believe there's no difference between using a CJKAnalyzer