Hi Cedric,
Although I don't know Chinese, I went a different route to overcome
language-specific problems: instead of language-specific segmentation, we
now use statistical segmentation with bigrams.
E.g., given your sentence XYZABCDEF,
suppose the
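A minimal sketch of the language-agnostic bigram approach described above: every pair of adjacent characters becomes a token, so no dictionary or language-specific rules are needed. The class and method names here are my own, purely for illustration; this is not the actual analyzer code.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSegmenter {
    // Emit overlapping character bigrams: "XYZ" -> [XY, YZ].
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("XYZABCDEF"));
        // [XY, YZ, ZA, AB, BC, CD, DE, EF]
    }
}
```

Because the same bigrams are produced at index and query time, a query matches wherever its character pairs occur, regardless of where true word boundaries lie.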
Nikolay Diakov [EMAIL PROTECTED] wrote:
I see you do the wrapping in a RuntimeException trick. Perhaps you
can introduce a special exception derived from RuntimeException that
you would throw in that case. It would basically mean: "The
underlying FS does something we cannot tolerate, so we fail."
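A sketch of what the suggestion amounts to: a dedicated unchecked exception instead of a bare RuntimeException wrapper, so callers can catch this specific failure. The class name is my invention, not anything from the Lucene codebase.

```java
// Hypothetical exception type for the "underlying FS misbehaved" case.
// Extending RuntimeException keeps it unchecked, as in the original trick,
// but gives callers a distinct type to catch.
public class UnrecoverableFSException extends RuntimeException {
    public UnrecoverableFSException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

At the throw site, `throw new RuntimeException(ioe)` would become something like `throw new UnrecoverableFSException("underlying FS failed", ioe)`, preserving the original IOException as the cause.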
OK I've opened this issue:
https://issues.apache.org/jira/browse/LUCENE-1050
Mike
Michael McCandless [EMAIL PROTECTED] wrote:
Nikolay Diakov [EMAIL PROTECTED] wrote:
I see you do the wrapping in a RuntimeException trick. Perhaps you
can introduce a special exception derived from
Using Solr, we have been running an indexing process for a while, and
when I checked on it today, it was spitting out an error:
java.lang.RuntimeException: java.io.FileNotFoundException:
/path/to/index/_cf9.fnm (No such file or directory)
at
On Nov 10, 2007 4:01 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
Using solr, we have been running an indexing process for a while and
when I checked on it today, it spits out an error:
java.lang.RuntimeException: java.io.FileNotFoundException:
/path/to/index/_cf9.fnm (No such file or
Would this help: https://issues.apache.org/jira/browse/LUCENE-1020
On Nov 10, 2007, at 4:01 PM, Ryan McKinley wrote:
Using solr, we have been running an indexing process for a while and
when I checked on it today, it spits out an error:
java.lang.RuntimeException:
Would this help: https://issues.apache.org/jira/browse/LUCENE-1020
That should help here, but please proceed with caution: this tool is
very new, has only been tested on trunk indices, and is brutal in how
it recovers the index (it removes the entire segment if there is any
problem loading one).
Yonik Seeley [EMAIL PROTECTED] wrote:
On Nov 10, 2007 4:01 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
Using solr, we have been running an indexing process for a while and
when I checked on it today, it spits out an error:
java.lang.RuntimeException: java.io.FileNotFoundException:
On Nov 10, 2007 4:34 PM, Michael McCandless [EMAIL PROTECTED] wrote:
Yonik Seeley [EMAIL PROTECTED] wrote:
On Nov 10, 2007 4:01 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
Using solr, we have been running an indexing process for a while and
when I checked on it today, it spits out an
thanks for all the replies
Yonik, do you understand why so many unreferenced files are being produced
here? What's the root cause?
This is an index in which the same documents get updated many times;
that could build up old files without optimizing.
Just guesses... but perhaps new index
Yonik Seeley [EMAIL PROTECTED] wrote:
How can this lead to index corruption? The "no such file or directory" on
loading _cf9.fnm sounds like index corruption?
I don't think older versions of lucene handled these errors as well.
Perhaps _cf9.fnm failed to be written, but the segments file
Or maybe the index is not corrupt, but we are hitting the file descriptor limit
on opening a searcher and it's being reported as "no such file or directory"?
Hmmm, yes that's possible too...
Should be easy to tell by checking if the file _cf9.fnm already exists.
Oh yeah. Ryan, does that file
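The check proposed above can be sketched as follows: if the file is actually present on disk but opening it failed, the descriptor-limit theory is the likely culprit; if it is truly gone, corruption is more plausible. The helper name is mine, for illustration only.

```java
import java.io.File;

public class IndexFileCheck {
    // Distinguish a truly missing segment file from an open failure
    // caused by something else (e.g. hitting the file-descriptor limit).
    public static String diagnose(File f) {
        if (!f.exists()) {
            return "missing";   // file really is gone: suspect corruption
        }
        return "present";       // exists but open failed: suspect ulimit
    }

    public static void main(String[] args) {
        System.out.println(diagnose(new File("/path/to/index/_cf9.fnm")));
    }
}
```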
On Nov 10, 2007 5:01 PM, Michael McCandless [EMAIL PROTECTED] wrote:
Yonik Seeley [EMAIL PROTECTED] wrote:
How can this lead to index corruption? The no such file or directory on
loading _cf9.fnm sounds like index corruption?
I don't think older versions of lucene handled these errors
Yonik Seeley [EMAIL PROTECTED] wrote:
On Nov 10, 2007 5:01 PM, Michael McCandless [EMAIL PROTECTED]
wrote:
Yonik Seeley [EMAIL PROTECTED] wrote:
How can this lead to index corruption? The "no such file or directory" on
loading _cf9.fnm sounds like index corruption?
I don't
On Nov 9, 2007 11:40 AM, Mike Streeton [EMAIL PROTECTED] wrote:
I have just tried this again using the index I built with Lucene 2.1 but
running the test using Lucene 2.2, and it works okay, so it seems to be
something related to an index built using Lucene 2.2.
I bet you are triggering an
Hi Uwe,
I believe this is the segmentation method used by CJKAnalyzer in Lucene.
The problem with this Analyzer is that many incorrect hits will be
returned during search.
In fact, for a pure Chinese document (one not containing English words), I
believe there's no difference between using a CJKAnalyzer