Actually, those exceptions are thrown by the code that detects the
mismatch, then caught by CheckIndex and treated as meaning that
segment is corrupt. This is consistent with, e.g., how Lucene throws
CorruptIndexException deep down if it hits an inconsistency.
I think it's fine if you want to not use exceptions for the "local"
mismatches, and instead record the error in a data structure and then
stop processing that one segment. But for the "deep down" exceptions
you still have to keep the catch in CheckIndex to record those.
Mike
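A minimal sketch of the approach described above, assuming hypothetical
names (SegmentStatus, IndexStatus, SegmentReaderStub, IndexCheckSketch)
that are not the actual CheckIndex API: a "local" mismatch is recorded
in a status object without throwing, while an exception coming from
deeper code is still caught and recorded the same way. Either way,
processing of that one segment stops and the next segment is checked.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical result types; not part of Lucene.
class SegmentStatus {
    final String name;
    boolean corrupt;   // true if this segment failed any check
    String error;      // human-readable reason, null if the segment is clean

    SegmentStatus(String name) { this.name = name; }
}

class IndexStatus {
    final List<SegmentStatus> segments = new ArrayList<>();

    boolean clean() {
        for (SegmentStatus s : segments) {
            if (s.corrupt) return false;
        }
        return true;
    }
}

// Hypothetical stand-in for whatever reads one segment. countDocs() may
// throw from "deep down" (in Lucene, CorruptIndexException is an IOException).
interface SegmentReaderStub {
    String name();
    int docCountInHeader();
    int countDocs() throws IOException;
}

class IndexCheckSketch {
    static IndexStatus check(List<SegmentReaderStub> readers) {
        IndexStatus status = new IndexStatus();
        for (SegmentReaderStub r : readers) {
            SegmentStatus seg = new SegmentStatus(r.name());
            status.segments.add(seg);
            try {
                int expected = r.docCountInHeader();
                int actual = r.countDocs();
                if (expected != actual) {
                    // "Local" mismatch: record it and stop this segment, no throw.
                    seg.corrupt = true;
                    seg.error = "doc count mismatch: header says " + expected
                            + " but counted " + actual;
                    continue;
                }
                // Further per-segment checks would go here.
            } catch (IOException | RuntimeException e) {
                // "Deep down" failure: the catch stays so it is recorded too.
                seg.corrupt = true;
                seg.error = "exception while checking: " + e;
            }
        }
        return status;
    }
}
```

A caller (a command-line main or a RequestHandler) can then inspect
the returned IndexStatus instead of parsing printed output.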
On Aug 5, 2008, at 9:30 AM, Grant Ingersoll wrote:
I'll look into these. The other part I am not sure about is the
throwing of exceptions for mismatches. I know they mean CheckIndex
can't go forward, but they aren't really errors in CheckIndex, so
much as errors in the index, which CheckIndex is just reporting.
So, I'm inclined to capture that and present it (and return
immediately) instead of throw an exception. Is that reasonable?
-Grant
On Aug 4, 2008, at 5:01 PM, Michael McCandless wrote:
This sounds good! I like the idea of checking the index when Solr
has to force release the write.lock.
The one caveat is, when checking a large index (which can take
quite some time), it'd be nice to have the equivalent of the
inline'd out.print/ln calls happen in realtime so that you can see
(on the command line output) that progress is being made, which
segment is being checked, etc.?
Maybe change it to an optional "infoStream" (like IndexWriter), and
then the current inlined prints become calls to message() which
checks if infoStream is non-null?
Mike
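The optional infoStream idea above could look roughly like the
following sketch. The class and method names here (ProgressReporter,
checkSegments) are hypothetical and only illustrate the pattern; the
one thing taken from the message is that message() prints only when
infoStream is non-null, so command-line use sees progress in real time
while programmatic callers stay silent by default.

```java
import java.io.PrintStream;

// Hypothetical class illustrating the pattern; not the actual CheckIndex code.
class ProgressReporter {
    private PrintStream infoStream;  // null means "print nothing"

    void setInfoStream(PrintStream out) {
        this.infoStream = out;
    }

    // Replaces the inlined out.print/ln calls.
    private void message(String msg) {
        if (infoStream != null) {
            infoStream.println(msg);
        }
    }

    // Placeholder loop: reports each segment as it is visited and
    // returns how many segments were processed.
    int checkSegments(String[] segmentNames) {
        int checked = 0;
        for (String name : segmentNames) {
            message("checking segment " + name + " ...");
            // The real per-segment checks would run here.
            checked++;
            message("  done");
        }
        return checked;
    }
}
```

A command-line main would call setInfoStream(System.out) before
checking; a caller that only wants the result would leave it unset.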
Grant Ingersoll wrote:
Hey Mike,
I'm thinking about https://issues.apache.org/jira/browse/SOLR-566
and was thinking about adding some more programmatic access to the
CheckIndex tool and wanted to see if you had any thoughts.
Basically, I am going to capture info into a simple data
structure that can then be introspected and serialized into a
RequestHandler, but also something that might be more generally
useful in certain cases where things go bad. I was debating
keeping the inline out.printlns, but I'm not sure whether they
should instead be moved to main(), so that the cmd line stuff still
works as is but doesn't clog the logs for those that want
programmatic access.
I'll post a patch soon, but wanted to see if you had any
preliminary insight.
-Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]