Great, thanks for testing it. I'll commit soon. -Jonathan
On Mon, Sep 14, 2009 at 5:37 PM, Simon Smith <[email protected]> wrote:
> Jonathan:
>
> I tried out the patch you attached to JIRA-440, applied it to 0.4,
> and it works for me. Now, as soon as I take the node down, there may
> be one or two seconds of the thrift-internal error (timeout), but as
> soon as the host doing the querying can see the node is down, the
> error stops and valid output is given by the get_key_range query
> again. And there isn't any disruption when the node comes back up.
>
> Thanks! (I put this same note in the bug report.)
>
> Simon Smith
>
>
> On Fri, Sep 11, 2009 at 9:38 AM, Simon Smith <[email protected]> wrote:
>> https://issues.apache.org/jira/browse/CASSANDRA-440
>>
>> Thanks again, of course I'm happy to give any additional information
>> and will gladly do any testing of the fix.
>>
>> Simon
>>
>>
>> On Thu, Sep 10, 2009 at 7:32 PM, Jonathan Ellis <[email protected]> wrote:
>>> That confirms what I suspected, thanks.
>>>
>>> Can you file a ticket on Jira and I'll work on a fix for you to test?
>>>
>>> thanks,
>>>
>>> -Jonathan
>>>
>>> On Thu, Sep 10, 2009 at 4:42 PM, Simon Smith <[email protected]> wrote:
>>>> I sent get_key_range to node #1 (174.143.182.178), and here are the
>>>> resulting log lines from 174.143.182.178's log. (Do you want the other
>>>> nodes' log lines? Let me know if so.)
>>>>
>>>> DEBUG - get_key_range
>>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>>> startWith='', stopAt='', maxResults=100) from [email protected]:7000
>>>> DEBUG - collecting :false:3...@1252535119
>>>> [ ... chop the repeated & identical collecting messages ... ]
>>>> DEBUG - collecting :false:3...@1252535119
>>>> DEBUG - Sending RangeReply(keys=[java, java1, java2, java3, java4,
>>>> java5, match, match1, match2, match3, match4, match5, newegg, newegg1,
>>>> newegg2, newegg3, newegg4, newegg5, now, now1, now2, now3, now4, now5,
>>>> sgs, sgs1, sgs2, sgs3, sgs4, sgs5, test, test1, test2, test3, test4,
>>>> test5, xmind, xmind1, xmind2, xmind3, xmind4, xmind5],
>>>> completed=false) to [email protected]:7000
>>>> DEBUG - Processing response on an async result from
>>>> [email protected]:7000
>>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>>> startWith='', stopAt='', maxResults=58) from [email protected]:7000
>>>> DEBUG - Processing response on an async result from
>>>> [email protected]:7000
>>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>>> startWith='', stopAt='', maxResults=58) from [email protected]:7000
>>>> DEBUG - Processing response on an async result from
>>>> [email protected]:7000
>>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>>> startWith='', stopAt='', maxResults=22) from [email protected]:7000
>>>> DEBUG - Processing response on an async result from
>>>> [email protected]:7000
>>>> DEBUG - Disseminating load info ...
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Simon
>>>>
>>>> On Thu, Sep 10, 2009 at 5:25 PM, Jonathan Ellis <[email protected]> wrote:
>>>>> I think I see the problem.
>>>>>
>>>>> Can you check whether your range query is spanning multiple nodes in
>>>>> the cluster? You can tell by setting the log level to DEBUG and
>>>>> checking whether, after it logs get_key_range, it says "reading
>>>>> RangeCommand(...) from ... @machine" more than once.
>>>>>
>>>>> The bug is that when picking the node to start the range query, it
>>>>> consults the failure detector to avoid dead nodes, but if the query
>>>>> spans nodes it does not do that for the subsequent nodes.
>>>>>
>>>>> But if you are only generating one RangeCommand per get_key_range,
>>>>> then we have two bugs. :)
>>>>>
>>>>> -Jonathan
>>>>>
>>>>
>>>
>>
>
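[Editor's note: Jonathan's suggestion to set the log level to DEBUG is, in the 0.4 line, a one-line change to conf/log4j.properties. A minimal sketch follows; the stdout and R appender names are an assumption about the stock file, so keep whatever appenders your copy already lists, and remember to drop back to INFO afterwards since DEBUG is very verbose.]

    # conf/log4j.properties
    # Assumed appender names (stdout, R); reuse the ones already in your file.
    log4j.rootLogger=DEBUG,stdout,R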

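[Editor's note: for readers skimming the thread, the shape of the bug Jonathan describes is easier to see in a small sketch. This is not the 0.4 source and not the CASSANDRA-440 patch; RangeQueryRouting, FailureDetector, pickLiveEndpoint and planScan are hypothetical stand-ins. The only point being illustrated is that the liveness check has to run before every hop of a multi-node range scan, not just before the node the scan starts on.]

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only: stand-in names, not Cassandra's real classes.
    final class RangeQueryRouting {

        // Stand-in for the real failure detector: answers "is this node up?"
        interface FailureDetector {
            boolean isAlive(String endpoint);
        }

        private final FailureDetector failureDetector;

        RangeQueryRouting(FailureDetector failureDetector) {
            this.failureDetector = failureDetector;
        }

        // Pick the endpoint a range command should be sent to.
        // The essence of the fix: this check runs for EVERY hop of a
        // multi-node range scan, not only for the first node.
        String pickLiveEndpoint(List<String> candidatesInRingOrder) {
            for (String endpoint : candidatesInRingOrder) {
                if (failureDetector.isAlive(endpoint)) {
                    return endpoint; // first live replica wins
                }
            }
            throw new IllegalStateException("no live endpoints for range");
        }

        // Walk the ring, consulting the failure detector before each hop.
        List<String> planScan(List<List<String>> candidatesPerRange) {
            List<String> plan = new ArrayList<String>();
            for (List<String> candidates : candidatesPerRange) {
                plan.add(pickLiveEndpoint(candidates)); // filter dead nodes every time
            }
            return plan;
        }
    }

With that filtering applied at every hop, a get_key_range that spans a dead node routes around it instead of timing out, which matches the behavior Simon reports after applying the patch.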