I just tried to understand the logic...

Whenever an IOException/TTransportException is thrown, we mark the
Connection as bad. Gradually, as every Connection runs into one of these,
we end up with "All connections are bad".
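
To make sure I am describing the behaviour correctly, here is a minimal
sketch of the pattern as I understand it. It is not Blur's code:
FailoverSketch, Call and isBad are hypothetical names, connections are plain
"host:port" strings, and a plain IOException stands in for both the
transport error and BadConnectionException.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not taken from BlurClientManager. */
class FailoverSketch {
  /** A remote call against one "host:port" connection. */
  interface Call<T> {
    T run(String hostPort) throws IOException;
  }

  // Connections that failed at least once. Not thread-safe; sketch only.
  private final Set<String> bad = new LinkedHashSet<>();

  boolean isBad(String hostPort) {
    return bad.contains(hostPort);
  }

  <T> T execute(List<String> connections, Call<T> call) throws IOException {
    for (String hostPort : connections) {
      if (bad.contains(hostPort)) {
        continue; // skip anything already marked bad
      }
      try {
        return call.run(hostPort);
      } catch (IOException e) {
        bad.add(hostPort); // one failure is enough to mark it bad
      }
    }
    // Nothing in this sketch ever removes an entry from "bad", so after
    // enough failures every call ends up here.
    throw new IOException("All connections are bad.");
  }
}
```

In this sketch a connection never leaves the bad set on its own, so the
error becomes permanent once every node has failed once.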

Is it a good idea to write a reaper thread that proactively tries to
replenish bad Connections, instead of waiting for a search to hit one at
the wrong moment?
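
A rough sketch of what I have in mind for the reaper, under a few
assumptions: BadConnectionReaper, markBad, reapOnce and tcpProbe are
hypothetical names that do not exist in Blur, connections are identified by
"host:port" strings, and a plain TCP connect stands in for a real health
check (a Thrift-level ping would be more accurate).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch; not part of BlurClientManager. */
class BadConnectionReaper implements AutoCloseable {
  private final Set<String> badConnections = ConcurrentHashMap.newKeySet();
  private final Predicate<String> probe; // returns true if the node is reachable again
  private final ScheduledExecutorService scheduler;

  BadConnectionReaper(Predicate<String> probe) {
    this.probe = probe;
    this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "bad-connection-reaper");
      t.setDaemon(true); // never keep the JVM alive just for the reaper
      return t;
    });
  }

  /** Assumed health check: the node counts as healthy if a TCP connect succeeds. */
  static Predicate<String> tcpProbe(int timeoutMillis) {
    return hostPort -> {
      int colon = hostPort.lastIndexOf(':');
      String host = hostPort.substring(0, colon);
      int port = Integer.parseInt(hostPort.substring(colon + 1));
      try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        return true;
      } catch (IOException e) {
        return false;
      }
    };
  }

  void markBad(String hostPort) {
    badConnections.add(hostPort);
  }

  boolean isBad(String hostPort) {
    return badConnections.contains(hostPort);
  }

  /** One pass: every bad connection whose probe succeeds becomes usable again. */
  void reapOnce() {
    badConnections.removeIf(hostPort -> {
      try {
        return probe.test(hostPort);
      } catch (RuntimeException e) {
        return false; // a failing probe must not kill the reaper
      }
    });
  }

  void start(long periodMillis) {
    scheduler.scheduleWithFixedDelay(
        this::reapOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Usage would be something like new BadConnectionReaper(
BadConnectionReaper.tcpProbe(2000)).start(5000), with the existing code
calling markBad(...) wherever it marks a Connection as bad today. The
timeout and period values here are only placeholders.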

Also, I just found that the "staleness" check is performed eagerly.
Shouldn't it be possible to return a live connection and refresh the stale
ones in the background?
[*ClientPool.getConnection(Connection conn)*]
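
Roughly what I mean, as a sketch only. LazyRefreshPool, acquire, release and
idleCount are hypothetical names, and I am assuming "stale" simply means
"idle for longer than some limit"; the actual ClientPool may define it
differently. The caller never waits on a stale connection: stale entries are
evicted and their replacements are opened on a background executor.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executor;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch; not Blur's ClientPool. */
class LazyRefreshPool<C> {
  private static final class Idle<C> {
    final C conn;
    final long releasedAt;

    Idle(C conn, long releasedAt) {
      this.conn = conn;
      this.releasedAt = releasedAt;
    }
  }

  // Newest entries at the head, oldest at the tail.
  private final Deque<Idle<C>> idle = new ArrayDeque<>();
  private final Supplier<C> factory;   // opens a new connection
  private final Executor background;   // runs replacement work off the caller's thread
  private final LongSupplier clock;    // current time in millis (injectable for tests)
  private final long maxIdleMillis;

  LazyRefreshPool(Supplier<C> factory, Executor background,
                  LongSupplier clock, long maxIdleMillis) {
    this.factory = factory;
    this.background = background;
    this.clock = clock;
    this.maxIdleMillis = maxIdleMillis;
  }

  synchronized void release(C conn) {
    idle.addFirst(new Idle<>(conn, clock.getAsLong()));
  }

  synchronized int idleCount() {
    return idle.size();
  }

  C acquire() {
    C live = null;
    int staleCount = 0;
    synchronized (this) {
      long now = clock.getAsLong();
      // Evict stale entries from the tail. Real code would also close them.
      while (!idle.isEmpty() && now - idle.peekLast().releasedAt > maxIdleMillis) {
        idle.pollLast();
        staleCount++;
      }
      Idle<C> head = idle.pollFirst(); // anything still pooled is live
      if (head != null) {
        live = head.conn;
      }
    }
    if (live == null && staleCount > 0) {
      staleCount--; // the caller's own new connection replaces one stale entry
    }
    for (int i = 0; i < staleCount; i++) {
      background.execute(() -> release(factory.get()));
    }
    return live != null ? live : factory.get();
  }
}
```

The trade-off is that a caller only pays the connect cost when the pool has
no live connection at all, at the price of occasionally opening replacements
that are never used.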

--
Ravi



On Sat, Dec 10, 2016 at 3:44 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> Often, I find myself bang in the middle of a query when BlurClientManager
> comes up with this error. It happens both ways: when my app-server talks
> to the controller-server, and when the controller-server talks to the
> shard-server. This is affecting the search experience quite a bit in
> production these days!
>
> BlurException(message:Unknown error during remote call to node
> [AAA.BB.CCC.DD:40020],
> stackTraceStr:org.apache.blur.thrift.BadConnectionException:
> Could not connect to controller/shard server. All connections are bad.
>   at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:243)
>   at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:314)
>   at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote$1.call(BlurControllerServer.java:132)
>   at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote.execute(BlurControllerServer.java:139)
>
> When do we get such an exception? Incorrect timeout settings,
> shard-server restarts, etc.?
>
> Any help is much appreciated
>
> --
> Ravi
>
