[https://issues.apache.org/jira/browse/BLUR-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637620#comment-13637620]
Aaron McCurry commented on BLUR-74:
-----------------------------------
Let me discuss how we got here. In earlier versions of Blur, the index locks
(Lucene API LockFactory) were actually controlled by ZooKeeper. This made a
lot of sense when I wrote it. Basically there was one ephemeral node per shard
per table. When a failure was detected and shards were relocated, it was
assumed that the ephemeral nodes held by the node that went offline would
already have been removed by ZK. The locks would therefore have been released,
and the server opening the shard would be able to obtain the lock immediately
and start opening the index writer. In that implementation, waiting for a table
to enable or disable was a matter of waiting for the ephemeral nodes (the
locks) to be present or absent.
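To make the old behavior concrete, here is a minimal Java sketch that models the ZooKeeper-based scheme in memory instead of talking to a real ensemble. `EphemeralLockRegistry`, the session IDs, and the shard lock path strings are hypothetical: `tryLock` stands in for creating an ephemeral znode, and `expireSession` stands in for ZK removing a dead server's nodes. The enable and disable checks then reduce to "every shard lock is present" and "no shard lock is present", which a caller would poll.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for ZooKeeper ephemeral nodes: each lock
// path is owned by a session and disappears when that session expires.
class EphemeralLockRegistry {
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    // Like creating an EPHEMERAL znode: succeeds only if the path is free.
    boolean tryLock(String path, String sessionId) {
        return owners.putIfAbsent(path, sessionId) == null;
    }

    // Like ZK expiring a dead server's session: all of its nodes go away,
    // which releases every shard lock that server held.
    void expireSession(String sessionId) {
        owners.values().removeIf(sessionId::equals);
    }

    // Old-style "is the table enabled?" check: every shard lock is present.
    boolean allPresent(Set<String> paths) {
        return paths.stream().allMatch(owners::containsKey);
    }

    // Old-style "is the table disabled?" check: no shard lock is present.
    boolean nonePresent(Set<String> paths) {
        return paths.stream().noneMatch(owners::containsKey);
    }
}
```

In this model expiry is instantaneous; the trouble described next is that, on a real cluster with thousands of shards, it was not.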
However, in practice it did not work that well. The problem was that on a
large cluster with thousands of shards, ZK would not react quickly to
individual ephemeral nodes. As a result, during a failure the server trying to
open the downed shard would wait anywhere from seconds to minutes to obtain the
lock before it could start opening the index. So the ZK LockFactory was
replaced with an HDFS version that allows any writer to obtain the lock, but
validates that the writer holds the lock before committing any new data to
the index.
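A minimal sketch of the replacement idea, assuming an in-memory stand-in for the HDFS lock file. `OverwritableLock` and `ShardWriter` are hypothetical names, not Blur's classes. Obtaining the lock never waits (the newest writer simply records itself as holder), and a writer checks that it is still the recorded holder right before each commit, so a stale writer is rejected instead of publishing data.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a lock file stored in HDFS next to the index.
// Obtaining the lock never blocks: the newest writer overwrites the holder.
class OverwritableLock {
    private final AtomicReference<String> holder = new AtomicReference<>();

    void obtain(String writerId) {
        holder.set(writerId);
    }

    boolean isHeldBy(String writerId) {
        return writerId.equals(holder.get());
    }
}

// Hypothetical writer that only validates lock ownership at commit time.
class ShardWriter {
    private final String writerId;
    private final OverwritableLock lock;
    private int committedGeneration = 0;

    ShardWriter(String writerId, OverwritableLock lock) {
        this.writerId = writerId;
        this.lock = lock;
        lock.obtain(writerId); // opening the shard takes the lock immediately
    }

    // Validate ownership right before committing; a writer whose lock was
    // taken over by another server must not add new data to the index.
    // Note: the check and the commit are not atomic in this sketch.
    void commit() {
        if (!lock.isHeldBy(writerId)) {
            throw new IllegalStateException(
                "Writer " + writerId + " no longer holds the index lock");
        }
        committedGeneration++;
    }

    int committedGeneration() {
        return committedGeneration;
    }
}
```

This only illustrates the ordering "obtain freely, validate before commit"; it says nothing about how Blur closes the window between the check and the actual commit.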
So the problem is that currently we don't really have any idea which shards
are actually open on any given server. We only know which shards "should" be
open, and that may be the answer. Perhaps we should add another call to the
Blur service in Thrift that extends the "shardServerLayout" method's behavior.
We should leave the existing call and its behavior in place and add another
"shardServerLayout" method that takes a parameter, perhaps an enum with the
values ACTUAL and CALCULATED, where CALCULATED is the current result and
ACTUAL is what is really open. Then the enable and disable calls can key off
the results of that call and block appropriately.
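A sketch of how the proposed call could be used to make enable and disable block, written in Java against a hypothetical client interface. `ShardLayoutType`, `LayoutService`, `BlockingTableOps`, and the timeout and polling parameters are assumptions for illustration; the layout is assumed to be a map from shard name to shard server. Enabling waits until ACTUAL matches CALCULATED, and disabling waits until ACTUAL is empty.

```java
import java.util.Map;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Proposed parameter: CALCULATED = where shards should be open (today's
// behavior), ACTUAL = where shards really are open right now.
enum ShardLayoutType { ACTUAL, CALCULATED }

// Hypothetical client-side view of the extended Thrift call; the result is
// assumed to map shard name -> shard server.
interface LayoutService {
    Map<String, String> shardServerLayout(String table, ShardLayoutType type);
}

final class BlockingTableOps {
    private BlockingTableOps() {}

    // Enable blocks until every shard is actually open where it should be.
    static void awaitEnabled(LayoutService svc, String table, long timeoutMs, long pollMs)
            throws InterruptedException, TimeoutException {
        awaitCondition(() -> {
            Map<String, String> calculated =
                svc.shardServerLayout(table, ShardLayoutType.CALCULATED);
            return svc.shardServerLayout(table, ShardLayoutType.ACTUAL).equals(calculated);
        }, timeoutMs, pollMs, "enable " + table);
    }

    // Disable blocks until no shard server reports the table as open.
    static void awaitDisabled(LayoutService svc, String table, long timeoutMs, long pollMs)
            throws InterruptedException, TimeoutException {
        awaitCondition(() -> svc.shardServerLayout(table, ShardLayoutType.ACTUAL).isEmpty(),
                timeoutMs, pollMs, "disable " + table);
    }

    // Poll the condition until it holds or the deadline passes.
    private static void awaitCondition(BooleanSupplier done, long timeoutMs, long pollMs,
            String what) throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!done.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException("Timed out waiting to " + what);
            }
            Thread.sleep(pollMs);
        }
    }
}
```

Polling with a deadline keeps the sketch simple. A caller would invoke `awaitDisabled` after the disable request and before removing the table, so the indexes are not deleted while writers still have them open.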
Aaron
> Make the disabling and enabling of tables blocking calls.
> ---------------------------------------------------------
>
> Key: BLUR-74
> URL: https://issues.apache.org/jira/browse/BLUR-74
> Project: Apache Blur
> Issue Type: Bug
> Affects Versions: 0.1.5
> Reporter: Aaron McCurry
> Fix For: 0.1.5
>
>
> Currently the calls return, and then the action is carried out
> asynchronously. This is an issue for the writers when someone calls disable
> and then remove in quick succession and the indexes are to be removed,
> because the indexes are deleted out from underneath the writers. This causes
> the shard servers to throw errors.
--
This message is automatically generated by JIRA.