We plan to support many indexes, so potentially you could have many active masters as well. Having one place to go to find out where to pull an index from would make operations much easier.
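For example, on the slave side this could be as simple as walking a configured list of master URLs and pulling from the first one that answers. A rough sketch only -- the class name and the ping URL below are made up for illustration, not taken from the current patch:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

/** Picks the first reachable master from a configured list (hypothetical sketch). */
public class MasterLocator {
    private final List<String> masterUrls; // e.g. read from solrconfig.xml

    public MasterLocator(List<String> masterUrls) {
        this.masterUrls = masterUrls;
    }

    /** Returns the first master that answers a cheap ping, or null if none do. */
    public String findActiveMaster() {
        for (String url : masterUrls) {
            try {
                HttpURLConnection conn = (HttpURLConnection)
                        new URL(url + "/replication?command=indexversion").openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                if (conn.getResponseCode() == 200) {
                    return url; // this master is alive, pull the index from here
                }
            } catch (IOException e) {
                // master unreachable or misconfigured, try the next (standby) one
            }
        }
        return null;
    }
}

Something like this would cover the standby-master case without any extra coordination service: the slaves simply fall through the list until they find a live master.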
When I commit an update, I sometimes get a FileNotFound exception, so I roll back to the previous snapshot. This has to happen automatically.

As I mentioned, I embed Solr in our application server. The HTTP port is used for many other things, so a dedicated port is used. And if you look at my protocol, it is really a lower-stack protocol, so putting it on top of HTTP is not very efficient.

I don't know much about the format of the compound file, but I think copying the whole file to replicate it is not a good idea, because it can easily be more than 100MB, at least in our case. We have very small, frequent index updates, so out of that 100MB there must be lots of identical blocks. If it is needed, I could contribute an rsync-based implementation (see the P.S. at the end of this mail for the basic idea).

At last, the hard link is really there to avoid making copies of files, since each replication only brings in new or changed files, not the whole snapshot (there is a small sketch of this right after the quoted comment below).

--Yajun

On Thu, Jun 26, 2008 at 11:21 PM, Noble Paul (JIRA) <[EMAIL PROTECTED]> wrote:
>
> [ https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608668#action_12608668 ]
>
> Noble Paul commented on SOLR-561:
> ---------------------------------
>
> bq. First we have an active master, some standby masters and search slaves
>
> This looks like a good approach. In the current design I must allow users to specify multiple 'masterUrl' values. This must take care of one or more standby masters. It can automatically fall back to another master if one fails.
>
> bq. On the active master, there is an index snapshot manager. Whenever there is an update, it takes a snapshot. On Windows it uses copy (I should try fsutil) and on Linux it uses hard links. The snapshot manager also cleans up old snapshots. From time to time I still get index corruption when committing an update. When that happens, the snapshot manager allows us to roll back to the previous good snapshot.
>
> How can I know if the index got corrupted? If I can detect it, the best way to implement that would be to add a command to ReplicationHandler to roll back to the latest good snapshot.
>
> bq. On the active master, there is a replication server component which listens on a specific port
>
> Plain socket communication is more work than relying on the simple HTTP protocol. The little extra efficiency you may achieve may not justify that (HTTP is not too slow either). In this case the servlet container provides you with sockets, threads, etc. Take a look at the current patch to see how efficiently it is done.
>
> bq. The client creates a tmp directory and hard links everything from its local index directory. Then, for each file in the file list: if it does not exist locally, get the new file from the server; if it is newer than the local one, ask the server for an update, like rsync; if a local file does not appear in the file list, delete it. When a compound file is used for the index, the file update transfers only the differing blocks.
>
> The current implementation is more or less like what you have done. For a compound file I am not sure a diff-based sync can be more efficient, because it is hard to find the identical blocks in the file. I rely on checksums of the whole file. If there is an efficient mechanism to obtain identical blocks, share the code and I can incorporate it.
> The hardlink approach may not be necessary now, as I made SolrCore not hardcode the index folder.
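On the hard links: even if the patch itself no longer needs them, here is roughly the idea behind our snapshot manager, reduced to a sketch. This is not our actual code; it assumes Java 7+ java.nio.file for brevity, and on Windows you would fall back to a copy or fsutil as mentioned above:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Builds a snapshot directory out of hard links so no index data is copied. */
public class HardLinkSnapshot {

    /** Links every file in indexDir into a new snapshot.<timestamp> directory. */
    public static Path snapshot(String indexDir, String snapshotRoot) throws IOException {
        Path source = Paths.get(indexDir);
        Path target = Paths.get(snapshotRoot, "snapshot." + System.currentTimeMillis());
        Files.createDirectories(target);
        DirectoryStream<Path> files = Files.newDirectoryStream(source);
        try {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // hard link: both names point at the same data on disk
                    Files.createLink(target.resolve(file.getFileName()), file);
                }
            }
        } finally {
            files.close();
        }
        return target;
    }
}

Because every snapshot is just a directory of links, cleaning up old snapshots is cheap: deleting a snapshot directory only removes the extra names, not the index data still referenced by newer snapshots or by the live index.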
>
>> Solr replication by Solr (for windows also)
>> -------------------------------------------
>>
>>                 Key: SOLR-561
>>                 URL: https://issues.apache.org/jira/browse/SOLR-561
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: replication
>>    Affects Versions: 1.3
>>         Environment: All
>>            Reporter: Noble Paul
>>         Attachments: deletion_policy.patch, SOLR-561.patch, SOLR-561.patch
>>
>> The current replication strategy in solr involves shell scripts. The following are the drawbacks with the approach:
>> * It does not work with windows
>> * Replication works as a separate piece not integrated with solr.
>> * Cannot control replication from solr admin/JMX
>> * Each operation requires manual telnet to the host
>> Doing the replication in java has the following advantages:
>> * Platform independence
>> * Manual steps can be completely eliminated. Everything can be driven from solrconfig.xml.
>> ** Adding the url of the master in the slaves should be good enough to enable replication. Other things like frequency of snapshoot/snappull can also be configured. All other information can be automatically obtained.
>> * Start/stop can be triggered from solr/admin or JMX
>> * Can get the status/progress while replication is going on. It can also abort an ongoing replication
>> * No need to have a login into the machine
>> This issue can track the implementation of solr replication in java
>
> --
> This message is automatically generated by JIRA.
> You can reply to this email to add a comment to the issue online.
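P.S. To make the "identical blocks" point concrete, this is the kind of thing I had in mind: cut the compound file into fixed-size blocks, exchange per-block checksums, and fetch only the blocks that differ. It is a simplified sketch (plain fixed blocks with CRC32, not the full rolling-checksum rsync algorithm), and none of the names below come from the current patch:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Per-block checksums so master and slave can exchange only changed blocks. */
public class BlockChecksums {
    static final int BLOCK_SIZE = 1024 * 1024; // 1MB blocks, tunable

    /** Returns one CRC32 value per block of the given file. */
    public static List<Long> checksums(String path) throws IOException {
        List<Long> sums = new ArrayList<Long>();
        RandomAccessFile f = new RandomAccessFile(path, "r");
        try {
            byte[] buf = new byte[BLOCK_SIZE];
            int read;
            while ((read = f.read(buf)) > 0) {
                CRC32 crc = new CRC32();
                crc.update(buf, 0, read);
                sums.add(crc.getValue());
            }
        } finally {
            f.close();
        }
        return sums;
    }

    /** Blocks the slave must fetch: missing locally, or with a different checksum. */
    public static List<Integer> staleBlocks(List<Long> master, List<Long> local) {
        List<Integer> stale = new ArrayList<Integer>();
        for (int i = 0; i < master.size(); i++) {
            if (i >= local.size() || !master.get(i).equals(local.get(i))) {
                stale.add(i);
            }
        }
        return stale; // if local has extra trailing blocks, the slave truncates them
    }
}

With small frequent updates, most blocks should hash the same on master and slave, so only a few megabytes move per sync instead of the whole 100MB file.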
