We designed our solution to make use of Solr's existing features as much as possible, so that we need to write very little ourselves.

> As I mentioned, I embed solr in our application server, The http port
> is used for many other things, so a dedicated port is used.

I fail to understand this. HTTP does not have to be on port 80; any port is fine.

> And if you look at my protocol, it is really a lower stack protocol, so put
> it on top of http is not very efficient.

I have deliberately chosen HTTP as the protocol because it works well with the Solr design.
> At last, the hardlink is really to avoid to make copy of files, since
> each replication we only bring in new or changed files, not the whole
> snapshot.

The current solution does not copy all files; only the changed ones are copied.

On Fri, Jun 27, 2008 at 9:11 PM, Yajun Liu <[EMAIL PROTECTED]> wrote:
> We plan to support many indexes, so potentially you might have lots of
> active master as well, so if you have one place to go to find where to
> get index from, that would make operation much easier.
>
> When I commit update, sometimes I got FileNotFound exception, so I
> rollback to previous snapshot. This have to do automatically.
>
> As I mentioned, I embed solr in our application server, The http port
> is used for many other things, so a dedicated port is used. And if you
> look at my protocol, it is really a lower stack protocol, so put it on
> top of http is not very efficient.
>
> I don't know much of the format of compound file, but I think to
> replicate it, it is not good idea to copy the whole file, because that
> could be easily more than 100M at least for our case. In our case, we
> have very small frequent update of index, Out of 100M, there must be
> lots of identical blocks. If it is needed, I could contribute rsync
> based implementation.
>
> At last, the hardlink is really to avoid to make copy of files, since
> each replication we only bring in new or changed files, not the whole
> snapshot.
>
> --Yajun
>
>
> On Thu, Jun 26, 2008 at 11:21 PM, Noble Paul (JIRA) <[EMAIL PROTECTED]> wrote:
>>
>> [
>> https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608668#action_12608668
>> ]
>>
>> Noble Paul commented on SOLR-561:
>> ---------------------------------
>>
>> bq: First we have an active master, some standby masters and search slaves
>>
>> This looks like a good approach. In the current design I must allow users to
>> specify multiple 'masterUrl'.
>> This must take care of one or more standby masters. It can automatically
>> fall back to another master if one fails.
>>
>> bq.On active master, there is a index snapshots manager. Whenever there's an
>> update, it takes a snapshot. On Windows, it uses copy (I should try fsutil)
>> and on linux it uses hard link. The snapshot manager also clean up old
>> snapshots. From time to time, I still got index corruption when commit
>> update. When that happen, snapshot manager allows us to rollback to previous
>> good snapshot.
>>
>> How can I know if the index got corrupted? If I can know it, the best way to
>> implement that would be to add a command to ReplicationHandler to rollback
>> to latest.
>>
>> bq.On active master, there is a replication server component which listens
>> at a specific port
>> Plain socket communication is more work than relying on the simple http
>> protocol. The little extra efficiency you may achieve may not justify that
>> (http is not too slow either). In this case the servlet container provides
>> you with sockets, threads etc. Take a look at how efficiently it is done
>> in the current patch.
>>
>>
>> bq.client creates a tmp directory and hard link everything from its local
>> index directory, then for each file in the file list, if it does not exist
>> locally, get new file from server; if it is newer than local one, ask server
>> for update like rsync; if local files do not exist in file list, delete
>> them. in the case of compound file is used for index, the file update will
>> update only diff blocks.
>> The current implementation is more or less like what you have done. For a
>> compound file I am not sure if a diff based sync can be more efficient,
>> because it is hard to get the similar blocks in the file. I rely on
>> checksums of whole file.
>> If there is an efficient mechanism to obtain identical blocks, share the
>> code and I can incorporate it.
>> The hardlink approach may not be necessary now, as I made the SolrCore not
>> hardcode the index folder.
>>
>>
>>> Solr replication by Solr (for windows also)
>>> -------------------------------------------
>>>
>>> Key: SOLR-561
>>> URL: https://issues.apache.org/jira/browse/SOLR-561
>>> Project: Solr
>>> Issue Type: New Feature
>>> Components: replication
>>> Affects Versions: 1.3
>>> Environment: All
>>> Reporter: Noble Paul
>>> Attachments: deletion_policy.patch, SOLR-561.patch, SOLR-561.patch
>>>
>>>
>>> The current replication strategy in solr involves shell scripts. The
>>> following are the drawbacks with the approach
>>> * It does not work with windows
>>> * Replication works as a separate piece not integrated with solr.
>>> * Cannot control replication from solr admin/JMX
>>> * Each operation requires manual telnet to the host
>>> Doing the replication in java has the following advantages
>>> * Platform independence
>>> * Manual steps can be completely eliminated. Everything can be driven from
>>> solrconfig.xml.
>>> ** Adding the url of the master in the slaves should be good enough to
>>> enable replication. Other things like frequency of
>>> snapshoot/snappull can also be configured. All other information can be
>>> automatically obtained.
>>> * Start/stop can be triggered from solr/admin or JMX
>>> * Can get the status/progress while replication is going on. It can also
>>> abort an ongoing replication
>>> * No need to have a login into the machine
>>> This issue can track the implementation of solr replication in java
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>
--
--Noble Paul
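
P.S. To make the whole-file checksum comparison discussed in this thread concrete, here is a rough sketch. It is written in Python only for brevity and is not the code in the patch. The function names, the file-list format (file name mapped to checksum) and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a whole file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_sync(master_files, local_dir):
    """Compare the master's file list with a local index directory.

    master_files maps file name -> checksum as reported by the master
    (a hypothetical format). Returns (to_fetch, to_delete), both sorted:
    to_fetch  - files that are missing locally or whose checksum differs
    to_delete - local files that the master no longer lists
    """
    local = {
        entry.name: file_checksum(entry)
        for entry in Path(local_dir).iterdir()
        if entry.is_file()
    }
    to_fetch = sorted(
        name for name, checksum in master_files.items()
        if local.get(name) != checksum
    )
    to_delete = sorted(name for name in local if name not in master_files)
    return to_fetch, to_delete
```

Only the files in to_fetch would be transferred, so an unchanged file costs one local checksum and no network traffic. A changed compound file is still fetched whole, which is the trade-off raised above against a block-level, rsync-style diff.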
