We designed our solution to reuse Solr's existing features as much as
possible, so that we need to write very little ourselves.
> As I mentioned, I embed Solr in our application server. The http port
> is used for many other things, so a dedicated port is used.
I fail to understand this. HTTP does not have to be on port 80; any port is fine.
> And if you look at my protocol, it is really a lower stack protocol, so
> putting it on top of http is not very efficient.
I deliberately chose HTTP as the protocol because it works well
with Solr's design.

> Lastly, the hardlink is really there to avoid copying files, since in
> each replication we only bring in new or changed files, not the whole
> snapshot.
The current solution does not copy all files; only the changed ones are copied.
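
To illustrate the "only changed files" idea, here is a minimal sketch in Java. It is not the code from the SOLR-561 patch. The class name ChangedFiles, the two method names, and the representation of each file list as a map from file name to whole-file checksum are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr: decides which index files a slave
// should fetch or delete, given maps of file name -> whole-file checksum.
public class ChangedFiles {

    // Files that are missing locally or whose checksum differs from the master's.
    public static List<String> filesToFetch(Map<String, Long> master, Map<String, Long> local) {
        List<String> fetch = new ArrayList<>();
        // A TreeMap copy is used only to get a deterministic, name-sorted order.
        for (Map.Entry<String, Long> e : new TreeMap<>(master).entrySet()) {
            Long localSum = local.get(e.getKey());
            if (localSum == null || !localSum.equals(e.getValue())) {
                fetch.add(e.getKey());
            }
        }
        return fetch;
    }

    // Local files that no longer appear in the master's file list.
    public static List<String> filesToDelete(Map<String, Long> master, Map<String, Long> local) {
        List<String> delete = new ArrayList<>();
        for (String name : new TreeMap<>(local).keySet()) {
            if (!master.containsKey(name)) {
                delete.add(name);
            }
        }
        return delete;
    }

    public static void main(String[] args) {
        // Made-up file names and checksums, for illustration only.
        Map<String, Long> master = new TreeMap<>();
        master.put("_1.cfs", 111L);
        master.put("_2.cfs", 222L);
        master.put("segments.gen", 334L);

        Map<String, Long> local = new TreeMap<>();
        local.put("_0.cfs", 999L);        // no longer on the master
        local.put("_1.cfs", 111L);        // unchanged, not fetched again
        local.put("segments.gen", 333L);  // same name, different checksum

        System.out.println("fetch:  " + filesToFetch(master, local));
        System.out.println("delete: " + filesToDelete(master, local));
    }
}
```

Unchanged files never cross the wire in this scheme, but a compound file that changed in even one block is fetched whole, which is the trade-off discussed in this thread.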

On Fri, Jun 27, 2008 at 9:11 PM, Yajun Liu <[EMAIL PROTECTED]> wrote:
> We plan to support many indexes, so potentially you might have lots of
> active masters as well, so if you have one place to go to find where to
> get the index from, that would make operation much easier.
>
> When I commit an update, I sometimes get a FileNotFound exception, so I
> roll back to the previous snapshot. This has to be done automatically.
>
> As I mentioned, I embed Solr in our application server. The http port
> is used for many other things, so a dedicated port is used. And if you
> look at my protocol, it is really a lower stack protocol, so putting it on
> top of http is not very efficient.
>
> I don't know much about the format of the compound file, but I think
> that to replicate it, it is not a good idea to copy the whole file,
> because that could easily be more than 100M, at least in our case. We
> have very small, frequent updates of the index, so out of 100M there must
> be lots of identical blocks. If it is needed, I could contribute an
> rsync-based implementation.
>
> Lastly, the hardlink is really there to avoid copying files, since in
> each replication we only bring in new or changed files, not the whole
> snapshot.
>
> --Yajun
>
>
> On Thu, Jun 26, 2008 at 11:21 PM, Noble Paul (JIRA) <[EMAIL PROTECTED]> wrote:
>>
>>    [ https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608668#action_12608668 ]
>>
>> Noble Paul commented on SOLR-561:
>> ---------------------------------
>>
>> bq: First we have an active master, some standby masters and search slaves
>>
>> This looks like a good approach. In the current design I must allow users to
>> specify multiple 'masterUrl' values. That should take care of one or more
>> standby masters: a slave can automatically fall back to another master if
>> one fails.
>>
>> bq.On active master, there is an index snapshot manager. Whenever there's an
>> update, it takes a snapshot. On Windows, it uses copy (I should try fsutil)
>> and on Linux it uses hard links. The snapshot manager also cleans up old
>> snapshots. From time to time, I still get index corruption when committing
>> an update. When that happens, the snapshot manager allows us to roll back to
>> the previous good snapshot.
>>
>> How can I know if the index got corrupted? If that can be detected, the best
>> way to implement this would be to add a command to ReplicationHandler to
>> roll back to the latest good snapshot.
>>
>> bq.On active master, there is a replication server component which listens
>> at a specific port
>>
>> Plain socket communication is more work than relying on the simple http
>> protocol. The little extra efficiency you may achieve may not justify that
>> (http is not too slow either). In this case the servlet container provides
>> you with sockets, threads, etc. Take a look at the current patch to see how
>> efficiently it is done.
>>
>>
>> bq.client creates a tmp directory and hard links everything from its local
>> index directory, then for each file in the file list, if it does not exist
>> locally, get the new file from the server; if it is newer than the local one,
>> ask the server for an update like rsync; if local files do not exist in the
>> file list, delete them. If a compound file is used for the index, the file
>> update will transfer only the differing blocks.
>>
>> The current implementation is more or less like what you have done. For a
>> compound file I am not sure a diff-based sync can be more efficient,
>> because it is hard to find the similar blocks in the file. I rely on
>> checksums of the whole file. If there is an efficient mechanism to obtain
>> identical blocks, please share the code and I can incorporate it.
>>
>> The hardlink approach may not be necessary now, as I changed SolrCore so
>> that it no longer hardcodes the index folder.
>>
>>
>>> Solr replication by Solr (for windows also)
>>> -------------------------------------------
>>>
>>>                 Key: SOLR-561
>>>                 URL: https://issues.apache.org/jira/browse/SOLR-561
>>>             Project: Solr
>>>          Issue Type: New Feature
>>>          Components: replication
>>>    Affects Versions: 1.3
>>>         Environment: All
>>>            Reporter: Noble Paul
>>>         Attachments: deletion_policy.patch, SOLR-561.patch, SOLR-561.patch
>>>
>>>
>>> The current replication strategy in Solr involves shell scripts. This
>>> approach has the following drawbacks:
>>> * It does not work on Windows
>>> * Replication works as a separate piece not integrated with Solr.
>>> * Cannot control replication from Solr admin/JMX
>>> * Each operation requires manual telnet to the host
>>> Doing the replication in Java has the following advantages:
>>> * Platform independence
>>> * Manual steps can be completely eliminated. Everything can be driven from
>>> solrconfig.xml.
>>> ** Adding the URL of the master in the slaves should be good enough to
>>> enable replication. Other things like the frequency of
>>> snapshoot/snappull can also be configured. All other information can be
>>> automatically obtained.
>>> * Start/stop can be triggered from solr/admin or JMX
>>> * Can get the status/progress while replication is going on. It can also
>>> abort an ongoing replication
>>> * No need to have a login into the machine
>>> This issue can track the implementation of Solr replication in Java
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>



-- 
--Noble Paul
