[ https://issues.apache.org/jira/browse/ROCKETMQ-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156531#comment-16156531 ]
ASF GitHub Bot commented on ROCKETMQ-193:
-----------------------------------------

Github user Zhang-Ke closed the pull request at:

    https://github.com/apache/incubator-rocketmq-externals/pull/23

> Develop rocketmq-redis-replicator component
> -------------------------------------------
>
>                 Key: ROCKETMQ-193
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-193
>             Project: Apache RocketMQ
>          Issue Type: Task
>            Reporter: Rich Zhang
>            Assignee: Rich Zhang
>            Priority: Minor
>             Fix For: 4.2.0-incubating
>
>
> Design:
> Redis provides an official replication mechanism in which a slave communicates with its master over the RESP protocol. A natural design for the rocketmq-redis-replicator component is therefore to act as a slave: send commands to the master, receive data from it in a timely manner, and forward that data to the RocketMQ broker.
> If you are not familiar with the Redis replication mechanism, please read this section first [1]. After that, I will explain some key points.
> 1. For a slave to resume from where it left off when it reconnects, the slave and master must agree on a master run ID and a replication offset. The slave acknowledges its offset to the master only periodically, so after reconnecting it may receive duplicate commands, and the rocketmq-redis-replicator component may consequently send duplicate messages too. A good way to narrow this duplicate window is to shorten the "ack period", for example to 100 ms.
> 2. If the slave stays offline for some time, it can easily exhaust the replication backlog, whose default size is only 1 MB, especially on a high-traffic Redis instance. Unfortunately, if the slave's replication offset has already been overwritten in the master's backlog, a full synchronization must be performed. That is unacceptable for the rocketmq-redis-replicator component, because a large number of messages would be sent out in a burst.
> 3. During a full synchronization, the master generates a new RDB file (see the RDB file format [2]); the slave receives this file, stores it on disk, and finally loads it into memory. This strategy lets the slave reach a state consistent with the master as quickly as possible and rarely fails. For the rocketmq-redis-replicator component, it is also a good way to keep the synchronization of the initial RDB file from failing halfway.
> There is already an open-source project [3] that focuses on replicating Redis data and provides an API to handle the received data [4]. Its main ideas are to act as a slave, follow the official replication procedure, communicate with the master over RESP, and acknowledge the replication offset to the master. Building on this project is a good idea, but some aspects need to be enhanced and made more robust. Here are some points:
> [High Availability]
> Keeping the replication component highly available is not difficult, but it is important, and not only for providing uninterrupted service: if the component goes offline for some time, an unacceptable full synchronization may be triggered.
> High availability can be achieved by running the component in master/slave mode, using ZooKeeper to coordinate and switch between master and slave, and storing state in ZooKeeper so that the component itself stays stateless.
> [Data Loss]
> Data loss should be avoided as far as possible. The key point is that the slave acknowledges a replication offset to the master only after the corresponding command has been sent to the RocketMQ broker successfully.
> [Stale Data]
> Stale data can also appear when the slave reconnects. Consider the following case:
> `time1`    `time2`    `time3`
> set k=a    set k=b    set k=c
> If the slave went offline at time3 but the latest replication offset it reported to the master corresponds to time1, then after reconnecting it re-applies "set k=b" and "set k=c". For a short time window, "k" holds the stale value "b" until "set k=c" is applied. So the shorter the slave's offline time, the better.
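The duplicate, backlog, data-loss and stale-data points above can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not the redis-replicator or RocketMQ API: `Master`, `FakeBroker`, `Replicator` and the `ack_every` parameter are hypothetical names, the master run ID is omitted, offsets count commands rather than bytes, and the backlog is a bounded deque standing in for the 1 MB default backlog. The replicator advances its acknowledged offset only after the broker accepts a message, so a crash between acks causes replays (duplicates, including the stale "set k=b") rather than losses, and a backlog that no longer covers the acknowledged offset forces a full synchronization.

```python
from collections import deque


class Master:
    """Toy master with a bounded replication backlog.

    Offsets count commands, a simplification: real Redis offsets count bytes.
    """

    def __init__(self, backlog_size):
        self.offset = 0
        # Oldest (offset, command) pairs are evicted once the deque is full.
        self.backlog = deque(maxlen=backlog_size)

    def write(self, command):
        self.offset += 1
        self.backlog.append((self.offset, command))

    def partial_sync_from(self, acked_offset):
        """Commands after acked_offset, or None if the backlog no longer covers it."""
        if acked_offset == self.offset:
            return []
        if not self.backlog or self.backlog[0][0] > acked_offset + 1:
            return None  # gap in the backlog: only a full resync would help
        return [(off, cmd) for off, cmd in self.backlog if off > acked_offset]


class FakeBroker:
    """Hypothetical stand-in for a RocketMQ producer; records accepted messages."""

    def __init__(self):
        self.messages = []

    def send(self, offset, command):
        self.messages.append((offset, command))
        return True


class Replicator:
    def __init__(self, master, broker, ack_every):
        self.master = master
        self.broker = broker
        self.ack_every = ack_every  # hypothetical "ack period", in commands
        self.acked_offset = 0       # offset acknowledged and resumed from
        self.sent_offset = 0        # last offset the broker accepted

    def replicate(self, crash=False):
        """Forward pending commands; if crash, die before the final ack."""
        batch = self.master.partial_sync_from(self.acked_offset)
        if batch is None:
            raise RuntimeError("backlog overrun: full synchronization required")
        for off, cmd in batch:
            if not self.broker.send(off, cmd):
                break  # never acknowledge what the broker did not accept
            self.sent_offset = off
            if self.sent_offset - self.acked_offset >= self.ack_every:
                self.acked_offset = self.sent_offset  # periodic ack, after send
        if crash:
            self.sent_offset = self.acked_offset  # unacked progress is lost
        else:
            self.acked_offset = self.sent_offset  # final ack for the batch


master = Master(backlog_size=5)
broker = FakeBroker()
rep = Replicator(master, broker, ack_every=10)

master.write("set k=a")      # time1
rep.replicate()              # sent and acknowledged up to offset 1
master.write("set k=b")      # time2
master.write("set k=c")      # time3
rep.replicate(crash=True)    # b and c sent, crash before acknowledging
rep.replicate()              # resumes from offset 1: b and c are resent
print([cmd for _, cmd in broker.messages])
# ['set k=a', 'set k=b', 'set k=c', 'set k=b', 'set k=c']

for i in range(6):           # replicator offline while the master keeps writing
    master.write(f"set n={i}")
try:
    rep.replicate()
except RuntimeError as e:
    print(e)                 # backlog overrun: full synchronization required
```

Lowering `ack_every` shrinks the window of sent-but-unacknowledged commands, which is the trade-off behind the short "ack period" suggested in point 1; with `ack_every=1` the crash above would produce no duplicates. Since the design prefers duplicates over loss, downstream consumers should be prepared to see replayed commands.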
> [Message Order]
> Redis executes commands in order on a single thread, which works well because of its high performance. Replicating data with a single thread in the slave is also fine, as it is a purely in-memory operation. But is sending all data to RocketMQ in a single global order a good choice? The producer should have no performance issue, but consumers may not be able to keep up, especially when Redis is under high load.
> Hashing each key to a RocketMQ queue is a good strategy. It guarantees that all operations on the same key are routed to the same queue, which keeps them partially ordered (per key), while downstream consumers can consume different queues concurrently. Of course, some interdependent keys may also need to be hashed to the same queue. We should provide configuration or an API to support this customization.
> [Transaction]
> Redis supports simple transactions. A transaction starts with a "MULTI" command; Redis queues the subsequent commands and executes them only when it receives an "EXEC" command. But if one of the queued commands fails during execution, the commands already executed are not rolled back, and the remaining commands are still executed. So Redis transactions cannot guarantee that either all commands succeed or none take effect.
> In RocketMQ, it is likewise impossible to group the consumption of multiple messages into one transaction. However, the rocketmq-redis-replicator component only receives the transaction's commands after the Redis server has received the "EXEC" command. From this perspective, the transaction semantics are neither strengthened nor weakened when this component forwards the commands as messages.
> [Avoid Switching the Component to Master]
> In Sentinel or Redis Cluster, a master crash is detected automatically and one of its slaves is promoted to master. The master has full information about its slaves, and the candidate slave is picked automatically. Obviously, the rocketmq-redis-replicator component cannot take on the master role. Configuring this component as a "read-only" slave is a good way to prevent it from being promoted to master.
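The per-key routing proposed under [Message Order] can be sketched in Python as follows. The names `routing_key`, `queue_for` and `partition` are hypothetical, the `key_groups` mapping is an assumed stand-in for the configuration or API for interdependent keys, and CRC32 is an arbitrary choice of stable hash. In the RocketMQ Java client, the same selection logic would typically live in a `MessageQueueSelector` passed to `DefaultMQProducer.send(msg, selector, arg)`.

```python
import zlib
from collections import defaultdict


def routing_key(key, key_groups):
    """Map a Redis key to the key used for queue selection.

    key_groups is a hypothetical user configuration that pins interdependent
    keys to a shared group name so they always land on the same queue.
    """
    return key_groups.get(key, key)


def queue_for(key, num_queues, key_groups=None):
    """Pick a queue index with a stable hash.

    zlib.crc32 is used instead of Python's hash(), which is salted per process
    for str and would route the same key differently after a restart.
    """
    rk = routing_key(key, key_groups or {})
    return zlib.crc32(rk.encode("utf-8")) % num_queues


def partition(commands, num_queues, key_groups=None):
    """Split an ordered command stream into per-queue streams.

    Order is preserved within each queue (and therefore per key), but not
    across queues, so consumers can process queues concurrently.
    """
    queues = defaultdict(list)
    for cmd in commands:
        key = cmd.split()[1]  # assumes simple "<CMD> <key> ..." commands
        queues[queue_for(key, num_queues, key_groups)].append(cmd)
    return dict(queues)


stream = [
    "SET user:1 a", "SET user:2 x", "SET user:1 b",
    "INCR order:9", "SET user:1 c", "SET user:2 y",
]
groups = {"order:9": "user:1"}  # hypothetical: order:9 depends on user:1
for q, cmds in sorted(partition(stream, num_queues=4, key_groups=groups).items()):
    print(q, cmds)
```

Because each key maps to exactly one queue, a consumer of that queue sees that key's writes in master order, while writes on different queues may interleave arbitrarily. Changing `num_queues` remaps keys, so per-key order only holds while the queue count stays fixed.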
> [Support Redis Cluster]
> Whether in Redis Cluster or with the earlier partitioning approaches, a slave only tracks a single master. So the replication mechanism is the same for a single-node Redis instance and for a Redis Cluster.
> [1]: https://redis.io/topics/replication
> [2]: https://github.com/sripathikrishnan/redis-rdb-tools/wiki/Redis-RDB-Dump-File-Format
> [3]: https://github.com/leonchen83/redis-replicator
> [4]: https://github.com/leonchen83/redis-replicator#31-replication-via-socket



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)