Glad to see everyone's discussion. The compact topic is also a way to implement KV semantic storage. However, if only a compact index is introduced, then unlike in other messaging products it cannot act as a log cleaner; it would only reduce the cost of message replay. And if we want the compact topic to clean up messages the way other messaging products do, the changes to the storage layer would also be very large.
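
To make the distinction above concrete, here is a minimal sketch of a compact index that only shortens replay and never cleans the log. The class `CompactIndexSketch`, its methods, and the `{key, value}` record layout are hypothetical names chosen for illustration; they are not RocketMQ code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not RocketMQ code. */
final class CompactIndexSketch {

    /** Maps each key to the offset of its latest record. The log itself is left untouched. */
    static Map<String, Integer> buildIndex(List<String[]> log) {
        Map<String, Integer> latestOffset = new HashMap<>();
        for (int offset = 0; offset < log.size(); offset++) {
            latestOffset.put(log.get(offset)[0], offset); // record = {key, value}
        }
        return latestOffset;
    }

    /** Replays only the latest record of each key, in log order. */
    static List<String[]> replay(List<String[]> log, Map<String, Integer> index) {
        List<Integer> offsets = new ArrayList<>(index.values());
        Collections.sort(offsets);
        List<String[]> latest = new ArrayList<>();
        for (int offset : offsets) {
            latest.add(log.get(offset));
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = new ArrayList<>();
        log.add(new String[] {"a", "1"});
        log.add(new String[] {"b", "2"});
        log.add(new String[] {"a", "3"});
        List<String[]> replayed = replay(log, buildIndex(log));
        System.out.println("log size: " + log.size() + ", replayed: " + replayed.size());
    }
}
```

The log keeps all of its records; only the replay gets shorter. Actually deleting the superseded records is the part that would require the larger storage-layer changes.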

dongeforever <[email protected]> wrote on Fri, Sep 24, 2021 at 8:41 PM:

> Keep calm.
> Every RocketMQ contributor has a responsibility to maintain and develop
> the architecture.
> Integrating RocksDB has its pros and cons, and so does the compaction (not
> compression) topic.
> It's better to continue the discussion if everyone can show the details
> without emotion.
> It is also OK to hold a meeting about this proposal.
>
> In my opinion, the KV feature is important for RocketMQ; the question is
> in what way to provide it.
>
> vongosling <[email protected]> wrote on Fri, Sep 24, 2021 at 3:58 PM:
>
> > I just wonder: if we do not open a new KV module, can we really not
> > solve the problem you mentioned? We could introduce a compression topic
> > for the last effective replay. Do we know how hard it is to maintain
> > RocksDB? My Facebook friends have talked to me a lot about their
> > various KV products built around RocksDB.
> >
> > Since we open-sourced RocketMQ, I've tried to help keep the architecture
> > as simple as possible, and I've warned you to learn to subtract while
> > adding things. While the code scale is still under control, I think the
> > first thing to really do is to design a good pluggable system that
> > replaces the existing simple SPI and API approach, but unfortunately, I
> > haven't seen any progress in that regard so far.
> >
> > Therefore, I think we should carefully examine our technical solutions.
> > We should not let the technical solutions serve only streams :-)
> >
> > heng du <[email protected]> wrote on Fri, Sep 24, 2021 at 12:43 PM:
> >
> > > Totally agree with this proposal. KV semantic storage can not only
> > > provide better support for streaming and connect, especially the
> > > storage of checkpoints, but can also be used to better manage
> > > metadata in the future.
> > > At the same time, compared to the compact topic, this proposal can
> > > significantly reduce user replay costs and save failure recovery time,
> > > and KV semantic storage can actually be regarded as another index,
> > > similar to CQ, which can be loaded on demand. In addition, there seems
> > > to be an unvoted RIP-22 proposal, but please don't mind that :)
> > >
> > > vongosling <[email protected]> wrote on Thu, Sep 23, 2021 at 8:35 PM:
> > >
> > > > Thanks for your clarification. I have been confused by RIP-22. It
> > > > seems we have already occupied 22, right [1]?
> > > >
> > > > [1] https://github.com/apache/rocketmq/issues/2937
> > > >
> > > > Amber Liu <[email protected]> wrote on Thu, Sep 23, 2021 at 3:29 PM:
> > > >
> > > > > Sorry about the format problem, below is the correct one.
> > > > >
> > > > > RIP-22 Support KV semantic storage
> > > > >
> > > > > Status
> > > > >
> > > > > - Current Status: Draft
> > > > > - Authors: ltamber <https://github.com/ltamber>
> > > > > - Shepherds: duhengforever <[email protected]>
> > > > > - Mailing List discussion: [email protected]
> > > > > - Pull Request: #PR_NUMBER
> > > > > - Released: <released_version>
> > > > >
> > > > > Background & Motivation
> > > > >
> > > > > What do we need to do
> > > > >
> > > > > - Will we add a new module? *no*.
> > > > > - Will we add new APIs? *yes*.
> > > > > - Will we add a new feature? *yes*.
> > > > >
> > > > > Why should we do that
> > > > >
> > > > > - Are there any problems with our current project?
> > > > >   Currently, we can't get/put key-values from/into RocketMQ, so if
> > > > >   we use a connector <https://github.com/apache/rocketmq-externals>
> > > > >   such as FileSource or BinlogSource, we can't persist the current
> > > > >   read position/dump position to RocketMQ and must use an external
> > > > >   meta store like ZooKeeper/MySQL instead. This brings more
> > > > >   operational risk by introducing another component.
> > > > >   This issue also arises in streaming
> > > > >   <https://github.com/apache/rocketmq-streams> scenarios when
> > > > >   developers want to persist meta info such as checkpoints.
> > > > > - What can we benefit from the proposed changes?
> > > > >   RocketMQ would not rely on an external component such as
> > > > >   ZooKeeper/etcd to support metadata storage.
> > > > >
> > > > > Goals
> > > > >
> > > > > - What problem is this proposal designed to solve?
> > > > >   Design a distributed persistent key-value store: an application
> > > > >   can put a key-value into the broker and get the value later. It
> > > > >   can also offer abilities like compareAndSet, prefix get, and so on.
> > > > > - To what degree should we solve the problem?
> > > > >   This RIP must guarantee the points below:
> > > > >   1. High availability: if one broker in the broker group is down,
> > > > >      applications can put/get key-values through another broker;
> > > > >      the availability is the same as that of RocketMQ messages.
> > > > >   2. High capacity: the number of key-values may be very large, so
> > > > >      they cannot be stored in memory; we must store them on disk.
> > > > >
> > > > > Non-Goals
> > > > >
> > > > > - What problem is this proposal NOT designed to solve?
> > > > >   Nothing specific.
> > > > > - Are there any limits of this proposal?
> > > > >   Nothing specific.
> > > > >
> > > > > Changes
> > > > >
> > > > > Architecture
> > > > >
> > > > > We will introduce RocksDB <https://github.com/facebook/rocksdb> to
> > > > > persist key-value data. More precisely, we use RocksDB to compact
> > > > > the values with the same key. We will not enable the WAL in RocksDB,
> > > > > to decrease write amplification (in most cases); instead we can
> > > > > recover the RocksDB state and consistency by redoing the RocketMQ
> > > > > commitlog.
> > > > > So the put/get flow shown in the figure above is:
> > > > > put: the key-value message is put into the commitlog first; then,
> > > > > as the reputService redoes the commitlog, the key-value is put into
> > > > > RocksDB asynchronously. The broker will not respond to the client
> > > > > until this reput has finished.
> > > > > get: the application gets the key-value from RocksDB through the
> > > > > broker directly.
> > > > > In addition, if we don't want to introduce RocksDB
> > > > > <https://github.com/facebook/rocksdb> and the metadata will not
> > > > > occupy too much memory, we can also use a key-value store based on
> > > > > a memory map. A periodic serialization and persistence thread will
> > > > > guarantee that data is not lost if the broker restarts or the system
> > > > > shuts down abnormally, and the consistency of the memory state will
> > > > > also be guaranteed by redoing the RocketMQ commitlog.
> > > > >
> > > > > Interface Design/Change
> > > > >
> > > > > - Method signature changes. *No*
> > > > > - Method behavior changes. *No*
> > > > > - CLI command changes. *No*
> > > > > - Log format or content changes.
> > > > >   Two flags will be added to the message properties: kv_opType
> > > > >   indicates whether the request is a key-value put or get, and key
> > > > >   indicates the request key in both put and get operations. In
> > > > >   order to pass the key through the network in the request header,
> > > > >   we will encode/decode the key (byte array format) using the base64
> > > > >   <https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html>
> > > > >   encoding method.
> > > > >
> > > > > Compatibility, Deprecation, and Migration Plan
> > > > >
> > > > > - Are backward and forward compatibility taken into consideration?
> > > > >   New RequestCodes between client and broker are added, so there
> > > > >   are 2 compatibility situations:
> > > > >   1. old client + new broker: old clients won't make requests with
> > > > >      the key-value flag, so the broker will not receive key-value
> > > > >      requests, which keeps everything as before.
> > > > >   2. new client + old broker: new clients will send key-value
> > > > >      requests, but the broker doesn't recognize the request code
> > > > >      and will return an error message. So we should upgrade the
> > > > >      broker first to support this feature.
> > > > > - Are there deprecated APIs?
> > > > >   Nothing specific.
> > > > > - How do we do migration?
> > > > >   Nothing specific.
> > > > >
> > > > > Implementation Outline
> > > > >
> > > > > We will implement the proposed changes in two phases.
> > > > >
> > > > > Phase 1
> > > > >
> > > > > 1. Implement reput logic from commitlog to RocksDB.
> > > > > 2. Implement broker support for key-value requests and responses.
> > > > > 3. Implement client support for key-value requests and responses.
> > > > > 4. Implement a key-value store using a memory map.
> > > > > 5. Implement a key-value store using RocksDB.
> > > > >
> > > > > Phase 2
> > > > >
> > > > > 1. Implement prefix get semantics.
> > > > > 2. Implement compareAndSet semantics.
> > > > > 3. Implement RocksDB snapshot export/import.
> > > > >
> > > > > Amber Liu <[email protected]> wrote on Thu, Sep 23, 2021 at 10:10 AM:
> > > > >
> > > > > > # RIP-22 Support KV semantic storage
> > > > > >
> > > > > > ## Status
> > > > > > - Current Status: Draft
> > > > > > - Authors: [ltamber](https://github.com/ltamber)
> > > > > > - Shepherds: [duhengforever](mailto:[email protected])
> > > > > > - Mailing List discussion: <[email protected]>
> > > > > > - Pull Request: #PR_NUMBER
> > > > > > - Released: <released_version>
> > > > > > ## Background & Motivation
> > > > > > ### What do we need to do
> > > > > > - Will we add a new module? **no**.
> > > > > > - Will we add new APIs? **yes**.
> > > > > > - Will we add a new feature? **yes**.
> > > > > > ### Why should we do that
> > > > > > - Are there any problems with our current project?
> > > > > >   Currently, we can't get/put key-values from/into RocketMQ, so
> > > > > >   if we use a [connector](https://github.com/apache/rocketmq-externals)
> > > > > >   such as FileSource or BinlogSource, we can't persist the current
> > > > > >   read position/dump position to RocketMQ and must use an external
> > > > > >   meta store like ZooKeeper/MySQL instead. This brings more
> > > > > >   operational risk by introducing another component. This issue
> > > > > >   also arises in [streaming](https://github.com/apache/rocketmq-streams)
> > > > > >   scenarios when developers want to persist meta info such as
> > > > > >   checkpoints.
> > > > > > - What can we benefit from the proposed changes?
> > > > > >   RocketMQ would not rely on an external component such as
> > > > > >   ZooKeeper/etcd to support metadata storage.
> > > > > > ### Goals
> > > > > > - What problem is this proposal designed to solve?
> > > > > >   Design a distributed persistent key-value store: an application
> > > > > >   can put a key-value into the broker and get the value later. It
> > > > > >   can also offer abilities like compareAndSet, prefix get, and so
> > > > > >   on.
> > > > > > - To what degree should we solve the problem?
> > > > > >   This RIP must guarantee the points below:
> > > > > >   1. High availability: if one broker in the broker group is down,
> > > > > >      applications can put/get key-values through another broker;
> > > > > >      the availability is the same as that of RocketMQ messages.
> > > > > >   2. High capacity: the number of key-values may be very large, so
> > > > > >      they cannot be stored in memory; we must store them on disk.
> > > > > > ### Non-Goals
> > > > > > - What problem is this proposal NOT designed to solve?
> > > > > >   Nothing specific.
> > > > > > - Are there any limits of this proposal?
> > > > > >   Nothing specific.
> > > > > > ## Changes
> > > > > > ### Architecture
> > > > > > 
> > > > > > We will introduce [RocksDB](https://github.com/facebook/rocksdb)
> > > > > > to persist key-value data. More precisely, we use RocksDB to
> > > > > > compact the values with the same key. We will not enable the WAL
> > > > > > in RocksDB, to decrease write amplification (in most cases);
> > > > > > instead we can recover the RocksDB state and consistency by
> > > > > > redoing the RocketMQ commitlog. So the put/get flow shown in the
> > > > > > figure above is:
> > > > > > put: the key-value message is put into the commitlog first; then,
> > > > > > as the `reputService` redoes the commitlog, the key-value is put
> > > > > > into RocksDB asynchronously. The broker will not respond to the
> > > > > > client until this reput has finished.
> > > > > > get: the application gets the key-value from RocksDB through the
> > > > > > broker directly.
> > > > > > In addition, if we don't want to introduce
> > > > > > [RocksDB](https://github.com/facebook/rocksdb) and the metadata
> > > > > > will not occupy too much memory, we can also use a key-value store
> > > > > > based on a memory map. A periodic serialization and persistence
> > > > > > thread will guarantee that data is not lost if the broker restarts
> > > > > > or the system shuts down abnormally, and the consistency of the
> > > > > > memory state will also be guaranteed by redoing the RocketMQ
> > > > > > commitlog.
> > > > > > ### Interface Design/Change
> > > > > > - Method signature changes. **No**
> > > > > > - Method behavior changes. **No**
> > > > > > - CLI command changes. **No**
> > > > > > - Log format or content changes.
> > > > > >   Two flags will be added to the message properties: `kv_opType`
> > > > > >   indicates whether the request is a key-value put or get, and
> > > > > >   `key` indicates the request key in both put and get operations.
> > > > > >   In order to pass the key through the network in the request
> > > > > >   header, we will encode/decode the key (byte array format) using
> > > > > >   the [base64](https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html)
> > > > > >   encoding method.
> > > > > > 
> > > > > > ### Compatibility, Deprecation, and Migration Plan
> > > > > > - Are backward and forward compatibility taken into consideration?
> > > > > >   New RequestCodes between client and broker are added, so there
> > > > > >   are 2 compatibility situations:
> > > > > >   1. old client + new broker: old clients won't make requests with
> > > > > >      the key-value flag, so the broker will not receive key-value
> > > > > >      requests, which keeps everything as before.
> > > > > >   2. new client + old broker: new clients will send key-value
> > > > > >      requests, but the broker doesn't recognize the request code
> > > > > >      and will return an error message. So we should upgrade the
> > > > > >      broker first to support this feature.
> > > > > > - Are there deprecated APIs?
> > > > > >   Nothing specific.
> > > > > > - How do we do migration?
> > > > > >   Nothing specific.
> > > > > > ### Implementation Outline
> > > > > > We will implement the proposed changes in two phases.
> > > > > > #### Phase 1
> > > > > > 1. Implement reput logic from commitlog to RocksDB.
> > > > > > 2. Implement broker support for key-value requests and responses.
> > > > > > 3. Implement client support for key-value requests and responses.
> > > > > > 4. Implement a key-value store using a memory map.
> > > > > > 5. Implement a key-value store using RocksDB.
> > > > > > #### Phase 2
> > > > > > 1. Implement prefix get semantics.
> > > > > > 2. Implement compareAndSet semantics.
> > > > > > 3. Implement RocksDB snapshot export/import.
> > > >
> > > > --
> > > > Best Regards :-)
> >
> > --
> > Best Regards :-)
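
The put/get flow described in the quoted RIP-22 proposal can be sketched roughly as below. This is a minimal single-process illustration under several assumptions: `KvStoreSketch` and all of its members are hypothetical names, a `HashMap` stands in for RocksDB, an in-memory list stands in for the commitlog, the reput step runs synchronously instead of in a background service, and the value `"put"` for `kv_opType` is invented because the proposal does not say how the operation type is encoded. Only the property names `kv_opType` and `key` and the Base64 encoding of the key come from the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed flow; none of these names are RocketMQ APIs. */
final class KvStoreSketch {

    /** One commit log entry: message properties plus the value as the message body. */
    static final class Entry {
        final Map<String, String> properties;
        final byte[] body;

        Entry(Map<String, String> properties, byte[] body) {
            this.properties = properties;
            this.body = body;
        }
    }

    private final List<Entry> commitLog = new ArrayList<>();   // stand-in for the commitlog
    private final Map<String, byte[]> index = new HashMap<>(); // stand-in for RocksDB (no WAL)
    private int reputOffset = 0; // next commit log position to apply to the index

    /** put: append to the commit log first, and return only after the reput has applied it. */
    void put(byte[] key, byte[] value) {
        Map<String, String> props = new HashMap<>();
        props.put("kv_opType", "put"); // assumed encoding of the operation type
        props.put("key", Base64.getEncoder().encodeToString(key));
        commitLog.add(new Entry(props, value));
        reput();
    }

    /** get: read straight from the index; returns null when the key is absent. */
    byte[] get(byte[] key) {
        return index.get(Base64.getEncoder().encodeToString(key));
    }

    /** Applies commit log entries that the index has not seen yet; the last write per key wins. */
    void reput() {
        while (reputOffset < commitLog.size()) {
            Entry entry = commitLog.get(reputOffset++);
            if ("put".equals(entry.properties.get("kv_opType"))) {
                index.put(entry.properties.get("key"), entry.body);
            }
        }
    }

    /** Simulates a restart with a lost index: rebuild it by redoing the whole commit log. */
    void recover() {
        index.clear();
        reputOffset = 0;
        reput();
    }

    public static void main(String[] args) {
        KvStoreSketch store = new KvStoreSketch();
        byte[] key = "checkpoint/file-source".getBytes(StandardCharsets.UTF_8);
        store.put(key, "offset=42".getBytes(StandardCharsets.UTF_8));
        store.put(key, "offset=99".getBytes(StandardCharsets.UTF_8));
        store.recover();
        System.out.println(new String(store.get(key), StandardCharsets.UTF_8));
    }
}
```

Because the commit log is the source of truth here, the index can be dropped and rebuilt at any time, which is the reasoning the proposal gives for leaving the RocksDB WAL disabled.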
