Hi Zhanhui,

I have a question about multiple keys. If any of the assumptions I make are wrong, please point them out.
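To make the multiple-keys question concrete, here is a minimal Java sketch of how several keys could be packed into the single "KEYS" property and split apart again. It is only my assumption of the scheme. The class, its methods and the ' || ' separator are hypothetical, not RocketMQ's API. It shows the collision I raise below: a key that itself contains the separator comes back as two keys.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: packing several keys into one "KEYS"-style property string.
class KeyPacking {
    // Assumed separator, mirroring the tag separator; not confirmed for keys.
    static final String SEPARATOR = " || ";

    // Join the keys into the single string that would be stored under "KEYS".
    static String pack(Collection<String> keys) {
        return String.join(SEPARATOR, keys);
    }

    // Split the stored string back into keys; Pattern.quote makes "|" literal, not regex alternation.
    static List<String> unpack(String packed) {
        return Arrays.asList(packed.split(Pattern.quote(SEPARATOR)));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("order-42", "promo || spring");
        List<String> recovered = unpack(pack(keys));
        // Two keys went in, three come out: "promo || spring" was split apart.
        System.out.println(recovered); // [order-42, promo, spring]
    }
}
```

Any fixed separator has this problem unless keys are escaped or forbidden from containing it.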
If multiple keys are supported, I cannot see it in the code. The Message class stores only a single key string in its property map, under the property name "KEYS". Is this handled the same way as tags, that is, are different keys separated with ' || '? If so, it is the responsibility of the producer API user to join the keys with the correct separator. I can see an obvious problem here: what if a key itself contains the separator ' || '? Perhaps this case is rare enough not to matter. Could you point me to a source file or document that explains this part? I have been reading the index package in rocketmq-store, but I have not yet understood the indexing process completely. I will keep reading the source to get a better idea.

Moving on to the implementation details, here is a broad outline of one possible approach. The goal is to remove duplicate messages. In this issue, I would like to aim at eliminating duplicates at the producer/broker end. For now, I will not address duplicates caused by unwritten consumer offsets, because the two problems need different solutions.

One way to solve the problem at the producer/broker end could be a distributed key store that stores the messages. It could be configurable to either keep all messages or act as a sliding window that keeps only the messages from the last X seconds, where X is specified by the user. On top of the store we can add a set-membership layer, such as a Bloom filter or a cuckoo filter (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf), to improve performance. Every message pushed by a producer is checked first against the filter and, only on a positive result, against the key store. If the message is found in the store, it is discarded. Because these filters give no false negatives, and a false positive only costs one extra store lookup, this removes duplicates completely from the producer's perspective (within the configured window).
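To make the filter-then-store check concrete, here is a minimal single-process Java sketch. It is neither a distributed store nor RocketMQ code. The class Deduplicator, the caller-supplied dedupKey (for example a hash of topic, keys and body) and all parameters are hypothetical. A LinkedHashMap stands in for the key store and implements the sliding window by evicting entries at least windowMillis old (Long.MAX_VALUE keeps every key). A small Bloom filter on a BitSet sits in front of it, so most new messages skip the store lookup.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-process stand-in for the proposed distributed key store.
class Deduplicator {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private final long windowMillis; // Long.MAX_VALUE means "keep every key"
    // Insertion order equals arrival order, so the first entry is always the next to expire.
    private final LinkedHashMap<String, Long> seen = new LinkedHashMap<>();

    Deduplicator(int numBits, int numHashes, long windowMillis) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.windowMillis = windowMillis;
    }

    // Returns true if the message should be stored, false if it is a duplicate to discard.
    synchronized boolean accept(String dedupKey, long nowMillis) {
        evictExpired(nowMillis);
        // A negative filter answer is definitive, so the store lookup is skipped.
        if (mightContain(dedupKey) && seen.containsKey(dedupKey)) {
            return false; // confirmed duplicate inside the window
        }
        for (int idx : indexes(dedupKey)) {
            bits.set(idx);
        }
        seen.put(dedupKey, nowMillis);
        return true;
    }

    // Drop keys whose first sighting is at least windowMillis old.
    private void evictExpired(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = seen.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() < windowMillis) {
                break;
            }
            it.remove(); // the Bloom filter cannot forget this key
        }
    }

    private boolean mightContain(String key) {
        for (int idx : indexes(key)) {
            if (!bits.get(idx)) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive numHashes bit positions from two base hashes.
    private int[] indexes(String key) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key.getBytes(StandardCharsets.UTF_8)) | 1; // odd, so never zero
        int[] out = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            out[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return out;
    }

    // 32-bit FNV-1a hash over the key's UTF-8 bytes.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    public static void main(String[] args) {
        Deduplicator dedup = new Deduplicator(1 << 16, 3, 10_000); // 10-second window
        System.out.println(dedup.accept("order-42", 0));      // true: first sighting
        System.out.println(dedup.accept("order-42", 5_000));  // false: duplicate inside the window
        System.out.println(dedup.accept("order-42", 12_000)); // true: the earlier copy has expired
    }
}
```

One design consequence shows up in evictExpired. A standard Bloom filter cannot delete keys, so in sliding-window mode it keeps accumulating bits and its false-positive rate rises over time. Correctness is unaffected because the store has the final say, but each false positive costs a store lookup. A cuckoo filter supports deletion, which is one argument for it in the windowed variant. Periodically rebuilding the Bloom filter from the live keys would be another option.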
The core of this idea is the distributed key store, which would be completely separate from the current message storage. Since a distributed key store (or key/value store) is not a novel concept, there are two ways to approach this:

1. Implement it ourselves. This would take more effort but add no external dependencies.
2. Use an existing key-value store such as Redis (which already supports expiry and persistence, but has a large memory footprint) or some other disk-based storage for set membership. This would add an external dependency, but development time would drop significantly.

I am inclined to implement it myself, as this avoids dependencies on other products, especially since RocketMQ is currently a self-reliant system. My past experience building such a store should also come in handy.

I would like to hear the development community's opinions on this approach and any suggestions for improving it. Looking forward to your responses.

====<question unrelated to issue>=====
To become more familiar with the code base, and to show that I am familiar with the tools and technologies in use, it would be great if you could point me to some low-effort issues I could help with. If there are no 'newbie' issues available, I could help improve the comments in the codebase. I noticed some source files without any explanation, and documenting them with comments would help new contributors get on board faster.
====</question unrelated to issue>=====

Thanks a lot for reading this through, and I look forward to your opinions.

Regards,
Sohaib

On Sat, Feb 24, 2018 at 11:50 AM, Zhanhui Li <lizhan...@gmail.com> wrote:
> Hi Sohaib,
>
> Happy to know you are interested in RocketMQ.
>
> First, let me answer the questions you raised.
>
> - can there be multiple tags?
> No. At present, the storage engine allows only a single tag. Subscriptions
> are allowed to use a combination of tags.
> The current model should meet your
> business needs. If not, please let us know.
>
> - key (similar question to above)
> RocketMQ builds an index using message keys. A single message may have
> multiple keys.
>
> - About redundant messages
> From my understanding, you are trying to eliminate duplicate messages.
> It is true that there are various reasons which may cause message
> duplication, ranging from message delivery to consumption. Discussion on
> this topic is warmly welcome. If you have any ideas to contribute on this
> issue, the developer board is happy to discuss them.
>
> Zhanhui Li
>
> On Feb 24, 2018, at 11:17 AM, Sohaib Iftikhar <sohaib1...@gmail.com> wrote:
>
> > My earlier email message seems to have gotten lost, so I will try again.
> > Please see the original message for the discussion.
> >
> > Regards,
> > Sohaib Iftikhar
> >
> > -- Man is still the most extraordinary computer of all.--
> >
> > On Tue, Feb 20, 2018 at 1:54 AM, Sohaib Iftikhar <sohaib1...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I am interested in working on this issue
> >> (https://issues.apache.org/jira/browse/ROCKETMQ-124) as part of GSOC-18.
> >> I have a few questions about it. I am not sure whether this discussion
> >> belongs on the JIRA issue or here; feel free to correct me if this is the
> >> wrong platform. Also, while I have worked with distributed pub-sub
> >> systems, I am still fairly new to RocketMQ, so my understanding of it may
> >> be incorrect. I apologise if that is the case and would be happy to stand
> >> corrected.
> >>
> >> Following are my questions:
> >> 1. What defines a redundant message?
> >> The constructor that I see for a message is as follows:
> >>     Message(String topic, String tags, String keys, int flag, byte[] body,
> >>             boolean waitStoreMsgOK)
> >> Possible candidates to me are the topic, the tags (can there be multiple
> >> tags? I could not find an example of this. If yes, how are they
> >> separated?), the keys (similar question to above) and of course the body.
> >> Is there something that I have missed? Is there something that we do not
> >> need to consider?
> >> 2. Is there a timeline on redundant messages? What I mean is: is there a
> >> time limit after which a message with similar content is allowed again?
> >> From what I gather, no such thing was mentioned. That would mean storing
> >> all the messages, which may or may not be the best solution depending on
> >> the requirements. It might be desirable to disallow duplicates only
> >> within a certain sliding time window. This would allow ignoring duplicate
> >> messages that were generated very close to each other (within the
> >> window). Depending on this requirement, the implementation may become a
> >> little more involved.
> >>
> >> For now, these are my only questions. I have ideas about possible
> >> implementations that need review, but I will mention them once the
> >> specifications are clear to me. As a final question, I would at some
> >> point like to post design ideas for this problem privately, to get them
> >> reviewed by the development community without making them publicly
> >> available, so that they cannot be plagiarised. What platform or method
> >> can I use to do that? Or is submitting a draft to the Google platform the
> >> only way to accomplish this?
> >>
> >> Thanks a lot for reading this through, and I look forward to your inputs.
> >>
> >> Regards,
> >> Sohaib Iftikhar
> >>
> >