Hi,

At broker shutdown all temp queues are deleted, so all content belonging to
them on that node should be removed.

If we can delete the content of a queue in one call at shutdown, this
problem is solved.
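
For example, a single store call like the following could do it (a rough
sketch; the interface and method names are illustrative, not the actual
store API):

    // Hypothetical single-call content delete for a queue, e.g. invoked
    // from the temp queue deletion path at broker shutdown.
    public interface QueueContentCleaner {

        /**
         * Removes all message content belonging to the given queue in a
         * single store call (for an RDBMS store, roughly:
         * DELETE FROM MB_CONTENT WHERE QUEUE_NAME = ?).
         */
        void deleteAllContentOfQueue(String queueName) throws Exception;
    }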


On Sun, Oct 5, 2014 at 8:00 PM, Asitha Nanayakkara <[email protected]> wrote:

> Hi Pamod,
>
> Ah yes, I agree. I thought you were suggesting a DB operation without any
> content duplication. With per-node content duplication we can do a content
> clean-up job at start-up and stick with in-memory reference counting. BTW,
> the start-up time will vary depending on the message count for topics.
>
> +1 for in-memory reference counting with a content clean-up job at startup.
>
> Thanks,
>
> On Sun, Oct 5, 2014 at 2:29 PM, Pamod Sylvester <[email protected]> wrote:
>
>> Hi Asitha,
>>
>> I agree the content should be written before the metadata. What I meant
>> was not having a separate process to do the content clean-up, but rather
>> going with the solution proposed by Hasitha, where the message count is
>> maintained in memory instead of in the DB.
>>
>> Also, if we're going to duplicate both the message content and metadata
>> per node, it should not be affected, as initially mentioned. If, instead
>> of duplicating, we're going to share the content among all the nodes,
>> then we cannot maintain a local reference count anyway, since even if the
>> count goes to 0 locally there will be other nodes with subscribers
>> referring to the same content.
>>
>> The solution I proposed was to address the problem of losing the
>> in-memory counts when the node gets killed. If a node is killed and the
>> in-memory reference counts are lost, then when the node is restarted it
>> will first check for the IDs which have not been purged, by comparing the
>> metadata and the content, and purge them.
>>
>> Thanks,
>> Pamod
>>
>>
>> On Sun, Oct 5, 2014 at 1:01 PM, Asitha Nanayakkara <[email protected]>
>> wrote:
>>
>>> Hi Pamod,
>>>
>>> In a clustered set-up, other nodes may be running while a new node
>>> starts. They store the message content for a topic first and then store
>>> the message metadata, and this is not done atomically. While this is
>>> happening, if we start another node with logic that scans the database
>>> and deletes inconsistent content, it will pick up some new topic
>>> messages whose content is stored but whose metadata is still being
>>> written, and delete that content too. This will leave the database with
>>> messages that have metadata but no corresponding content. I think there
>>> is a possibility of this happening if there is a working cluster with
>>> topic messages being published at a high rate with high concurrency, and
>>> a new node is started at the same time. Correct me if I'm wrong.
>>>
>>> Yes, for each message we will have to store the content and metadata and
>>> update the reference count. But we can increment the reference count
>>> once per message rather than per duplicated metadata entry (since we
>>> know how many duplicates of the metadata we need). If the DB update call
>>> causes a bigger performance hit, it's better to go with the in-memory
>>> approach rather than trying to clean the content at start-up, I guess.
>>>
>>> Thanks.
>>>
>>> On Sun, Oct 5, 2014 at 12:20 PM, Pamod Sylvester <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> How would this approach impact performance? It will result in a DB
>>>> operation each time a message is published, as well as each time a
>>>> subscriber acks, won't it?
>>>>
>>>> I agree that maintaining the counters in-memory could leave message
>>>> content persisted in the DB with no way of deleting it if the node gets
>>>> killed.
>>>>
>>>> Also, what about checking for the message content that needs to be
>>>> deleted at node start-up? There would be a comparison between the
>>>> metadata and the content column families, and all the IDs which are in
>>>> the content table but not in the metadata CF would be purged:
>>>>
>>>> {MessageContentCF} \ {MessageMetaData} = Message Content to be deleted.
>>>>
>>>> This can affect the start-up time of the node, but IMO it will not
>>>> affect the performance of the main flows.
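>>>>
>>>> For the RDBMS case that comparison could be a single anti-join delete,
>>>> something like this (a rough sketch; the names MB_CONTENT, MB_METADATA
>>>> and MESSAGE_ID are placeholders, not our actual schema):
>>>>
>>>>     import java.sql.Connection;
>>>>     import java.sql.Statement;
>>>>
>>>>     public final class StartupContentCleaner {
>>>>
>>>>         // Purges content rows whose message id has no metadata row,
>>>>         // i.e. {MessageContentCF} \ {MessageMetaData}.
>>>>         public static int purgeOrphanedContent(Connection conn)
>>>>                 throws Exception {
>>>>             Statement stmt = conn.createStatement();
>>>>             try {
>>>>                 return stmt.executeUpdate(
>>>>                         "DELETE FROM MB_CONTENT WHERE MESSAGE_ID NOT IN "
>>>>                         + "(SELECT MESSAGE_ID FROM MB_METADATA)");
>>>>             } finally {
>>>>                 stmt.close();
>>>>             }
>>>>         }
>>>>     }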
>>>>
>>>> WDYT ?
>>>>
>>>> Thanks,
>>>> Pamod
>>>>
>>>> On Sun, Oct 5, 2014 at 11:09 AM, Asitha Nanayakkara <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Hasitha,
>>>>>
>>>>> In this approach, if a node holding reference counts gets killed, all
>>>>> the details regarding the reference counts are lost, right? Is there a
>>>>> way to delete the content then?
>>>>>
>>>>> Btw, what if we keep the reference count in the database? Something
>>>>> similar to what we have for queue message counting now (we create a
>>>>> counter when a queue is created and then increment/decrement the count
>>>>> when messages are received and sent).
>>>>>
>>>>> What I suggest is: when a topic message is created, we add a reference
>>>>> counter for the message (via AndesContextStore, a new method
>>>>> createReferenceCounter(long messageID)); when metadata is duplicated,
>>>>> we increment the counter; and when an acknowledgment is received, we
>>>>> decrement the counter (two methods in the context store to
>>>>> increment/decrement counts). And we will have a scheduled task that
>>>>> periodically checks for messages with a reference count of zero and
>>>>> deletes their content. This way, by having a separate insert statement
>>>>> to create a ref counter and a separate statement to update the count,
>>>>> we can avoid writing vendor-specific SQL queries for reference counting
>>>>> (for RDBMS). Since the idea is to recommend Cassandra for the
>>>>> MessageStore and an RDBMS for the AndesContextStore, we would be better
>>>>> off that way. Plus this avoids the need to track reference counts in
>>>>> memory, so we don't lose the counts when a node gets killed. A rough
>>>>> sketch is below. WDYT?
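>>>>>
>>>>> Something along these lines (the method names and the two interfaces
>>>>> are illustrative only, not the existing store API):
>>>>>
>>>>>     import java.util.List;
>>>>>     import java.util.concurrent.Executors;
>>>>>     import java.util.concurrent.ScheduledExecutorService;
>>>>>     import java.util.concurrent.TimeUnit;
>>>>>
>>>>>     public class RefCountSketch {
>>>>>
>>>>>         // Hypothetical additions to AndesContextStore.
>>>>>         interface ReferenceCountStore {
>>>>>             void createReferenceCounter(long messageId);  // on create
>>>>>             void incrementReferenceCount(long messageId); // per duplicate
>>>>>             void decrementReferenceCount(long messageId); // on ack
>>>>>             List<Long> getZeroReferenceCountMessageIds();
>>>>>             void removeReferenceCounter(long messageId);
>>>>>         }
>>>>>
>>>>>         // Stand-in for the MessageStore content delete call.
>>>>>         interface ContentDeleter {
>>>>>             void deleteContent(long messageId);
>>>>>         }
>>>>>
>>>>>         // Scheduled task that deletes the content of messages whose
>>>>>         // reference count has dropped to zero.
>>>>>         static void startCleanupTask(final ReferenceCountStore store,
>>>>>                                      final ContentDeleter deleter) {
>>>>>             ScheduledExecutorService scheduler =
>>>>>                     Executors.newSingleThreadScheduledExecutor();
>>>>>             scheduler.scheduleAtFixedRate(new Runnable() {
>>>>>                 public void run() {
>>>>>                     for (long id : store.getZeroReferenceCountMessageIds()) {
>>>>>                         deleter.deleteContent(id);
>>>>>                         store.removeReferenceCounter(id);
>>>>>                     }
>>>>>                 }
>>>>>             }, 10, 10, TimeUnit.SECONDS);
>>>>>         }
>>>>>     }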
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sun, Oct 5, 2014 at 6:57 AM, Hasitha Hiranya <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>>
>>>>>> Following is my vision on integrating topics into MB:
>>>>>>
>>>>>> >> we duplicate metadata per subscriber. It will not create a big
>>>>>> overhead.
>>>>>> >> we do not duplicate content per subscriber, but we duplicate
>>>>>> content per node.
>>>>>> >> I hereby assume that we do handle acks for topics. We need to
>>>>>> research that.
>>>>>>
>>>>>> When a topic subscriber is created:
>>>>>> 1. qpid creates a temp queue.
>>>>>> 2. qpid creates a binding for that queue to the topic exchange, using
>>>>>> the topic name as the binding key.
>>>>>> 3. qpid creates a subscription for the temp queue.
>>>>>>
>>>>>> When a topic subscriber is closed, qpid does the above 3 things in
>>>>>> reverse order.
>>>>>>
>>>>>> Adhering to this model,
>>>>>>
>>>>>> 1. We store metadata in the same way as for normal queues.
>>>>>> 2. We use the same SlotDelivery worker and flusher. There is NOTHING
>>>>>> called a topic delivery worker.
>>>>>> 3. When showing queues in the UI, we filter the durable ones and show
>>>>>> them.
>>>>>> 4. When a subscriber closes, the queue is deleted. We do the same
>>>>>> thing as for normal queues.
>>>>>> 5. Whenever we insert metadata, we duplicate the metadata for each
>>>>>> temp queue (per subscriber). We know the nodes where the subscribers
>>>>>> lie, so we can duplicate content for those nodes (one copy per node).
>>>>>> 6. We need to introduce a new per-subscriber tracking in the on-flight
>>>>>> message tracker, which is common for queues as well. When metadata is
>>>>>> inserted for a message ID we increment a count; when an ack comes for
>>>>>> that metadata we decrement it. If it reaches zero, the content is
>>>>>> ready to be removed. We do not track this count globally, as we have a
>>>>>> copy of the content per node; thus the reference count does not need
>>>>>> to be global. It is local in-memory tracking (see the sketch after
>>>>>> this list).
>>>>>> 7. Queue change handler: on delete, execute a normal delete (remove
>>>>>> all metadata) and decrement the reference counts. The thread that
>>>>>> deletes content will detect that and delete it offline. This way,
>>>>>> content is removed only when all subscribers are gone.
>>>>>>
>>>>>> 8. We should be careful about hierarchical topics. We use our maps to
>>>>>> identify the queues bound to a topic. MQTT/AMQP confusion should be
>>>>>> solved there.
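>>>>>>
>>>>>> A minimal sketch of the local tracking in point 6 (the class and
>>>>>> method names are illustrative only):
>>>>>>
>>>>>>     import java.util.concurrent.ConcurrentHashMap;
>>>>>>     import java.util.concurrent.ConcurrentMap;
>>>>>>     import java.util.concurrent.atomic.AtomicInteger;
>>>>>>
>>>>>>     // Per-node, in-memory reference counting: one count per message
>>>>>>     // id, incremented per inserted metadata copy, decremented per
>>>>>>     // ack. No global or DB state involved.
>>>>>>     public class LocalContentRefTracker {
>>>>>>
>>>>>>         private final ConcurrentMap<Long, AtomicInteger> counts =
>>>>>>                 new ConcurrentHashMap<Long, AtomicInteger>();
>>>>>>
>>>>>>         // Called when metadata is inserted for a message id.
>>>>>>         public void onMetadataInserted(long messageId) {
>>>>>>             counts.putIfAbsent(messageId, new AtomicInteger(0));
>>>>>>             counts.get(messageId).incrementAndGet();
>>>>>>         }
>>>>>>
>>>>>>         // Called when an ack arrives; returns true when the local
>>>>>>         // copy of the content is ready to be removed.
>>>>>>         public boolean onAck(long messageId) {
>>>>>>             AtomicInteger count = counts.get(messageId);
>>>>>>             if (count != null && count.decrementAndGet() == 0) {
>>>>>>                 counts.remove(messageId);
>>>>>>                 return true;
>>>>>>             }
>>>>>>             return false;
>>>>>>         }
>>>>>>     }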
>>>>>>
>>>>>> *Thanks *
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Hasitha Abeykoon*
>>>>>> Senior Software Engineer; WSO2, Inc.; http://wso2.com
>>>>>> *cell:* *+94 719363063*
>>>>>> *blog: **abeykoon.blogspot.com* <http://abeykoon.blogspot.com>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Asitha Nanayakkara*
>>>>> Software Engineer
>>>>> WSO2, Inc. http://wso2.com/
>>>>> Mob: + 94 77 85 30 682
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Pamod Sylvester *
>>>>  *Senior Software Engineer *
>>>> Integration Technologies Team, WSO2 Inc.; http://wso2.com
>>>> email: [email protected] cell: +94 77 7779495
>>>>
>>>
>>>
>>>
>>> --
>>> *Asitha Nanayakkara*
>>> Software Engineer
>>> WSO2, Inc. http://wso2.com/
>>> Mob: + 94 77 85 30 682
>>>
>>>
>>
>>
>> --
>> *Pamod Sylvester *
>>  *Senior Software Engineer *
>> Integration Technologies Team, WSO2 Inc.; http://wso2.com
>> email: [email protected] cell: +94 77 7779495
>>
>
>
>
> --
> *Asitha Nanayakkara*
> Software Engineer
> WSO2, Inc. http://wso2.com/
> Mob: + 94 77 85 30 682
>
>


-- 
*Hasitha Abeykoon*
Senior Software Engineer; WSO2, Inc.; http://wso2.com
*cell:* *+94 719363063*
*blog: **abeykoon.blogspot.com* <http://abeykoon.blogspot.com>