[ https://issues.apache.org/jira/browse/BOOKKEEPER-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197842#comment-13197842 ]

Sijie Guo commented on BOOKKEEPER-154:
--------------------------------------

Thanks, Flavio and Ivan.

I think the application thread you mentioned would be a tool like the BookKeeper 
recovery tool. It would not need to run very frequently; it could be executed as a 
cron job every few days.

> the last time each of those subscribers has consumed a message.

I think using the modification time (mtime) of the subscription znode as the last 
consumed time is the easiest way. Regarding clock skew, using the time from either 
ZooKeeper or the hub doesn't solve it. Regarding consistency, since the tool only 
uses the modification time to judge whether a subscriber has been offline for a long 
time, it would not modify ZooKeeper metadata directly.
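To make the mtime-based check concrete, here is a minimal Java sketch. It assumes the tool has already read each subscription znode's mtime (for example from ZooKeeper's `Stat#getMtime()`, in milliseconds since the epoch) into a map keyed by subscriber id. The class `InactiveSubscriberCheck`, the method `findInactive`, and the `maxIdleMs` threshold are hypothetical names for illustration. As described above, it only reports candidates and never writes ZooKeeper metadata.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the mtime values are assumed to come from each
// subscription znode (e.g. Stat#getMtime()). This class only reports
// candidates for garbage collection; it never modifies ZooKeeper.
class InactiveSubscriberCheck {

    // Returns the ids of subscribers whose znode has not been modified
    // for strictly longer than maxIdleMs, in sorted order.
    static List<String> findInactive(Map<String, Long> mtimeBySubscriber,
                                     long nowMs, long maxIdleMs) {
        List<String> inactive = new ArrayList<>();
        // Copy into a TreeMap so the report order is deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(mtimeBySubscriber).entrySet()) {
            long idleMs = nowMs - e.getValue();
            if (idleMs > maxIdleMs) {
                inactive.add(e.getKey());
            }
        }
        return inactive;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long now = 30 * day;
        Map<String, Long> mtimes = Map.of(
            "sub-a", now - 1 * day,    // touched yesterday
            "sub-b", now - 10 * day);  // idle for ten days
        System.out.println(findInactive(mtimes, now, 7 * day)); // prints [sub-b]
    }
}
```

Because the threshold is meant to be on the order of days, a clock skew of seconds or minutes between the ZooKeeper servers and the machine running the tool should barely change which subscribers are reported.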

> the subscribers it needs to watch for

This is similar to the BookKeeper recovery tool, which needs to loop over all ledgers 
to check and recover them. It uses zk#getChildren to fetch all ledgers. (In 
BOOKKEEPER-39, we added a hierarchical ledger manager to avoid fetching too many 
children in a single zk#getChildren call.)
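As an illustration of the hierarchical idea, the sketch below zero-pads a ledger id to ten digits and splits it into a three-level 2/4/4-digit path, so the top level has at most 100 children, each lower level at most 10,000, and a tool can walk the tree with one getChildren call per parent. The layout, the class `HierarchicalPath`, and the method `pathFor` are assumptions for illustration, loosely in the spirit of BOOKKEEPER-39; the actual ledger manager's znode layout may differ.

```java
import java.util.Locale;

// Hypothetical sketch of a hierarchical znode layout for numeric ids.
// Not the actual BookKeeper ledger manager layout.
class HierarchicalPath {

    // Maps an id in [0, 9999999999] to root/AA/BBBB/CCCC.
    static String pathFor(String root, long ledgerId) {
        if (ledgerId < 0 || ledgerId > 9_999_999_999L) {
            throw new IllegalArgumentException("id out of range: " + ledgerId);
        }
        // Locale.ROOT guarantees ASCII digits regardless of the default locale.
        String id = String.format(Locale.ROOT, "%010d", ledgerId);
        return root + "/" + id.substring(0, 2)
                    + "/" + id.substring(2, 6)
                    + "/" + id.substring(6);
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/ledgers", 12345L)); // prints /ledgers/00/0001/2345
    }
}
```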

The problem here is that we put all topics' metadata under a single znode, so it is 
not easy for an application to retrieve the topic list when there is a huge number of 
topics. A possible solution is to support hierarchical topics so that applications can 
organize their topics, but that may be better handled in another JIRA.

The easiest way is similar to what the previous comment described: the GC tool doesn't 
need to care about this, and the application passes it a list of topics to collect 
($ gc_tool --topics topic_list; or $ gc_tool -f topic_list_file). I think it would be 
easier for the application to get such a list.
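A minimal sketch of how the proposed tool might accept the topic list in either form. `gc_tool` is only a proposal in this comment, so the class `GcToolArgs`, the comma-separated format for `--topics`, and the one-topic-per-line format for `-f` are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical argument handling for the proposed gc_tool.
// Assumed formats: "--topics t1,t2,..." or "-f file" (one topic per line).
class GcToolArgs {

    static List<String> parseTopics(String[] args) throws IOException {
        if (args.length == 2 && args[0].equals("--topics")) {
            return clean(Arrays.asList(args[1].split(",")));
        }
        if (args.length == 2 && args[0].equals("-f")) {
            return clean(Files.readAllLines(Path.of(args[1])));
        }
        throw new IllegalArgumentException(
            "usage: gc_tool --topics t1,t2 | gc_tool -f topic_list_file");
    }

    // Trims surrounding whitespace and drops empty entries.
    private static List<String> clean(List<String> raw) {
        List<String> topics = new ArrayList<>();
        for (String t : raw) {
            String s = t.strip();
            if (!s.isEmpty()) {
                topics.add(s);
            }
        }
        return topics;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseTopics(new String[] {"--topics", "news, sports,,weather"}));
        // prints [news, sports, weather]
    }
}
```

Keeping the topic list outside the tool means it never has to enumerate the single topics znode discussed above.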
                
> Garbage collect messages for those subscribers inactive/offline for a long 
> time. 
> ---------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-154
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-154
>             Project: Bookkeeper
>          Issue Type: New Feature
>          Components: hedwig-client, hedwig-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>
> Currently Hedwig tracks subscribers' progress to garbage-collect published 
> messages. If a subscriber subscribes and then goes offline for a long time without 
> unsubscribing, the messages published to its topic never get a chance to be 
> garbage-collected.
> A time-based garbage collection policy would be suitable for this case. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
