[ 
https://issues.apache.org/jira/browse/KAFKA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated KAFKA-559:
------------------------------

    Attachment: KAFKA-559.v2.patch

_1. Passing a groupId for cleanup will make the cleanup job tedious since we 
tend to have hundreds of console-consumer group ids in ZK that are stale. 
Running the tool for a particular topic or all topics probably makes more 
sense._
I had received different requirement spec which was based on "group" and not 
"topic". In the new patch, I have added support for topic based deletion too.

_2. I would suggest accepting a date param "mm-dd-yyyy hh:mm:ss,SSS" as a 
String instead of accepting a timestamp value, and deleting the group only if 
it has had no updates to its offsets since that date, as described above._
I had a discussion with Joel about this one and he had suggested me to use the 
EPOCH time thing instead of "mm-dd-yyyy". I am open for modification but if 
there is a consensus about it.

_3. It's dangerous to delete the entire group if the date/"since" is not 
provided. It's very easy for user to specify only two arguments (topic and 
zkconnect) and not specify the date. Let's also make sure that the user always 
specifies a date._
Ok. Change implemented

_4. "dry-run" does not need to accept any value. You can simply use 
parser.accepts("dry-run", "....") and then use if (options.has(dryRunOpt)) 
....._
Agreed. Change implemented

_5. We can inline exitIfNoPathExists, the implementation is small and clear 
enough._
While adding support for topic based deletion, that method had no use so got 
rid of it. 

_6. We should have an info statement when the group ids are deleted in the non 
dry-run mode._
Good catch :) Change implemented

_7. info("Removal has successfully completed.") can probably be refactored to 
something more specific to this tool._
Below is what I changed it to:
logger.info("Kafka obsolete Zk entires cleanup tool shutdown successfully.")

_8. Instead of writing a different info statement for dry-run mode, I think you 
should be able to set logIdent of Logging to "[dry-run]" or "" depending on 
which mode the tool is working in._
Nice one :) Change implemented

Minor stuff:
_1. I think we tend to use camelCase in variable names instead of underscores._
_2. Whitespaces can be made more consistent._

I just read http://kafka.apache.org/coding-guide.html. Will take care of that 
from now on. 

One related question: is there a code formatter for Kafka ?
Creating a fat wiki page with a bunch of project specific formatting is helpful 
but people won't have that in mind everytime they code. 
It gets worse when people work on multiple projects which follow different 
conventions. Here is what some other projects came up with to account for that:
- https://issues.apache.org/jira/browse/HBASE-3678
- 
http://svn.apache.org/viewvc/nutch/branches/2.x/eclipse-codeformat.xml?view=markup
- https://github.com/cloudera/blog-eclipse/blob/master/hadoop-format.xml

Attached KAFKA-559.v2.patch : Implemented the changes suggested by 
[~swapnilghike] and did some sundry refactoring
                
> Garbage collect old consumer metadata entries
> ---------------------------------------------
>
>                 Key: KAFKA-559
>                 URL: https://issues.apache.org/jira/browse/KAFKA-559
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Jay Kreps
>            Assignee: Tejas Patil
>              Labels: project
>         Attachments: KAFKA-559.v1.patch, KAFKA-559.v2.patch
>
>
> Many use cases involve tranient consumers. These consumers create entries 
> under their consumer group in zk and maintain offsets there as well. There is 
> currently no way to delete these entries. It would be good to have a tool 
> that did something like
>   bin/delete-obsolete-consumer-groups.sh [--topic t1] --since [date] 
> --zookeeper [zk_connect]
> This would scan through consumer group entries and delete any that had no 
> offset update since the given date.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to