[ 
https://issues.apache.org/jira/browse/SAMZA-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936967#comment-14936967
 ] 

Edi Bice commented on SAMZA-679:
--------------------------------

Aha, maybe this is what's been plaguing my job! I have a job that uses 
job.coordinator.system=kafka, and the corresponding folder 
(__samza_coordinator_my-topic) under kafka-logs is now about 1.3 GB. I made some 
code and configuration changes to the job, killed it, and have been trying to 
relaunch it. I was surprised to see OutOfMemory errors even with very large heap 
settings and was wondering why the job was consuming so much memory. Here are 
some of the __samza_coordinator_my-topic settings:
ReplicationFactor:3     
Configs:segment.bytes=26214400,retention.ms=3600000,cleanup.policy=compact

> Optimize CoordinatorStream's bootstrap mechanism
> ------------------------------------------------
>
>                 Key: SAMZA-679
>                 URL: https://issues.apache.org/jira/browse/SAMZA-679
>             Project: Samza
>          Issue Type: Sub-task
>            Reporter: Naveen Somasundaram
>             Fix For: 0.10.0
>
>
> At present, when we bootstrap using the CoordinatorStreamConsumer, we read 
> all the messages into a set. This is fine if log compaction is working, but 
> given that:
> 1. Log compaction can be turned off or broken for whatever reason
> 2. There is a time interval between compactions
> we should consider fixing the bootstrap method to hold only the latest 
> checkpoint (overriding equals and hashCode of the set's elements is one way to go about it)
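One way to picture the fix described above: instead of accumulating every coordinator-stream message into a set (where uncompacted duplicates all survive and blow up the heap), keep a map keyed by the message key so each replayed update overwrites the previous one. This is a minimal, hypothetical sketch, not Samza's actual CoordinatorStreamConsumer code; the `dedupe` method and its key/value pair layout are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed bootstrap optimization:
// retain only the latest value per key, so duplicates left behind
// by disabled or lagging log compaction collapse during bootstrap.
public class BootstrapDedup {
    // messages[i][0] is the message key, messages[i][1] is the value.
    // Later entries for the same key overwrite earlier ones;
    // LinkedHashMap preserves first-seen key order.
    public static Map<String, String> dedupe(String[][] messages) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] msg : messages) {
            latest.put(msg[0], msg[1]);
        }
        return latest;
    }
}
```

With this approach the memory used at bootstrap is bounded by the number of distinct keys rather than the total (uncompacted) length of the topic, which is the same effect the reporter's equals/hashCode suggestion aims for.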



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)