[
https://issues.apache.org/jira/browse/GOBBLIN-1654?focusedWorklogId=775591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775591
]
ASF GitHub Bot logged work on GOBBLIN-1654:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 27/May/22 23:04
Start Date: 27/May/22 23:04
Worklog Time Spent: 10m
Work Description: sv2000 commented on code in PR #3515:
URL: https://github.com/apache/gobblin/pull/3515#discussion_r884023049
##########
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/workunit/packer/KafkaTopicGroupingWorkUnitPacker.java:
##########
@@ -79,6 +80,13 @@ public class KafkaTopicGroupingWorkUnitPacker extends KafkaWorkUnitPacker {
public static final String CONTAINER_CAPACITY_KEY = GOBBLIN_KAFKA_PREFIX + "streaming.containerCapacity";
public static final double DEFAULT_CONTAINER_CAPACITY = 10;
+ // minimum container capacity to avoid bad topic schema causing us to request resources aggressively
+ public static final String MINIMUM_CONTAINER_CAPACITY = GOBBLIN_KAFKA_PREFIX + "streaming.minimum.containerCapacity";
+ public static final double DEFAULT_MINIMUM_CONTAINER_CAPACITY = 1;
+ public static final String BAD_TOPIC_PARTITION_WITH_LOW_CAPACITY_EVENT_NAME = "badTopicPartitionWithLowCapacity";
Review Comment:
badTopicPartitionWithLowCapacity -> topicPartitionWithLowCapacity.
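The diff adds a config key for the minimum container capacity with a default of 1. Here is a minimal sketch of resolving such a key with its default. It assumes plain java.util.Properties rather than Gobblin's own configuration classes. The class MinimumCapacityConfig, the method resolveMinimumCapacity and the example prefix are hypothetical. The prefix is passed in as a parameter because the value of GOBBLIN_KAFKA_PREFIX is not shown in the diff.

```java
import java.util.Properties;

// Hypothetical helper (not Gobblin code): resolves the minimum container
// capacity from plain java.util.Properties, falling back to the default.
class MinimumCapacityConfig {
    // Key suffix and default value taken from the diff above.
    static final String MINIMUM_CONTAINER_CAPACITY_SUFFIX = "streaming.minimum.containerCapacity";
    static final double DEFAULT_MINIMUM_CONTAINER_CAPACITY = 1;

    // The prefix is a parameter because GOBBLIN_KAFKA_PREFIX's value is not shown.
    static double resolveMinimumCapacity(Properties props, String prefix) {
        String raw = props.getProperty(prefix + MINIMUM_CONTAINER_CAPACITY_SUFFIX);
        if (raw == null) {
            // Key not set: use the default floor of 1.
            return DEFAULT_MINIMUM_CONTAINER_CAPACITY;
        }
        return Double.parseDouble(raw.trim());
    }

    public static void main(String[] args) {
        String prefix = "example.prefix."; // hypothetical prefix for illustration
        Properties props = new Properties();
        System.out.println(resolveMinimumCapacity(props, prefix)); // 1.0
        props.setProperty(prefix + MINIMUM_CONTAINER_CAPACITY_SUFFIX, "2.5");
        System.out.println(resolveMinimumCapacity(props, prefix)); // 2.5
    }
}
```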
Issue Time Tracking
-------------------
Worklog Id: (was: 775591)
Time Spent: 0.5h (was: 20m)
> Add capacity floor to avoid aggressively requesting resources and creating small files.
> ---------------------------------------------------------------------------------------
>
> Key: GOBBLIN-1654
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1654
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Zihan Li
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently we calculate the capacity based on the consumer rate during peak
> hours. If records with a very bad schema arrive during that time, the
> consumer rate will be very low, and even after we catch up we will not be
> able to release the resources, because the capacity is still derived from
> that low consumer rate. So we want to provide a config that sets a minimum
> consumer rate, to avoid over-using resources and producing small files.
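Here is a minimal sketch of the capacity floor described in the issue. It uses a simplified model that is assumed here and not taken from Gobblin. In this model, each container handles a fixed amount of load, so the number of containers requested is ceil(totalLoad / capacity). A depressed peak-hour consumer rate lowers the measured capacity and inflates that count. Clamping the capacity to the floor with Math.max bounds the request. CapacityFloorSketch and all of its methods are hypothetical. partitionsBelowFloor only suggests how low-capacity partitions could be collected for an event such as the topicPartitionWithLowCapacity name proposed in the review.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Gobblin code) of applying a capacity floor.
class CapacityFloorSketch {

    // Returns the measured capacity, raised to the floor if it is lower.
    static double applyFloor(double measuredCapacity, double minimumCapacity) {
        return Math.max(measuredCapacity, minimumCapacity);
    }

    // Simplified model: containers needed when each container handles
    // capacityPerContainer units of load.
    static long containersNeeded(double totalLoad, double capacityPerContainer) {
        return (long) Math.ceil(totalLoad / capacityPerContainer);
    }

    // Names of partitions whose measured capacity is below the floor,
    // e.g. candidates for a low-capacity event.
    static List<String> partitionsBelowFloor(Map<String, Double> measured, double minimumCapacity) {
        List<String> low = new ArrayList<>();
        for (Map.Entry<String, Double> e : measured.entrySet()) {
            if (e.getValue() < minimumCapacity) {
                low.add(e.getKey());
            }
        }
        return low;
    }

    public static void main(String[] args) {
        double floor = 1.0;      // DEFAULT_MINIMUM_CONTAINER_CAPACITY from the diff
        double totalLoad = 20.0; // hypothetical workload units
        double measured = 0.25;  // hypothetical rate depressed by badly-schema'd records

        System.out.println(containersNeeded(totalLoad, measured));                    // 80
        System.out.println(containersNeeded(totalLoad, applyFloor(measured, floor))); // 20

        Map<String, Double> byPartition = new LinkedHashMap<>();
        byPartition.put("topicA-0", 0.25);
        byPartition.put("topicA-1", 2.0);
        System.out.println(partitionsBelowFloor(byPartition, floor)); // [topicA-0]
    }
}
```

With the floor in place, the slow partition can no longer push the request from 20 containers to 80.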
--
This message was sent by Atlassian Jira
(v8.20.7#820007)