[ 
https://issues.apache.org/jira/browse/GOBBLIN-1654?focusedWorklogId=775591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775591
 ]

ASF GitHub Bot logged work on GOBBLIN-1654:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/May/22 23:04
            Start Date: 27/May/22 23:04
    Worklog Time Spent: 10m 
      Work Description: sv2000 commented on code in PR #3515:
URL: https://github.com/apache/gobblin/pull/3515#discussion_r884023049


##########
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/workunit/packer/KafkaTopicGroupingWorkUnitPacker.java:
##########
@@ -79,6 +80,13 @@ public class KafkaTopicGroupingWorkUnitPacker extends KafkaWorkUnitPacker {
   public static final String CONTAINER_CAPACITY_KEY = GOBBLIN_KAFKA_PREFIX + "streaming.containerCapacity";
   public static final double DEFAULT_CONTAINER_CAPACITY = 10;
 
+  // minimum container capacity to avoid bad topic schema causing us to request resources aggressively
+  public static final String MINIMUM_CONTAINER_CAPACITY = GOBBLIN_KAFKA_PREFIX + "streaming.minimum.containerCapacity";
+  public static final double DEFAULT_MINIMUM_CONTAINER_CAPACITY = 1;
+  public static final String BAD_TOPIC_PARTITION_WITH_LOW_CAPACITY_EVENT_NAME = "badTopicPartitionWithLowCapacity";

Review Comment:
   badTopicPartitionWithLowCapacity -> topicPartitionWithLowCapacity.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 775591)
    Time Spent: 0.5h  (was: 20m)

> Add capacity floor to avoid aggressively requesting resource and small files.
> -----------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1654
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1654
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Zihan Li
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently we calculate the container capacity based on the consumer rate 
> during peak hours. If records with a badly broken schema arrive during that 
> window, the observed consumer rate drops sharply, and even after the pipeline 
> catches up we cannot release the extra resources because the recorded rate 
> stays low. This change adds a config to set a minimum capacity (a floor on 
> the consumer rate) so we avoid over-requesting resources and producing small 
> files.
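
The floor described above can be sketched as a simple clamp. This is an illustrative standalone helper, not the actual `KafkaTopicGroupingWorkUnitPacker` implementation; the class and method names here are hypothetical, while the default constants mirror those in the PR diff.

```java
// Hypothetical sketch of the capacity floor proposed in GOBBLIN-1654.
// flooredCapacity() is an illustrative helper, not Gobblin code.
public class CapacityFloorSketch {
  // Defaults taken from the PR diff above.
  public static final double DEFAULT_CONTAINER_CAPACITY = 10;
  public static final double DEFAULT_MINIMUM_CONTAINER_CAPACITY = 1;

  // Clamp an observed per-partition capacity to the configured minimum, so a
  // temporarily depressed consumer rate (e.g. caused by bad-schema records)
  // cannot inflate the number of containers requested.
  static double flooredCapacity(double observedCapacity, double minimumCapacity) {
    return Math.max(observedCapacity, minimumCapacity);
  }

  public static void main(String[] args) {
    // A partition whose observed capacity collapsed to 0.2 is floored at 1.0,
    // which is the point where a tracking event (e.g. the
    // badTopicPartitionWithLowCapacity event in the diff) would be emitted.
    System.out.println(flooredCapacity(0.2, DEFAULT_MINIMUM_CONTAINER_CAPACITY));
    // A healthy partition is left unchanged.
    System.out.println(flooredCapacity(8.0, DEFAULT_MINIMUM_CONTAINER_CAPACITY));
  }
}
```

With the floor in place, the packer can still size containers from observed rates during normal operation, while pathological topics are capped at a bounded resource request.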



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
