[ 
https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976065#comment-16976065
 ] 

Vinoth Chandar commented on HUDI-340:
-------------------------------------

My only concern is that `sourceLimit` defaults to `Long.MAX_VALUE`. So 
`Math.max(MAX_EVENTS_TO_READ, sourceLimit)` would scan the entire Kafka topic by 
default? This is what I was trying to avoid, since Kafka might not like it and 
users might think Hudi does not work.

How about the following proposal? 
- Make the upper cap configurable for KafkaSource, defaulting to 10M or 20M, 
something much higher than the current 1M.
- Only if `sourceLimit == Long.MAX_VALUE` (i.e. left at its default), use this 
upper safety cap; otherwise respect `sourceLimit`.
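
A minimal sketch of the rule proposed above, for illustration only. The class 
name, the method name `effectiveLimit`, and the constant 
`DEFAULT_MAX_EVENTS_CAP` are hypothetical and are not taken from 
KafkaOffsetGen.java; the 10M value is just one of the defaults suggested in 
the comment.

```java
public class KafkaReadCapSketch {
    // Hypothetical default for the configurable safety cap (10M, one of the
    // values suggested in the proposal).
    static final long DEFAULT_MAX_EVENTS_CAP = 10_000_000L;

    // If sourceLimit was left at its default (Long.MAX_VALUE), fall back to
    // the safety cap; otherwise respect the caller's explicit sourceLimit.
    static long effectiveLimit(long sourceLimit, long maxEventsCap) {
        return sourceLimit == Long.MAX_VALUE ? maxEventsCap : sourceLimit;
    }

    public static void main(String[] args) {
        // Default sourceLimit: the safety cap applies.
        System.out.println(effectiveLimit(Long.MAX_VALUE, DEFAULT_MAX_EVENTS_CAP)); // 10000000
        // Explicit sourceLimit above the cap: still respected.
        System.out.println(effectiveLimit(50_000_000L, DEFAULT_MAX_EVENTS_CAP));    // 50000000
        // Explicit small sourceLimit: respected.
        System.out.println(effectiveLimit(500L, DEFAULT_MAX_EVENTS_CAP));           // 500
    }
}
```

Unlike `Math.max(MAX_EVENTS_TO_READ, sourceLimit)`, this never returns 
`Long.MAX_VALUE` unless the cap itself is configured that way, so a default 
run does not attempt to read the whole topic.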

> Increase Default max events to read from kafka source
> -----------------------------------------------------
>
>                 Key: HUDI-340
>                 URL: https://issues.apache.org/jira/browse/HUDI-340
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>
> Right now, DEFAULT_MAX_EVENTS_TO_READ is set to 1M in case of kafka source in 
> KafkaOffsetGen.java class. DeltaStreamer can handle much more incoming 
> records than this. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
