[
https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976065#comment-16976065
]
Vinoth Chandar commented on HUDI-340:
-------------------------------------
My only concern is that `sourceLimit` defaults to `Long.MAX_VALUE`, so
`Math.max(MAX_EVENTS_TO_READ, sourceLimit)` would scan the entire Kafka topic by
default. This is what I was trying to avoid, since Kafka might not like it and
users might think Hudi does not work.
How about the following proposal?
- Make the upper cap configurable for KafkaSource, defaulting to 10M or 20M,
i.e. something much higher than the current 1M
- Only when `sourceLimit == Long.MAX_VALUE` do we use this upper safety cap;
otherwise we respect `sourceLimit`
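
A minimal sketch of the proposal above, not the actual KafkaOffsetGen code. The
class name `KafkaReadCap`, the method `resolveMaxEvents`, and the constant
`DEFAULT_MAX_EVENTS_CAP` are hypothetical names used only for illustration, and
the 10M default is just one of the two values floated above.

```java
// Hypothetical helper illustrating the proposed rule; names are assumptions,
// not part of the Hudi code base.
final class KafkaReadCap {

    // Assumed default for the configurable safety cap (10M, per the proposal).
    static final long DEFAULT_MAX_EVENTS_CAP = 10_000_000L;

    private KafkaReadCap() {}

    /**
     * Returns the number of events to read in one batch.
     * Long.MAX_VALUE is treated as "the user did not set a limit", in which
     * case the safety cap applies; any explicit sourceLimit is respected as-is.
     */
    static long resolveMaxEvents(long sourceLimit, long maxEventsCap) {
        return sourceLimit == Long.MAX_VALUE ? maxEventsCap : sourceLimit;
    }

    public static void main(String[] args) {
        // No explicit limit: fall back to the safety cap.
        System.out.println(resolveMaxEvents(Long.MAX_VALUE, DEFAULT_MAX_EVENTS_CAP));
        // Explicit limit, even one above the cap: respected unchanged.
        System.out.println(resolveMaxEvents(50_000_000L, DEFAULT_MAX_EVENTS_CAP));
    }
}
```

Unlike `Math.max(MAX_EVENTS_TO_READ, sourceLimit)`, this never resolves to
`Long.MAX_VALUE` unless the cap itself is configured that way.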
> Increase Default max events to read from kafka source
> -----------------------------------------------------
>
> Key: HUDI-340
> URL: https://issues.apache.org/jira/browse/HUDI-340
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
>
> Right now, DEFAULT_MAX_EVENTS_TO_READ is set to 1M for the Kafka source in
> the KafkaOffsetGen.java class. DeltaStreamer can handle many more incoming
> records than this.
>
>