[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141019#comment-17141019 ]

wangxianghu edited comment on HUDI-340 at 6/20/20, 12:04 PM:
-------------------------------------------------------------

Hi [~Pratyaksh], thanks for the feedback!

Yes, a user would never deliberately set such a source limit; it was just an 
example to make my point that such a huge limit is still configurable (in a 
test, or by mistake), so there is still a chance a user can scan the entire 
Kafka topic in one go (we'd better not assume all our users know the internal 
logic).

If the goal is to absolutely rule out scanning the entire Kafka topic in one 
go, then we must eliminate that possibility by setting a hard limit; or we can 
just log a warning when the user sets a limit greater than the default value 
of *maxEventsToReadFromKafka* (the default is 5 million, which I think is big 
enough), which will in turn weaken the goal. I think either way is doable, 
since setting such a huge sourceLimit is a low-probability event.
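To make the two options concrete, here is a rough sketch (the method names and 
the standalone class are mine for illustration, not Hudi's actual 
KafkaOffsetGen code; only *maxEventsToReadFromKafka*, its 5 million default, 
and *sourceLimit* come from the discussion above):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SourceLimitSketch {
  private static final Logger LOG = LoggerFactory.getLogger(SourceLimitSketch.class);

  // Default discussed above: 5 million events per batch.
  private static final long DEFAULT_MAX_EVENTS_TO_READ = 5_000_000L;

  // Option 1: hard limit -- silently cap whatever the user configured,
  // which absolutely rules out scanning the whole topic in one go.
  static long applyHardCap(long sourceLimit) {
    return Math.min(sourceLimit, DEFAULT_MAX_EVENTS_TO_READ);
  }

  // Option 2: keep the user's value but warn when it exceeds the default,
  // which preserves flexibility at the cost of weakening the guarantee.
  static long warnIfHuge(long sourceLimit) {
    if (sourceLimit > DEFAULT_MAX_EVENTS_TO_READ) {
      LOG.warn("sourceLimit {} exceeds maxEventsToReadFromKafka default {}; "
          + "one batch may scan a very large part of the topic",
          sourceLimit, DEFAULT_MAX_EVENTS_TO_READ);
    }
    return sourceLimit;
  }
}
{code}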

But either way, I don't think we really need the check above; it only helps 
when the user happens to set sourceLimit to exactly Long.MAX_VALUE or 
Integer.MAX_VALUE, which is like winning a lottery, a very low probability.
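(The check I am referring to would look roughly like this; again just a sketch 
of its shape, not the actual patch:)

{code:java}
// Only these two exact sentinel values ever trigger the branch, which is
// why hitting it by accident is so unlikely.
if (sourceLimit == Long.MAX_VALUE || sourceLimit == Integer.MAX_VALUE) {
  sourceLimit = DEFAULT_MAX_EVENTS_TO_READ;
}
{code}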

WDYT?

cc [~vinoth]

 



> Increase Default max events to read from kafka source
> -----------------------------------------------------
>
>                 Key: HUDI-340
>                 URL: https://issues.apache.org/jira/browse/HUDI-340
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, DEFAULT_MAX_EVENTS_TO_READ is set to 1M in the case of the Kafka 
> source in the KafkaOffsetGen.java class. DeltaStreamer can handle far more 
> incoming records than this. 
>  
>  


