[
https://issues.apache.org/jira/browse/FLINK-26307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
郭元方 updated FLINK-26307:
------------------------
Description:
对于底层实现窗口时,求其开启窗口起始时间的方法有一个问题:
public static long getWindowStartWithOffset(long timestamp, long offset, long
windowSize)
{ return timestamp - (timestamp - offset + windowSize) % windowSize;
}
例如:窗口大小为10,事件时间分别为1s,7s,12s得到的起始时间,显然正确。
=> timestamp - (timestamp - offset + windowSize) % windowSize
=> 1 - (1 -0 + 10) % 10 => WindowStart = 0s
=> 7 - (7 -0 + 10) % 10 => WindowStart = 0s
=> 12 - (12 - 0 + 10) % 10 => WindowStart = 10s
但是,当事件时间为
-1s,-7s,-12s,-15s,-17s,-19s时,窗口的起始时间却也都为-10s,事件时间为-21s,-27s,窗口起始时间为-20s
是否由于在java中的取余规则,正数对正数取余的商值向0逼近,而负数对正数取余的商值也向0逼近导致该算法
对事件时间在(Long.MIN_VALUE,-windowsize)的这个范围内的数据不能正确的求到开启窗口的起始时间呢?
=> timestamp - (timestamp - offset + windowSize) % windowSize
=> -1 - (-1 -0 + 10) % 10 => WindowStart = -10s
=> -7 - (-7 -0 + 10) % 10 => WindowStart = -10s
=> -12 - (-12 - 0 + 10) % 10 => WindowStart = -10s
=> -15 - (-15 - 0 + 10) % 10 => WindowStart = -10s
=> -17 - (-17 - 0 + 10) % 10 => WindowStart = -10s
=> -19 - (-19 - 0 + 10) % 10 => WindowStart = -10s
=> -21 - (-21 - 0 + 10) % 10 =>WindowStart = -20s
=> -27 - (-27 - 0 + 10) % 10 => WindowStart = -20s
{code:java}
//代码占位符
env
.fromElements(
new LoginEvent("user-1", "fail", -7L),
new LoginEvent("user-2", "success", -23L)
)
.assignTimestampsAndWatermarks(WatermarkStrategy.<LoginEvent>forMonotonousTimestamps()
.withTimestampAssigner(new
SerializableTimestampAssigner<LoginEvent>() {
@Override
public long extractTimestamp(LoginEvent element, long
recordTimestamp) {
return element.ts;
}
}))
.keyBy(r -> r.userId)
.window(SlidingEventTimeWindows.of(Time.milliseconds(10),Time.milliseconds(5)))
.process(
new ProcessWindowFunction<LoginEvent, String, String,
TimeWindow>() {
@Override
public void process(String s, Context context,
Iterable<LoginEvent> elements, Collector<String> out) throws Exception {
out.collect(
"窗口" + context.window().getStart() + "~" +
context.window().getEnd() + "中的元素为:" +
elements.iterator().next().ts
);
}
}
)
.print();
env.execute();
执行结果如下:
窗口-30~-20中的元素为:-23
窗口-25~-15中的元素为:-23
窗口-20~-10中的元素为:-23
窗口-15~-5中的元素为:-7
窗口-10~0中的元素为:-7
窗口-5~5中的元素为:-7
结论:
显然-7 不在 -5~5的窗口,-23不在-10~-20的窗口。{code}
和同学邓子琦的讨论,并没有经历过业务,所以此处事件时间求到了负值,仅从逻辑上来讲,事件时间应该可以为负值吧。
求大佬解答,该算法算不算有逻辑上的漏洞。
was:
对于底层实现窗口时,求其开启窗口起始时间的方法有一个问题:
public static long getWindowStartWithOffset(long timestamp, long offset, long
windowSize)
{ return timestamp - (timestamp - offset + windowSize) % windowSize;
}
例如:窗口大小为10,事件时间分别为1s,7s,12s得到的起始时间,显然正确。
=> timestamp - (timestamp - offset + windowSize) % windowSize
=> 1 - (1 -0 + 10) % 10 => WindowStart = 0s
=> 7 - (7 -0 + 10) % 10 => WindowStart = 0s
=> 12 - (12 - 0 + 10) % 10 => WindowStart = 10s
但是,当事件时间为
-1s,-7s,-12s,-15s,-17s,-19s时,窗口的起始时间却也都为-10s,事件时间为-21s,-27s,窗口起始时间为-20s
是否由于在java中的取余规则,正数对正数取余的商值向0逼近,而负数对正数取余的商值也向0逼近导致该算法
对事件时间在(Long.MIN_VALUE,-windowsize)的这个范围内的数据不能正确的求到开启窗口的起始时间呢?
=> timestamp - (timestamp - offset + windowSize) % windowSize
=> -1 - (-1 -0 + 10) % 10 => WindowStart = -10s
=> -7 - (-7 -0 + 10) % 10 => WindowStart = -10s
=> -12 - (-12 - 0 + 10) % 10 => WindowStart = -10s
=> -15 - (-15 - 0 + 10) % 10 => WindowStart = -10s
=> -17 - (-17 - 0 + 10) % 10 => WindowStart = -10s
=> -19 - (-19 - 0 + 10) % 10 => WindowStart = -10s
=> -21 - (-21 - 0 + 10) % 10 =>WindowStart = -20s
=> -27 - (-27 - 0 + 10) % 10 => WindowStart = -20s
和同学邓子琦的讨论,并没有经历过业务,所以此处事件时间求到了负值,仅从逻辑上来讲,事件时间应该可以为负值吧。
求大佬解答,该算法算不算有逻辑上的漏洞。
> 当事件时间在(Long.MIN_VALUE ,-windowsize)时,不能正确的求到开启窗口的起始时间
> -----------------------------------------------------
>
> Key: FLINK-26307
> URL: https://issues.apache.org/jira/browse/FLINK-26307
> Project: Flink
> Issue Type: Improvement
> Components: API / DataStream
> Affects Versions: 1.13.6
> Reporter: 郭元方
> Priority: Major
> Labels: 事件时间, 水位线, 窗口
>
> 对于底层实现窗口时,求其开启窗口起始时间的方法有一个问题:
> public static long getWindowStartWithOffset(long timestamp, long offset, long
> windowSize)
> { return timestamp - (timestamp - offset + windowSize) % windowSize;
> }
> 例如:窗口大小为10,事件时间分别为1s,7s,12s得到的起始时间,显然正确。
> => timestamp - (timestamp - offset + windowSize) % windowSize
> => 1 - (1 -0 + 10) % 10 => WindowStart = 0s
> => 7 - (7 -0 + 10) % 10 => WindowStart = 0s
> => 12 - (12 - 0 + 10) % 10 => WindowStart = 10s
>
> 但是,当事件时间为
> -1s,-7s,-12s,-15s,-17s,-19s时,窗口的起始时间却也都为-10s,事件时间为-21s,-27s,窗口起始时间为-20s
> 是否由于在java中的取余规则,正数对正数取余的商值向0逼近,而负数对正数取余的商值也向0逼近导致该算法
> 对事件时间在(Long.MIN_VALUE,-windowsize)的这个范围内的数据不能正确的求到开启窗口的起始时间呢?
> => timestamp - (timestamp - offset + windowSize) % windowSize
> => -1 - (-1 -0 + 10) % 10 => WindowStart = -10s
> => -7 - (-7 -0 + 10) % 10 => WindowStart = -10s
> => -12 - (-12 - 0 + 10) % 10 => WindowStart = -10s
> => -15 - (-15 - 0 + 10) % 10 => WindowStart = -10s
> => -17 - (-17 - 0 + 10) % 10 => WindowStart = -10s
> => -19 - (-19 - 0 + 10) % 10 => WindowStart = -10s
> => -21 - (-21 - 0 + 10) % 10 =>WindowStart = -20s
> => -27 - (-27 - 0 + 10) % 10 => WindowStart = -20s
> {code:java}
> //代码占位符
> env
> .fromElements(
> new LoginEvent("user-1", "fail", -7L),
> new LoginEvent("user-2", "success", -23L)
> )
>
> .assignTimestampsAndWatermarks(WatermarkStrategy.<LoginEvent>forMonotonousTimestamps()
> .withTimestampAssigner(new
> SerializableTimestampAssigner<LoginEvent>() {
> @Override
> public long extractTimestamp(LoginEvent element, long
> recordTimestamp) {
> return element.ts;
> }
> }))
> .keyBy(r -> r.userId)
>
> .window(SlidingEventTimeWindows.of(Time.milliseconds(10),Time.milliseconds(5)))
> .process(
> new ProcessWindowFunction<LoginEvent, String, String,
> TimeWindow>() {
> @Override
> public void process(String s, Context context,
> Iterable<LoginEvent> elements, Collector<String> out) throws Exception {
> out.collect(
> "窗口" + context.window().getStart() + "~" +
> context.window().getEnd() + "中的元素为:" +
> elements.iterator().next().ts
> );
> }
> }
> )
> .print();
> env.execute();
> 执行结果如下:
> 窗口-30~-20中的元素为:-23
> 窗口-25~-15中的元素为:-23
> 窗口-20~-10中的元素为:-23
> 窗口-15~-5中的元素为:-7
> 窗口-10~0中的元素为:-7
> 窗口-5~5中的元素为:-7
> 结论:
> 显然-7 不在 -5~5的窗口,-23不在-10~-20的窗口。{code}
>
> 和同学邓子琦的讨论,并没有经历过业务,所以此处事件时间求到了负值,仅从逻辑上来讲,事件时间应该可以为负值吧。
> 求大佬解答,该算法算不算有逻辑上的漏洞。
--
This message was sent by Atlassian Jira
(v8.20.1#820001)