[jira] [Commented] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound

Flink Jira Bot (Jira) Thu, 29 Apr 2021 16:46:38 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336543#comment-17336543
 ]


Flink Jira Bot commented on FLINK-11050:
----------------------------------------

This issue was labeled "stale-major" 7 ago and has not received any updates so 
it is being deprioritized. If this ticket is actually Major, please raise the 
priority and ask a committer to assign you the issue or revive the public 
discussion.


> When IntervalJoin, get left or right buffer's entries more quickly by 
> assigning lowerBound
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-11050
>                 URL: https://issues.apache.org/jira/browse/FLINK-11050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.6.2, 1.7.0
>            Reporter: Liu
>            Priority: Major
>              Labels: performance, pull-request-available, stale-major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
>     When IntervalJoin, it is very slow to get left or right buffer's entries. 
> Because we have to scan all buffer's values, including the deleted values 
> which are out of time range. These deleted values's processing consumes too 
> much time in RocksDB's level 0. Since lowerBound is known, it can be 
> optimized by seek from the timestamp of lowerBound.
>     Our usage is like below:
> {code:java}
> labelStream.keyBy(uuid).intervalJoin(adLogStream.keyBy(uuid))
>            .between(Time.milliseconds(0), Time.milliseconds(600000))
>            .process(new processFunction())
>            .sink(kafkaProducer)
> {code}
>     Our data is huge. The job always runs for an hour and is stuck by 
> RocksDB's seek when get buffer's entries. We use rocksDB's data to simulate 
> the problem RocksDB and find that it takes too much time in deleted values. 
> So we decide to optimize it by assigning the lowerBound instead of global 
> search.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound

Reply via email to