[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound

Flink Jira Bot (Jira) Tue, 30 Nov 2021 14:40:00 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Flink Jira Bot updated FLINK-11050:
-----------------------------------
      Labels: auto-deprioritized-major auto-deprioritized-minor performance 
pull-request-available  (was: auto-deprioritized-major performance 
pull-request-available stale-minor)
    Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any 
updates so it is being deprioritized. If this ticket is actually Minor, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> When IntervalJoin, get left or right buffer's entries more quickly by 
> assigning lowerBound
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-11050
>                 URL: https://issues.apache.org/jira/browse/FLINK-11050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.6.2, 1.7.0
>            Reporter: Liu
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
>     During an IntervalJoin, fetching the entries of the left or right buffer 
> is very slow, because we have to scan all of the buffer's values, including 
> deleted values that are outside the time range. Processing these deleted 
> values consumes too much time in RocksDB's level 0. Since lowerBound is 
> known, the lookup can be optimized by seeking from the timestamp of lowerBound.
>     Our usage looks like this:
> {code:java}
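> // Join each label with the ad-log records of the same uuid whose
> // timestamps are 0 to 600000 ms (10 minutes) after the label's timestamp.
> labelStream.keyBy(uuid).intervalJoin(adLogStream.keyBy(uuid))
>            .between(Time.milliseconds(0), Time.milliseconds(600000))
>            .process(new processFunction())
>            .addSink(kafkaProducer);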
> {code}
>     Our data volume is huge. The job always runs for an hour and then gets 
> stuck in RocksDB's seek when fetching the buffer's entries. We used the 
> RocksDB data to reproduce the problem and found that too much time is spent 
> on deleted values. We therefore decided to optimize this by seeking from 
> lowerBound instead of doing a global scan.
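
The difference between a global scan and a seek from lowerBound can be sketched roughly as follows. This is only an illustration of the access pattern, not the change proposed in the pull request and not Flink's operator code. It assumes the buffer can be modeled as a sorted map from timestamp to the elements with that timestamp, and it uses a plain java.util.TreeMap as a stand-in for the RocksDB-backed state. The class and method names (LowerBoundSeekSketch, scanAll, seekFromLowerBound) are hypothetical. The sketch does not model RocksDB tombstones, so it shows which keys are visited rather than the actual cost of skipping deleted values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for the sorted, RocksDB-backed buffer.
class LowerBoundSeekSketch {

    // Global scan: visits every buffered timestamp and filters by range afterwards.
    static List<String> scanAll(TreeMap<Long, List<String>> buffer,
                                long lowerBound, long upperBound) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<Long, List<String>> entry : buffer.entrySet()) {
            long ts = entry.getKey();
            if (ts >= lowerBound && ts <= upperBound) {
                matches.addAll(entry.getValue());
            }
        }
        return matches;
    }

    // Seek: starts at the first timestamp >= lowerBound and stops after upperBound,
    // so entries older than lowerBound are never visited.
    static List<String> seekFromLowerBound(TreeMap<Long, List<String>> buffer,
                                           long lowerBound, long upperBound) {
        List<String> matches = new ArrayList<>();
        if (lowerBound > upperBound) {
            return matches; // empty range; subMap would reject it
        }
        for (List<String> values
                : buffer.subMap(lowerBound, true, upperBound, true).values()) {
            matches.addAll(values);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> buffer = new TreeMap<>();
        buffer.put(0L, Arrays.asList("expired-1"));
        buffer.put(50L, Arrays.asList("expired-2"));
        buffer.put(100L, Arrays.asList("ad-1", "ad-2"));
        buffer.put(200L, Arrays.asList("ad-3"));

        // Both calls return the same elements; only the set of visited keys differs.
        System.out.println(scanAll(buffer, 100L, 200L));
        System.out.println(seekFromLowerBound(buffer, 100L, 200L));
    }
}
```

Both methods return the same elements for a given range. The second one simply never touches keys below lowerBound, which is the property the description relies on to avoid the expired entries.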



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
