[jira] [Updated] (FLINK-16057) Performance regression in ContinuousFileReaderOperator

Roman Khachatryan (Jira) Fri, 14 Feb 2020 02:36:12 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Khachatryan updated FLINK-16057:
--------------------------------------
    Description: 
After switching CFRO (ContinuousFileReaderOperator) to a single-threaded 
execution model, the performance regression was expected to be about 15-20% 
(benchmarked in November).

But after merging to master it turned out to be about 50%.

  

One reason is that the chaining strategy isn't set by default in the CFRO 
factory.

Without it, even reading and outputting all records of a split in a single 
mail action reverses only about half of the regression.

However, setting the strategy AND enabling batching fixes the regression 
(starting from batch size 6).

Batching can't be used in practice, though, because it can significantly delay 
checkpointing.

 

Another approach would be to process one record and then repeat until 
defaultMailboxActionAvailable OR haveNewMail.

This reverses the regression and even improves performance by about 50% 
compared to the old version.
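
To make the trade-off between the two approaches concrete, here is a minimal 
simulation sketch. It does not use Flink classes: the class, the method and its 
parameters are hypothetical, and "mail" stands for something like a checkpoint 
trigger that is only handled between mail actions. It counts how many records 
are processed while such a mail is waiting, first with a fixed batch per mail 
action, then with a loop that keeps processing records until new mail shows up.

```java
public class MailboxLoopSketch {

    /**
     * Simulates reading a split of totalRecords records.
     *
     * Hypothetical model, not Flink's implementation: one "mail action"
     * processes up to maxPerAction records; pending mail is only handled
     * between mail actions. A single mail arrives right after
     * mailArrivesAfter records have been processed.
     *
     * Returns the number of records processed while that mail was waiting.
     */
    public static int recordsProcessedWhileMailWaits(
            int totalRecords, int maxPerAction, int mailArrivesAfter, boolean yieldOnMail) {
        int processed = 0;
        int waited = 0;
        boolean mailPending = false;

        while (processed < totalRecords) {
            int inAction = 0;
            do {
                processed++;
                inAction++;
                if (mailPending) {
                    waited++; // this record delayed the pending mail
                }
                if (processed == mailArrivesAfter) {
                    mailPending = true; // e.g. a checkpoint trigger was enqueued
                }
            } while (processed < totalRecords
                    && inAction < maxPerAction
                    && !(yieldOnMail && mailPending));

            // Between mail actions the mailbox is drained.
            if (mailPending) {
                return waited;
            }
        }
        return waited;
    }

    public static void main(String[] args) {
        // Fixed batch of 6 records per mail action.
        System.out.println(recordsProcessedWhileMailWaits(100, 6, 1, false));   // prints 5
        // Whole split (100 records) in a single mail action.
        System.out.println(recordsProcessedWhileMailWaits(100, 100, 1, false)); // prints 99
        // Unbounded loop that stops as soon as new mail is present.
        System.out.println(recordsProcessedWhileMailWaits(100, Integer.MAX_VALUE, 1, true)); // prints 0
    }
}
```

In this model the delay with batching grows with the batch size (up to 
batchSize - 1 records), while the loop that yields on new mail keeps the 
per-record mailbox overhead low without holding mail back. The real condition 
in the operator also depends on the default action being available, which the 
sketch leaves out.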

 

The final solution could also be FLIP-27.

 

Other things tried (didn't help):
 * CFRO rework without subsequent commits (removing checkpoint lock)
 * different batch sizes, including the whole split, without fixing the 
chaining strategy (partial improvement only)
 * disabling close
 * disabling checkpointing
 * disabling output (serialization)
 * using LinkedList instead of PriorityQueue

 

  was:
After switching to a single-threaded execution model performance regression was 
expected to be about 15-20% (benchmarked in November).

After merging it turned out to be about 50%.

 

 

One reason is that the chaining strategy isn't set by default in CFRO factory.

Without this even reading and outputting all records of a split in a single 
mail action doesn't reverse the regression (only about half).

However, setting the strategy AND enabling batching fixes the regression 
(starting from batch size 6).

 

Though batching can't be used in practice because it can significantly delay 
checkpointing.

Another approach would be to process one record and then repeat until 
defaultMailboxActionAvailable OR haveNewMail.

This reverses regression and even improves the performance by about 50% 
compared to the old version.

 

The final solution could also be FLIP-27.

 

Other things tried (didn't help):
 * CFRO rework without subsequent commits (removing checkpoint lock)
 * different batch sizes, including the whole split, without chaining strategy 
fixed - partial improvement only
 * disabling close
 * disabling checkpointing
 * disabling output (serialization)
 * using LinkedList instead of PriorityQueue

 


> Performance regression in ContinuousFileReaderOperator
> ------------------------------------------------------
>
>                 Key: FLINK-16057
>                 URL: https://issues.apache.org/jira/browse/FLINK-16057
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.11.0
>            Reporter: Roman Khachatryan
>            Priority: Blocker
>
> After switching CFRO (ContinuousFileReaderOperator) to a single-threaded 
> execution model, the performance regression was expected to be about 15-20% 
> (benchmarked in November).
> But after merging to master it turned out to be about 50%.
>   
> One reason is that the chaining strategy isn't set by default in the CFRO 
> factory.
> Without it, even reading and outputting all records of a split in a single 
> mail action reverses only about half of the regression.
> However, setting the strategy AND enabling batching fixes the regression 
> (starting from batch size 6).
> Batching can't be used in practice, though, because it can significantly 
> delay checkpointing.
>  
> Another approach would be to process one record and then repeat until 
> defaultMailboxActionAvailable OR haveNewMail.
> This reverses the regression and even improves performance by about 50% 
> compared to the old version.
>  
> The final solution could also be FLIP-27.
>  
> Other things tried (didn't help):
>  * CFRO rework without subsequent commits (removing checkpoint lock)
>  * different batch sizes, including the whole split, without fixing the 
> chaining strategy (partial improvement only)
>  * disabling close
>  * disabling checkpointing
>  * disabling output (serialization)
>  * using LinkedList instead of PriorityQueue
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
