[
https://issues.apache.org/jira/browse/FLINK-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Khachatryan updated FLINK-16057:
--------------------------------------
Description:
After switching CFRO (ContinuousFileReaderOperator) to a single-threaded
execution model, the performance regression was expected to be about 15-20%
(benchmarked in November).
After merging to master, however, it turned out to be about 50%.
One reason is that the chaining strategy isn't set by default in the CFRO
factory.
Without it, even reading and outputting all records of a split in a single
mail action doesn't reverse the regression (it recovers only about half).
However, setting the strategy AND enabling batching fixes the regression
(starting from batch size 6).
Batching can't be used in practice, though, because it can significantly delay
checkpointing.
Another approach would be to process one record and then repeat until
defaultMailboxActionAvailable OR haveNewMail.
This reverses the regression and even improves performance by about 50%
compared to the old version.
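The trade-off between the two approaches above can be illustrated with a toy
model. This is only a sketch and not Flink code: Mailbox, checkpointDelay and
all parameter names are hypothetical. It assumes that a checkpoint arrives as
a mail while records are being read, that hasMail() stands in for haveNewMail,
and that running out of records stands in for the default action no longer
being available. It counts how many extra records are emitted before the
checkpoint mail gets to run.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of a single-threaded mailbox loop. Not Flink code. */
final class MailboxLoopSketch {

    /** Hypothetical stand-in for a task mailbox holding control actions. */
    static final class Mailbox {
        private final Queue<Runnable> mails = new ArrayDeque<>();

        void put(Runnable mail) { mails.add(mail); }

        boolean hasMail() { return !mails.isEmpty(); }

        void drain() {
            while (!mails.isEmpty()) {
                mails.poll().run();
            }
        }
    }

    /**
     * Emits totalRecords records. A "checkpoint" mail is enqueued right after
     * record number mailArrivesAfter (1 <= mailArrivesAfter <= totalRecords).
     *
     * If yieldOnMail is false, each mail action emits a fixed batch of
     * batchSize records. If yieldOnMail is true, batchSize is ignored and the
     * action keeps emitting records until a mail is pending.
     *
     * Returns how many records were emitted between the arrival of the
     * checkpoint mail and the moment it actually ran.
     */
    static int checkpointDelay(
            int totalRecords, int mailArrivesAfter, int batchSize, boolean yieldOnMail) {
        Mailbox mailbox = new Mailbox();
        int[] emitted = {0};           // arrays so the lambda below can read/write them
        int[] checkpointRanAt = {-1};

        while (emitted[0] < totalRecords) {
            mailbox.drain();           // control mail runs between record-processing actions
            int inThisAction = 0;
            do {
                emitted[0]++;          // "process one record"
                inThisAction++;
                if (emitted[0] == mailArrivesAfter) {
                    // Simulate a checkpoint request arriving mid-action.
                    mailbox.put(() -> checkpointRanAt[0] = emitted[0]);
                }
            } while (emitted[0] < totalRecords
                    && (yieldOnMail ? !mailbox.hasMail() : inThisAction < batchSize));
        }
        mailbox.drain();               // run anything still pending at end of input
        return checkpointRanAt[0] - mailArrivesAfter;
    }

    public static void main(String[] args) {
        System.out.println("batch of 6:    " + checkpointDelay(100, 1, 6, false));
        System.out.println("whole split:   " + checkpointDelay(100, 1, 100, false));
        System.out.println("yield on mail: " + checkpointDelay(100, 1, 0, true));
    }
}
```

In this model a fixed batch delays the checkpoint by up to batchSize - 1
records, and by the rest of the split when the whole split is one action,
while the yield-on-mail loop runs the checkpoint right after the current
record. The model says nothing about throughput, so the figures reported in
this issue are not reproduced by it.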
The final solution could also be FLIP-27.
Other things tried (didn't help):
* CFRO rework without subsequent commits (removing checkpoint lock)
* different batch sizes, including the whole split, without chaining strategy
fixed - partial improvement only
* disabling close
* disabling checkpointing
* disabling output (serialization)
* using LinkedList instead of PriorityQueue
was:
After switching to a single-threaded execution model, the performance
regression was expected to be about 15-20% (benchmarked in November).
After merging it turned out to be about 50%.
One reason is that the chaining strategy isn't set by default in the CFRO
factory.
Without this, even reading and outputting all records of a split in a single
mail action doesn't reverse the regression (it recovers only about half).
However, setting the strategy AND enabling batching fixes the regression
(starting from batch size 6).
Batching can't be used in practice, though, because it can significantly delay
checkpointing.
Another approach would be to process one record and then repeat until
defaultMailboxActionAvailable OR haveNewMail.
This reverses the regression and even improves performance by about 50%
compared to the old version.
The final solution could also be FLIP-27.
Other things tried (didn't help):
* CFRO rework without subsequent commits (removing checkpoint lock)
* different batch sizes, including the whole split, without chaining strategy
fixed - partial improvement only
* disabling close
* disabling checkpointing
* disabling output (serialization)
* using LinkedList instead of PriorityQueue
> Performance regression in ContinuousFileReaderOperator
> ------------------------------------------------------
>
> Key: FLINK-16057
> URL: https://issues.apache.org/jira/browse/FLINK-16057
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream
> Affects Versions: 1.11.0
> Reporter: Roman Khachatryan
> Priority: Blocker