zl created FLINK-26548:
--------------------------
Summary: the source parallelism is not set correctly with
AdaptiveBatchScheduler
Key: FLINK-26548
URL: https://issues.apache.org/jira/browse/FLINK-26548
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.15.0
Reporter: zl
Attachments: image-2022-03-09-19-00-18-396.png
When running *_org.apache.flink.table.tpcds.TpcdsTestProgram_* with
{_}*AdaptiveBatchScheduler*{_}, I ran into a problem: the number of records sent by
the source operator is always 1, and the parallelism of the source operator is also
1, even though I set *_jobmanager.adaptive-batch-scheduler.default-source-parallelism_*
to 8.
!image-2022-03-09-19-00-18-396.png!
After some research, I found that operator A is not the actual file reader;
it only splits files and assigns the splits to downstream tasks for further
processing. Operator B is the actual file reader task. Here, the
parallelism of operator B is 64 and the number of records sent by operator A is 1,
which means operator A assigned all splits to a single task of operator B while
{*}_the other 63 tasks of operator B are idle_{*}. This is unreasonable.
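The arithmetic behind this observation can be sketched with a toy model. This is not Flink code: it assumes, purely for illustration, that each record emitted by operator A is handed to a subtask of operator B in round-robin order, and the function names are hypothetical.

```python
from collections import Counter


def assign_round_robin(num_records: int, parallelism: int) -> Counter:
    """Toy model (assumption): record i goes to subtask i % parallelism."""
    load = Counter()
    for i in range(num_records):
        load[i % parallelism] += 1
    return load


def idle_subtasks(num_records: int, parallelism: int) -> int:
    """Number of downstream subtasks that receive no record at all."""
    busy = len(assign_round_robin(num_records, parallelism))
    return parallelism - busy


if __name__ == "__main__":
    # Observed case: operator A sends 1 record, operator B has parallelism 64.
    print(idle_subtasks(1, 64))
    # Expected case: 8 records sent to 8 subtasks.
    print(idle_subtasks(8, 8))
```

Under this assumption, a single record can keep at most one subtask busy no matter how high the downstream parallelism is, which matches the 63 idle tasks described above.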
In this case, the parallelism of operator B should be
*_jobmanager.adaptive-batch-scheduler.default-source-parallelism_*, and the number
of records sent by operator A should also be
{*}_jobmanager.adaptive-batch-scheduler.default-source-parallelism_{*}.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)