陈磊 created FLINK-36661:
--------------------------

             Summary: A relatively small managed memory setting causes duplicate output in batch jobs
                 Key: FLINK-36661
                 URL: https://issues.apache.org/jira/browse/FLINK-36661
             Project: Flink
          Issue Type: Bug
            Reporter: 陈磊
         Attachments: image-2024-11-05-15-45-44-349.png

1. Environment:
Flink SQL 1.16, batch job, TaskManager (TM) spec: 4 CPU cores, 8 GB memory

2. Core topology diagram and SQL skeleton:
!image-2024-11-05-15-45-44-349.png!


{code:sql}
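-- First-stage aggregation over the source table (columns elided).
create view tmp1 as
select ....
from source
group by xx, xx;

-- Self left join of tmp1, followed by a second aggregation.
create view tmp as
select ...
from tmp1 a
left join tmp1 b
  on a.xx = b.xx
group by x, y, ....;

-- Write the final result to the sink table t.
insert into t
select * from tmp;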
{code}



3. Trigger conditions
The managed memory fraction ({{taskmanager.memory.managed.fraction}}) is set to 0.1, which yields approximately 600 MB of managed memory per TM.
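For reference, a rough worked estimate of how the fraction maps to managed memory on an 8 GB TM, assuming Flink's default memory model: managed memory = fraction × total Flink memory, where total Flink memory = process memory − JVM metaspace (default 256 MB) − JVM overhead (default 10% of process memory, clamped to [192 MB, 1 GB]). Treating "8g" as an 8192 MB process size is an assumption; clusters that configure these components differently will get different numbers, so this sketch need not reproduce the ~600 MB observed. Note that 0.4 is Flink's default value for {{taskmanager.memory.managed.fraction}}.

{code:sql}
-- Rough estimate only, assuming an 8192 MB TaskManager process and Flink's
-- default metaspace (256 MB) and JVM overhead (0.1 * 8192 MB, which lies
-- inside the default [192 MB, 1024 MB] clamp).
SELECT
  0.1 * (8192 - 256 - 0.1 * 8192) AS managed_mb_fraction_0_1,  -- about 712 MB
  0.4 * (8192 - 256 - 0.1 * 8192) AS managed_mb_fraction_0_4;  -- about 2847 MB
{code}

Under these assumptions, moving from 0.1 to 0.4 gives the sort and join operators roughly four times as much managed memory, consistent with the change in behavior reported below.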

4. Observed results
The correct output for this job is 38.28 million rows.
With a managed fraction of 0.1, the number of rows written is unstable across runs: it may be 120 million, 150 million, or 210 million rows.
With a managed fraction of 0.4, the output is stable and matches expectations.
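One way to tell whether the surplus rows are exact copies of correct rows, or rows that differ in some column (for example, the same group key with different aggregate values), is to compare the total row count of the sink table t with its distinct row count. This is a minimal diagnostic sketch, assuming t can be queried back in batch mode; it is not part of the original job.

{code:sql}
-- Total rows written to the sink table t (expected: 38.28 million).
SELECT COUNT(*) AS total_rows FROM t;

-- Rows left after removing exact duplicate rows.
SELECT COUNT(*) AS distinct_rows FROM (SELECT DISTINCT * FROM t) AS d;
{code}

If distinct_rows is 38.28 million while total_rows is larger, the extra rows are exact repeats; if distinct_rows itself exceeds 38.28 million, some group keys were emitted more than once with different values.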

5. Personal investigation and thoughts
1) Monitoring shows that with the managed fraction set to 0.1, the output metrics of the sort operator increase significantly.
2) Without changing the managed fraction, increasing TM memory also makes the computed results stable and correct.
3) Although managed memory is important for batch jobs, insufficient managed memory should manifest as slow execution or an OOM error, not as duplicated output data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)