[jira] [Updated] (BEAM-14429) SyntheticUnboundedSource(with SDF) produce duplicate records when split with DEFAULT_DESIRED_NUM_SPLITS

Yichi Zhang (Jira) Mon, 09 May 2022 11:59:54 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yichi Zhang updated BEAM-14429:
-------------------------------
    Description: 
With the default of 20 splits, the number of records produced by 
Read.from(SyntheticUnboundedSource) is always larger than the numRecords 
specified. The more splits there are, the further the actual record count is 
off. The Read step also tends to take longer with more splits.

 

https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L512

The issue manifests in the Java LoadTests on Dataflow Runner v2.

  was:
With the default of 20 splits, the number of records produced by 
Read.from(SyntheticUnboundedSource) is always larger than the numRecords 
specified. The more splits there are, the further the actual record count is 
off. The Read step also tends to take longer with more splits.

 

The issue manifests in the Java LoadTests on Dataflow Runner v2.


> SyntheticUnboundedSource(with SDF) produce duplicate records when split with 
> DEFAULT_DESIRED_NUM_SPLITS
> -------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-14429
>                 URL: https://issues.apache.org/jira/browse/BEAM-14429
>             Project: Beam
>          Issue Type: Bug
>          Components: io-common
>            Reporter: Yichi Zhang
>            Priority: P2
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> With the default of 20 splits, the number of records produced by 
> Read.from(SyntheticUnboundedSource) is always larger than the numRecords 
> specified. The more splits there are, the further the actual record count 
> is off. The Read step also tends to take longer with more splits.
>  
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L512
> The issue manifests in the Java LoadTests on Dataflow Runner v2.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
