[ 
https://issues.apache.org/jira/browse/HUDI-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3469:
----------------------------------
    Description: 
Currently, `HoodieTestDataGenerator` relies on static state, which means its 
state is shared across all of the tests, making data generation dependent on 
the order of test execution.
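
To illustrate the problem, here is a minimal sketch of a generator backed by 
static state. `SharedStateGenerator` and `nextKey()` are hypothetical names 
used only for illustration; this is not the actual `HoodieTestDataGenerator` 
code.

```java
// Hypothetical stand-in for a generator that keeps its state in static fields.
final class SharedStateGenerator {
    // Static counter: every caller in the same JVM advances the same value.
    private static int counter = 0;

    // The key a test receives depends on how many calls happened before it.
    static String nextKey() {
        return "key-" + counter++;
    }

    public static void main(String[] args) {
        // Two "tests" running in the same JVM observe different keys,
        // purely because of the order in which they ran.
        String seenByFirstTest = nextKey();
        String seenBySecondTest = nextKey();
        System.out.println(seenByFirstTest + " vs " + seenBySecondTest);
    }
}
```

If the two callers were swapped, each would receive the other's key, which is 
the kind of order dependence described above.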

 

Instead, we should properly abstract `HoodieTestDataGenerator` to hold all of 
the state w/in individual instances so that individual tests can:

1. Create their own isolated instance (which won't be affected by other tests)
2. Pass "seed" value to DataGenerator to init its PRNG w/ it, so that it always 
produces the same (pseudo-)random sequence (for a given seed)
3. Be certain that all of the data produced by DataGenerator will be 100% 
reproducible w/ the same seed (meaning that all of the DataGenerator operations 
w/in it only rely on such internal PRNG and don't rely on any external sources, 
such as `UUID.randomUUID()`, `System.currentTimeMillis()`, etc)
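
The three requirements above could look roughly like the following sketch. 
The class `SeededRecordGenerator` and its methods are assumptions made for 
illustration, not the actual or proposed `HoodieTestDataGenerator` API. It 
shows one way to keep all state per instance and derive every value from a 
seeded `java.util.Random`.

```java
import java.util.Random;
import java.util.UUID;

// Hypothetical sketch; not the real HoodieTestDataGenerator.
final class SeededRecordGenerator {
    // Instance-scoped PRNG: no static state is shared between tests.
    private final Random rng;
    // Logical clock used in place of System.currentTimeMillis().
    private long logicalClock = 0L;

    SeededRecordGenerator(long seed) {
        this.rng = new Random(seed);
    }

    // Deterministic substitute for UUID.randomUUID(): all 128 bits come
    // from the seeded PRNG (the result is not a spec-compliant v4 UUID).
    String nextRecordKey() {
        return new UUID(rng.nextLong(), rng.nextLong()).toString();
    }

    // Strictly increasing timestamp derived only from the PRNG.
    long nextTimestamp() {
        logicalClock += 1 + rng.nextInt(1000);
        return logicalClock;
    }

    public static void main(String[] args) {
        SeededRecordGenerator a = new SeededRecordGenerator(42L);
        SeededRecordGenerator b = new SeededRecordGenerator(42L);
        // Same seed, separate instances: both produce the same sequence.
        System.out.println(a.nextRecordKey().equals(b.nextRecordKey()));
        System.out.println(a.nextTimestamp() == b.nextTimestamp());
    }
}
```

Because `java.util.Random` is specified to yield the same sequence for the 
same seed, two instances created w/ the same seed stay in lockstep as long as 
nothing else (wall clock, `UUID.randomUUID()`, shared statics) feeds into the 
generated values.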

  was:
Currently, `HoodieTestDataGenerator` relies on static state, which means its 
state is shared across all of the tests, making data generation dependent on 
the order of test execution.

 

Instead we should properly abstract `HoodieTestDataGenerator` to hold all of 
the state w/in individual instances so that individual tests can
 # Create their own isolated instance (which won't be affected by other tests)
 # Accept "seed" value for its PRNG so that it always produces the same random 
sequence (for a given seed)
 # All of the operations w/in it only rely on such internal PRNG and don't rely 
on any external sources (such as `UUID.randomUUID()`, 
`System.currentTimeMillis()`, etc)


> Refactor HoodieTestDataGenerator to enable reproducible builds
> --------------------------------------------------------------
>
>                 Key: HUDI-3469
>                 URL: https://issues.apache.org/jira/browse/HUDI-3469
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Currently, `HoodieTestDataGenerator` relies on static state, which means its 
> state is shared across all of the tests, making data generation dependent on 
> the order of test execution.
>  
> Instead, we should properly abstract `HoodieTestDataGenerator` to hold all of 
> the state w/in individual instances so that individual tests can:
> 1. Create their own isolated instance (which won't be affected by other tests)
> 2. Pass "seed" value to DataGenerator to init its PRNG w/ it, so that it 
> always produces the same (pseudo-)random sequence (for a given seed)
> 3. Be certain that all of the data produced by DataGenerator will be 100% 
> reproducible w/ the same seed (meaning that all of the DataGenerator 
> operations w/in it only rely on such internal PRNG and don't rely on any 
> external sources, such as `UUID.randomUUID()`, `System.currentTimeMillis()`, 
> etc)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
