[ 
https://issues.apache.org/jira/browse/BEAM-13137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475469#comment-17475469
 ] 

Evan Galpin commented on BEAM-13137:
------------------------------------

[~echauchot] I'm wondering whether the flakiness comes from non-determinism in 
whether data has been flushed from the translog before the size estimate is 
made.  Forcing a flush is not something I'd want to introduce into the core 
functionality of the IO, because I wouldn't want to alter what's happening in a 
user's ES cluster, but if it can stabilize the tests I think it would be 
perfectly acceptable to add to the unit test util methods. I'll play with that 
a bit now.
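
For illustration only, here is a minimal sketch of what such a test-only flush 
could look like. It assumes a reachable test cluster and uses the JDK HTTP 
client against the Elasticsearch flush API (POST /<index>/_flush). The class 
and method names are hypothetical and are not part of ElasticsearchIO or its 
existing test utilities. Whether flushing actually removes the variance in the 
size estimate is the open question above, not something this sketch 
demonstrates.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical test helper; not part of ElasticsearchIO or its test utilities. */
public class TranslogFlushSketch {

  /** Builds the URI of the Elasticsearch flush API for a single index. */
  static URI flushUri(String baseUrl, String index) {
    // Drop a trailing slash so the path does not end up with "//".
    String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    // wait_if_ongoing=true blocks until a flush that is already running has finished.
    return URI.create(base + "/" + index + "/_flush?wait_if_ongoing=true");
  }

  /** Sends POST /<index>/_flush and fails if the cluster does not answer with HTTP 200. */
  static void flushIndex(HttpClient client, String baseUrl, String index)
      throws IOException, InterruptedException {
    HttpRequest request =
        HttpRequest.newBuilder(flushUri(baseUrl, index))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("Flush of index " + index + " failed: " + response.body());
    }
  }
}
```

A test would call flushIndex(HttpClient.newHttpClient(), "http://localhost:9200", 
indexName) after inserting its documents and before asking the source for its 
estimated size. The host, port, and index name here are placeholders.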

> make ElasticsearchIO$BoundedElasticsearchSource#getEstimatedSizeBytes 
> deterministic
> -----------------------------------------------------------------------------------
>
>                 Key: BEAM-13137
>                 URL: https://issues.apache.org/jira/browse/BEAM-13137
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-elasticsearch
>            Reporter: Etienne Chauchot
>            Assignee: Evan Galpin
>            Priority: P2
>
> Index size estimation is statistical in ES and varies. But it is the basis for 
> splitting, so it needs to be more deterministic: the variation causes flakiness 
> in the unit tests _testSplit_ and _testSizes_ and may lead to sub-optimal 
> splitting in production in some cases.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
