[ https://issues.apache.org/jira/browse/BEAM-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Robertson resolved BEAM-8306.
---------------------------------
Fix Version/s: 2.17.0
Resolution: Fixed
Thank you [~derek.he], and once again, apologies that your original PR waited
so long for review.
> improve estimation of data byte size reading from source in ElasticsearchIO
> ---------------------------------------------------------------------------
>
> Key: BEAM-8306
> URL: https://issues.apache.org/jira/browse/BEAM-8306
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Affects Versions: 2.14.0
> Reporter: Derek He
> Assignee: Derek He
> Priority: Major
> Fix For: 2.17.0
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> ElasticsearchIO splits the BoundedSource based on the size of the Elasticsearch
> index. We expect splitting based on the size of the query result to be more
> accurate. Currently, we have a big Elasticsearch index, but the query result
> contains only a few of its documents. ElasticsearchIO still splits the source
> into up to 1024 BoundedSources in Google Dataflow, so it takes a long time to
> finish processing this small number of Elasticsearch documents.
>
>
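The Java sketch below illustrates the difference between the two sizing strategies. It is not the code merged for BEAM-8306. It assumes three inputs are available: the index's total store size and document count (for example, from index stats) and the number of documents the query matches (for example, from the Elasticsearch `_count` API). It estimates the query's size by scaling the average document size by the match count. The class and method names (`QuerySizeEstimate`, `estimateQueryBytes`, `numSplits`), the 64 MiB bundle size, and the index figures are all hypothetical. The cap of 1024 splits mirrors the number in the report.

```java
/** Hypothetical sketch: index-size vs. query-size estimates and the split count each yields. */
public final class QuerySizeEstimate {
  // Assumed upper bound on splits, matching the 1024 mentioned in the report.
  static final int MAX_SPLITS = 1024;

  private QuerySizeEstimate() {}

  /** Scales the whole-index size by the number of documents the query matches. */
  static long estimateQueryBytes(long indexSizeBytes, long indexDocCount, long queryDocCount) {
    if (indexDocCount <= 0 || queryDocCount <= 0) {
      return 0L; // nothing to read, or no basis for an average document size
    }
    double avgDocBytes = (double) indexSizeBytes / indexDocCount;
    return (long) Math.ceil(avgDocBytes * queryDocCount);
  }

  /** Ceiling of estimatedBytes / desiredBundleSizeBytes, clamped to [1, MAX_SPLITS]. */
  static int numSplits(long estimatedBytes, long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
    }
    long n = (estimatedBytes + desiredBundleSizeBytes - 1) / desiredBundleSizeBytes;
    return (int) Math.max(1L, Math.min((long) MAX_SPLITS, n));
  }

  public static void main(String[] args) {
    long indexBytes = 500L * 1024 * 1024 * 1024; // made-up 500 GiB index
    long indexDocs = 1_000_000_000L;             // made-up document count
    long queryDocs = 2_000L;                     // made-up query match count
    long bundleBytes = 64L * 1024 * 1024;        // made-up 64 MiB desired bundle size

    // Sizing by the whole index hits the cap even though the query is tiny.
    System.out.println("splits by index size: " + numSplits(indexBytes, bundleBytes));

    // Sizing by the query result yields a single split.
    long queryBytes = estimateQueryBytes(indexBytes, indexDocs, queryDocs);
    System.out.println("splits by query size: " + numSplits(queryBytes, bundleBytes));
  }
}
```

With these made-up numbers, the index-size estimate produces 1024 splits and the query-size estimate produces 1. In Beam, such an estimate would typically be reported through `BoundedSource.getEstimatedSizeBytes`, and the runner passes a `desiredBundleSizeBytes` to `BoundedSource.split`. The sketch does not model how ElasticsearchIO turns that into Elasticsearch slices.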
--
This message was sent by Atlassian Jira
(v8.3.4#803005)