[
https://issues.apache.org/jira/browse/BEAM-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414574#comment-16414574
]
Ismaël Mejía commented on BEAM-3820:
------------------------------------
One thing to note here is that the runner processes the elements that are part
of a bundle. It can even arbitrarily (or via runner options) create bigger or
smaller bundles, or subdivide a bundle, which means the configured batchSize
won't necessarily be respected. If batchSize is bigger than the size of the
bundle, then the effective batch size will be the bundle size decided by the
runner. If it is smaller, the IO should guarantee that batches do not cross
bundle boundaries, so as not to violate fault-tolerance guarantees (which means
flushing any pending batch when the bundle finishes).
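The flush-on-bundle-boundary behavior described above can be sketched as a plain Java helper. This is a simplified illustration of the batching logic a Beam write DoFn would carry, not actual SolrIO code; the class and method names are hypothetical, with add() standing in for @ProcessElement and finishBundle() for @FinishBundle:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulates elements up to batchSize and flushes
// whatever remains when the bundle ends, so a batch never spans bundles.
public class BatchingBuffer<T> {
    private final int batchSize;
    private final List<T> pending = new ArrayList<>();
    private final List<List<T>> flushed = new ArrayList<>();

    public BatchingBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Called once per element (analogous to @ProcessElement).
    public void add(T element) {
        pending.add(element);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called when the runner finishes the bundle (analogous to @FinishBundle):
    // the remainder is flushed even if it is smaller than batchSize.
    public void finishBundle() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    // In a real IO, this would send the batch to Solr; here we just record it.
    private void flush() {
        flushed.add(new ArrayList<>(pending));
        pending.clear();
    }

    public List<List<T>> flushedBatches() {
        return flushed;
    }
}
```

With batchSize 3 and a 7-element bundle, this emits two full batches of 3 and a final partial batch of 1 at finishBundle(), matching the guarantee above.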
So far multiple IOs (e.g. Elasticsearch, Jdbc, Spanner) among others allow
tuning this via number of elements or size in bytes, so maybe it is not a bad
idea to implement it here too. [~jkff] I understand that you don't necessarily
like this, but users have requested it for other IOs too, so maybe it makes
sense to offer this option if the IO can guarantee that fault tolerance is not
broken.
I will free the ticket; you can take it if interested [~timrobertson100]
(modulo the comments of Eugene).
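The user-facing knob being discussed could take the shape of a fluent setter on the Write transform, mirroring the element-count option other IOs expose. This is a minimal, self-contained sketch; the method name and validation are assumptions, not the actual SolrIO API:

```java
// Hypothetical sketch of a configurable batch size on a Write transform,
// defaulting to the previously hard-coded value of 1000. Names are assumed.
public class Write {
    private final int batchSize;

    private Write(int batchSize) {
        this.batchSize = batchSize;
    }

    public static Write create() {
        return new Write(1000); // the current hard-coded default in SolrIO
    }

    // Returns a new immutable instance, following Beam's builder convention.
    public Write withBatchSize(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException(
                "batchSize must be positive, was " + batchSize);
        }
        return new Write(batchSize);
    }

    public int getBatchSize() {
        return batchSize;
    }
}
```

Usage would look like `Write.create().withBatchSize(200)`; leaving the setter out preserves today's behavior, so existing pipelines are unaffected.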
> SolrIO: Allow changing batchSize for writes
> -------------------------------------------
>
> Key: BEAM-3820
> URL: https://issues.apache.org/jira/browse/BEAM-3820
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Tim Robertson
> Assignee: Ismaël Mejía
> Priority: Trivial
>
> The SolrIO hard-codes the batchSize for writes at 1000. It would be a good
> addition to allow the user to set the batchSize explicitly (similar to
> ElasticsearchIO).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)