[ https://issues.apache.org/jira/browse/BEAM-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414574#comment-16414574 ]

Ismaël Mejía commented on BEAM-3820:
------------------------------------

One thing to keep in mind here is that the runner processes elements as part 
of a bundle. It can even arbitrarily (or via runner options) create bigger or 
smaller bundles, or subdivide a bundle, which means the configured batchSize 
won't necessarily be respected. If batchSize is bigger than the bundle, the 
effective batch size will be the bundle size decided by the runner; if it is 
smaller, the IO should guarantee that batches do not cross bundle boundaries, 
so as not to violate fault-tolerance guarantees (which means flushing when 
the bundle is finished).
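The flushing behavior described above can be sketched in plain Java (this is an illustrative model of the logic, not actual Beam SDK code; the class and method names are hypothetical, though `processElement`/`finishBundle` mirror a Beam `DoFn`'s `@ProcessElement` and `@FinishBundle` hooks):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: elements are buffered per bundle, flushed when the
// user-defined batchSize is reached, and flushed once more when the bundle
// finishes, so a batch never crosses a bundle boundary.
public class BundleBatcher {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    public BundleBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Analogous to @ProcessElement in a Beam DoFn.
    public void processElement(String element) {
        buffer.add(element);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Analogous to @FinishBundle: whatever remains is written out, which is
    // why the effective batch size is capped by the bundle size.
    public void finishBundle() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        flushed.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    public List<List<String>> getFlushed() {
        return flushed;
    }

    public static void main(String[] args) {
        BundleBatcher batcher = new BundleBatcher(3);
        // Simulate a bundle of 7 elements with batchSize 3:
        for (int i = 0; i < 7; i++) {
            batcher.processElement("doc" + i);
        }
        batcher.finishBundle();
        // Two full batches of 3, plus a final partial batch of 1 at bundle end.
        System.out.println(batcher.getFlushed().size()); // prints 3
    }
}
```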

So far multiple IOs (e.g. Elasticsearch, Jdbc, Spanner) among others allow 
tuning this via number of elements or size in bytes, so maybe it is not a bad 
idea to implement it here too. [~jkff] I understand that you don't necessarily 
like this, but users have requested it for other IOs too, so it probably makes 
sense to offer the option as long as the IO can guarantee that fault tolerance 
is not broken.
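A user-settable batch size could follow the immutable with-style setters used by other Beam IO transforms. Here is a minimal, self-contained sketch (the class name `SolrWriteSketch` and its methods are illustrative assumptions, not the actual SolrIO API):

```java
// Hypothetical, simplified sketch of a write transform builder that exposes
// batchSize instead of hard-coding it. Names are illustrative only.
public class SolrWriteSketch {
    // The value SolrIO currently hard-codes for writes.
    private static final int DEFAULT_BATCH_SIZE = 1000;

    private final int batchSize;

    private SolrWriteSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    public static SolrWriteSketch create() {
        return new SolrWriteSketch(DEFAULT_BATCH_SIZE);
    }

    // Returns a new instance, mirroring the immutable with-style setters
    // found on Beam IO transforms such as ElasticsearchIO.
    public SolrWriteSketch withBatchSize(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new SolrWriteSketch(batchSize);
    }

    public int getBatchSize() {
        return batchSize;
    }

    public static void main(String[] args) {
        SolrWriteSketch write = SolrWriteSketch.create().withBatchSize(500);
        System.out.println(write.getBatchSize()); // prints 500
    }
}
```

Note that, per the bundle discussion above, such a setting would only be an upper bound: the runner-decided bundle size still caps the effective batch.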

I will free the ticket; you can take it if interested [~timrobertson100] 
(modulo Eugene's comments).

 

> SolrIO: Allow changing batchSize for writes
> -------------------------------------------
>
>                 Key: BEAM-3820
>                 URL: https://issues.apache.org/jira/browse/BEAM-3820
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-solr
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Tim Robertson
>            Assignee: Ismaël Mejía
>            Priority: Trivial
>
> The SolrIO hard-codes the batchSize for writes at 1000. It would be a good 
> addition to allow the user to set the batchSize explicitly (similar to the 
> ElasticsearchIO).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
