[ 
https://issues.apache.org/jira/browse/BEAM-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399419#comment-16399419
 ] 

Tim Robertson commented on BEAM-3820:
-------------------------------------

That is a laudable goal [~jkff], but the reality is that Beam application 
developers will often be deploying on environments they don't fully control.  
Without running diagnostics I don't see how Beam could self-tune for every 
occasion.  

In my case I'm running batch (i.e. bounded source) ETLs using Beam on Spark, 
running in a YARN cluster on pre-production hardware (i.e. not top end by 
current standards, nor carefully tuned).  Neither SolrIO nor ElasticsearchIO 
worked out of the box; both produced a lot of task failures and significant 
retries before failing completely.  I was able to control some aspects of 
client throttling to get jobs to completion, but ended up tuning HTTP timeouts, 
client retry behaviour (to stop full partition retries) and batch sizes in a 
patched SolrIO.

Given that the IO connectors are all about making it easy to interface with 
external systems, I'd urge that sensible defaults be set on the IO projects 
while still letting people turn the knobs they need to for various environments.
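
To illustrate the kind of knob I mean, here is a minimal sketch of a writer 
that keeps 1000 as the default batch size but lets the caller override it.  The 
names (BatchingWriter, DEFAULT_MAX_BATCH_SIZE, the sink callback) are 
hypothetical and are not taken from the SolrIO code; the sink stands in for 
whatever call actually sends a batch to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch only; this is not the actual SolrIO implementation. */
final class BatchingWriter<T> {
    /** Assumed default, matching the value the issue says is hard coded. */
    static final int DEFAULT_MAX_BATCH_SIZE = 1000;

    private final Consumer<List<T>> sink;
    private final int maxBatchSize;
    private final List<T> buffer = new ArrayList<>();

    /** Uses the default batch size. */
    BatchingWriter(Consumer<List<T>> sink) {
        this(sink, DEFAULT_MAX_BATCH_SIZE);
    }

    /** Lets the caller choose a batch size suited to their environment. */
    BatchingWriter(Consumer<List<T>> sink, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.sink = sink;
        this.maxBatchSize = maxBatchSize;
    }

    /** Buffers one element and flushes once the buffer reaches maxBatchSize. */
    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Sends any buffered elements to the sink as one batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its batch
        buffer.clear();
    }
}
```

Existing users would see no change in behaviour with the one-argument 
constructor, while someone on a constrained cluster could pass a smaller value.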

If it is a project decision not to support this, closing as "won't fix" would 
be a good resolution so that folk know to just write project-specific adapters 
(BEAM-3849, BEAM-3848 and BEAM-3026 would also fall under this category). 

> SolrIO: Allow changing batchSize for writes
> -------------------------------------------
>
>                 Key: BEAM-3820
>                 URL: https://issues.apache.org/jira/browse/BEAM-3820
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-solr
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Tim Robertson
>            Assignee: Ismaël Mejía
>            Priority: Trivial
>
> SolrIO hard codes the batchSize for writes at 1000.  It would be a good 
> addition to allow the user to set the batchSize explicitly (similar to 
> ElasticsearchIO).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
