[ https://issues.apache.org/jira/browse/BEAM-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399419#comment-16399419 ]
Tim Robertson commented on BEAM-3820:
-------------------------------------
That is a laudable goal, [~jkff], but the reality is that Beam application
developers will often be deploying on environments they don't fully control.
Without running diagnostics I don't see how Beam could self-tune for every occasion.
In my case I'm running batch ETLs (i.e. bounded sources) using Beam on Spark
in a YARN cluster on pre-production hardware (i.e. not top end by current
standards, nor carefully tuned). Neither SolrIO nor ElasticsearchIO worked out
of the box: both produced many task failures and significant retries before
failing completely. I was able to control some aspects of client throttling to
get jobs to completion, but ended up tuning HTTP timeouts, client retry
behaviour (to stop full partition retries) and batch sizes in a patched SolrIO.
Given that the IO connectors are all about making it easy to interface with
external systems, I'd urge that sensible defaults be set in the IO projects
while letting people turn the knobs they need for various environments.
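To make the "sensible defaults plus knobs" suggestion concrete, here is a minimal Java sketch of a write configuration in which every setting has a default and each one can be overridden. WriteConfig and its methods are hypothetical, not SolrIO's API; only the 1000 batch size comes from the issue text, while the 30 s HTTP timeout and 3 retries are assumed placeholder defaults.

```java
import java.time.Duration;
import java.util.Objects;

/**
 * Hypothetical write configuration (not SolrIO's actual API).
 * Every knob has a default; each withX method returns a modified copy,
 * so overriding one setting never mutates a shared default instance.
 */
final class WriteConfig {
  private final int batchSize;
  private final Duration httpTimeout;
  private final int maxRetries;

  private WriteConfig(int batchSize, Duration httpTimeout, int maxRetries) {
    if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
    if (maxRetries < 0) throw new IllegalArgumentException("maxRetries must be >= 0");
    this.batchSize = batchSize;
    this.httpTimeout = Objects.requireNonNull(httpTimeout, "httpTimeout");
    this.maxRetries = maxRetries;
  }

  /** 1000 mirrors the hard-coded SolrIO batch size; the timeout and retry defaults are assumptions. */
  static WriteConfig defaults() {
    return new WriteConfig(1000, Duration.ofSeconds(30), 3);
  }

  WriteConfig withBatchSize(int newBatchSize) {
    return new WriteConfig(newBatchSize, httpTimeout, maxRetries);
  }

  WriteConfig withHttpTimeout(Duration newTimeout) {
    return new WriteConfig(batchSize, newTimeout, maxRetries);
  }

  WriteConfig withMaxRetries(int newMaxRetries) {
    return new WriteConfig(batchSize, httpTimeout, newMaxRetries);
  }

  int batchSize() { return batchSize; }
  Duration httpTimeout() { return httpTimeout; }
  int maxRetries() { return maxRetries; }
}
```

On modest hardware a user could then write something like WriteConfig.defaults().withBatchSize(200).withHttpTimeout(Duration.ofMinutes(2)) and leave everything else at its default. The copy-on-write style is the same immutable, fluent configuration pattern Beam transforms generally follow.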
If it is a project decision not to support this, closing it as "Won't Fix"
would be a good resolution, so folks know to just write project-specific
adapters (BEAM-3849, BEAM-3848 and BEAM-3026 would also fall under this category).
> SolrIO: Allow changing batchSize for writes
> -------------------------------------------
>
> Key: BEAM-3820
> URL: https://issues.apache.org/jira/browse/BEAM-3820
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Tim Robertson
> Assignee: Ismaël Mejía
> Priority: Trivial
>
> SolrIO hard-codes the batchSize for writes at 1000. It would be a good
> addition to allow the user to set the batchSize explicitly (similar to
> ElasticsearchIO).
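For illustration, here is a minimal Java sketch of what a configurable write batch size controls. Elements are buffered and handed off in groups of at most maxBatchSize, with an explicit flush for the remainder. BatchingWriter and its flush callback are hypothetical names, not SolrIO internals; in a real connector the callback would send the batch through the Solr client, and flush() would run at the end of each bundle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical batching buffer (not SolrIO's implementation).
 * Collects elements and passes them to flushFn in groups of at most maxBatchSize.
 */
final class BatchingWriter<T> {
  private final int maxBatchSize;
  private final Consumer<List<T>> flushFn;
  private final List<T> buffer = new ArrayList<>();

  BatchingWriter(int maxBatchSize, Consumer<List<T>> flushFn) {
    if (maxBatchSize <= 0) throw new IllegalArgumentException("maxBatchSize must be > 0");
    this.maxBatchSize = maxBatchSize;
    this.flushFn = flushFn;
  }

  /** Buffers one element and flushes as soon as the batch is full. */
  void add(T element) {
    buffer.add(element);
    if (buffer.size() >= maxBatchSize) {
      flush();
    }
  }

  /** Sends any buffered elements; call at the end of a bundle so the tail is not lost. */
  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    // Pass a copy so clearing the buffer cannot affect what the callback received.
    flushFn.accept(new ArrayList<>(buffer));
    buffer.clear();
  }
}
```

With maxBatchSize = 3, writing 7 documents and then calling flush() yields batches of 3, 3 and 1. A smaller batch size trades more round trips for smaller, less timeout-prone requests, which is the knob the issue asks to expose.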