[ https://issues.apache.org/jira/browse/BEAM-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Robertson updated BEAM-3862:
--------------------------------
Description:
A good improvement to SolrIO would be to allow the caller to provide a
{{commitWithin}} parameter. Currently the batch is passed to the underlying
{{solrClient}}, which results in defaulting to the configured server behavior.
The justification for exposing this is that the collection in the target Solr
server might be configured in a way that is not suitable for this Beam job.
E.g. a server tuned to accept real-time updates with fast flush times from
streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.
This is related to (BEAM-3849, BEAM-3848, BEAM-3820) and should be considered
together. I understand that the policy of Beam is not to expose parameters for
tuning. When it comes to the IOs which are for interfacing with external
systems I recommend this policy be reconsidered. The IO modules typically wrap
clients to target systems ({{CloudSolrClient}} in this case) which all have
tunable parameters for good reason. My recommendation would be to keep
{{SolrIO.write()}} providing sensible defaults but expose an additional builder
e.g.
{{SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()}}.
Please feel free to assign to me if of interest and I'll provide a PR.
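To make the proposal concrete, here is a minimal sketch of how such an options builder might behave. It is only an illustration, not the SolrIO API: {{SolrWriteOptions}}, its accessors and the default batch size of 1000 are assumptions made up for this example; only the method names {{withCommitWithinMs}} and {{withBatchSize}} come from the suggestion above. An unset {{commitWithin}} is modelled as empty, meaning "leave it to the server configuration".

```java
import java.util.OptionalInt;

// Hypothetical options holder for illustration only; not part of Beam's SolrIO.
final class SolrWriteOptions {
    // Assumed default for this sketch.
    static final int DEFAULT_BATCH_SIZE = 1000;

    private final int batchSize;
    // Empty means: do not send commitWithin, defer to the server's configuration.
    private final OptionalInt commitWithinMs;

    private SolrWriteOptions(int batchSize, OptionalInt commitWithinMs) {
        this.batchSize = batchSize;
        this.commitWithinMs = commitWithinMs;
    }

    static Builder builder() {
        return new Builder();
    }

    int batchSize() {
        return batchSize;
    }

    OptionalInt commitWithinMs() {
        return commitWithinMs;
    }

    static final class Builder {
        private int batchSize = DEFAULT_BATCH_SIZE;
        private OptionalInt commitWithinMs = OptionalInt.empty();

        Builder withCommitWithinMs(int millis) {
            if (millis <= 0) {
                throw new IllegalArgumentException("commitWithinMs must be positive, got " + millis);
            }
            this.commitWithinMs = OptionalInt.of(millis);
            return this;
        }

        Builder withBatchSize(int size) {
            if (size <= 0) {
                throw new IllegalArgumentException("batchSize must be positive, got " + size);
            }
            this.batchSize = size;
            return this;
        }

        SolrWriteOptions build() {
            return new SolrWriteOptions(batchSize, commitWithinMs);
        }
    }

    public static void main(String[] args) {
        // Nightly bulk load: large batches, commit within five minutes.
        SolrWriteOptions bulk =
            SolrWriteOptions.builder().withCommitWithinMs(300000).withBatchSize(9000).build();
        // No tuning requested: sensible defaults, server decides when to commit.
        SolrWriteOptions defaults = SolrWriteOptions.builder().build();

        System.out.println("bulk batchSize=" + bulk.batchSize()
            + " commitWithinMs=" + bulk.commitWithinMs());
        System.out.println("defaults batchSize=" + defaults.batchSize()
            + " commitWithinSet=" + defaults.commitWithinMs().isPresent());
    }
}
```

A writer built on top of this could forward the value to the Solr client only when it is present (for instance through SolrJ's {{UpdateRequest.setCommitWithin}}), so that existing pipelines keep today's behavior.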
was:
A good improvement to SolrIO would be to allow the caller to provide a
`commitWithin` parameter. Currently the batch is passed to the underlying
`solrClient`, which results in defaulting to the configured server behavior.
The justification for exposing this is that the collection in the target Solr
server might be configured in a way that is not suitable for this Beam job.
E.g. a server tuned to accept real-time updates with fast flush times from
streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.
This is related to (BEAM-3849, BEAM-3848, BEAM-3820) and should be considered
together. I understand that the policy of Beam is not to expose parameters for
tuning. When it comes to the IOs which are for interfacing with external
systems I recommend this policy be reconsidered. The IO modules typically wrap
clients to target systems (`CloudSolrClient` in this case) which all have
tunable parameters for good reason. My recommendation would be to keep
`SolrIO.write()` providing sensible defaults but expose an additional builder
e.g. `SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()`.
Please feel free to assign to me if of interest and I'll provide a PR.
> SolrIO: Expose commitWithin to the Solr write
> ---------------------------------------------
>
> Key: BEAM-3862
> URL: https://issues.apache.org/jira/browse/BEAM-3862
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Tim Robertson
> Assignee: Ismaël Mejía
> Priority: Trivial