[ https://issues.apache.org/jira/browse/BEAM-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Robertson updated BEAM-3862:
--------------------------------
    Description: 
A good improvement to SolrIO would be to allow the caller to provide a 
{{commitWithin}} parameter.  Currently the batch is passed to the underlying 
{{solrClient}}, which results in defaulting to the configured server behavior.

The justification for exposing this is that the collection in the target Solr 
server might be configured in a way that is not suitable for this Beam job.  
E.g. a server may be tuned to accept real-time updates with fast flush times 
from streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.

This is related to BEAM-3849, BEAM-3848 and BEAM-3820, and they should be 
considered together. I understand that the policy of Beam is not to expose 
parameters for tuning.  For the IOs, which interface with external systems, I 
recommend this policy be reconsidered.  The IO modules typically wrap clients 
to target systems ({{CloudSolrClient}} in this case), which all have tunable 
parameters for good reason. My recommendation would be to keep 
{{SolrIO.write()}} providing sensible defaults but expose an additional 
builder, e.g. 
{{SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()}}.
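
To make the proposed shape concrete, here is a minimal, self-contained sketch 
of such a builder. It is only an illustration: {{SolrWriteConfig}}, its 
accessors and the default batch size of 1000 are hypothetical names and values 
chosen for this example, not the existing SolrIO API. An unset 
{{commitWithin}} is modelled as an empty {{OptionalInt}}, meaning "defer to the 
server configuration", which is the current behavior.

```java
import java.util.OptionalInt;

/** Hypothetical write settings; illustrative only, not part of SolrIO. */
final class SolrWriteConfig {
    /** Assumed default for this sketch. */
    static final int DEFAULT_BATCH_SIZE = 1000;

    private final int batchSize;
    private final OptionalInt commitWithinMs;

    private SolrWriteConfig(int batchSize, OptionalInt commitWithinMs) {
        this.batchSize = batchSize;
        this.commitWithinMs = commitWithinMs;
    }

    int batchSize() {
        return batchSize;
    }

    /** Empty means "use whatever the Solr server is configured to do". */
    OptionalInt commitWithinMs() {
        return commitWithinMs;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private int batchSize = DEFAULT_BATCH_SIZE;
        private OptionalInt commitWithinMs = OptionalInt.empty();

        Builder withBatchSize(int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
            return this;
        }

        Builder withCommitWithinMs(int commitWithinMs) {
            if (commitWithinMs < 0) {
                throw new IllegalArgumentException("commitWithinMs must not be negative");
            }
            this.commitWithinMs = OptionalInt.of(commitWithinMs);
            return this;
        }

        SolrWriteConfig build() {
            return new SolrWriteConfig(batchSize, commitWithinMs);
        }
    }
}
```

With this shape, {{SolrWriteConfig.builder().build()}} keeps today's behavior, 
while the nightly bulk load would call 
{{withCommitWithinMs(300000).withBatchSize(9000)}} before {{build()}}.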

Please feel free to assign to me if of interest and I'll provide a PR.

  was:
A good improvement for to SolrIO would be to allow the caller to provide a 
`commitWithin` parameter.  Currently the batch is passed to the underlying 
`solrClient` which results in defaulting to the configured server behavior.

The justification for exposing this is that the collection in the target SOLR 
server might be configured in a way that is not suitable for this beam job.  
E.g. a server tuned to accept real time updates with fast flush times from 
streaming Beam job 1, while Beam job 2 is doing an nightly bulk load.

This is related to (BEAM-3849, BEAM-3848, BEAM-3820) and should be considered 
together. I understand that the policy of Beam is not to expose parameters for 
tuning.  When it comes to the IOs which are for interfacing with external 
systems I recommend this policy be reconsidered.  The IO modules typically wrap 
clients to target systems (`CloudSolrClient` in this case) which all have 
tunable parameters for good reason. My recommendation would be to keep 
`SolrIO.write()` providing sensible defaults but expose an additional builder 
e.g.`SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()`.

Please feel free to assign to me if of interest and I'll provide a PR.


> SolrIO: Expose commitWithin to the Solr write
> ---------------------------------------------
>
>                 Key: BEAM-3862
>                 URL: https://issues.apache.org/jira/browse/BEAM-3862
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-solr
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Tim Robertson
>            Assignee: Ismaël Mejía
>            Priority: Trivial
>
> A good improvement to SolrIO would be to allow the caller to provide a 
> {{commitWithin}} parameter.  Currently the batch is passed to the underlying 
> {{solrClient}}, which results in defaulting to the configured server behavior.
> The justification for exposing this is that the collection in the target Solr 
> server might be configured in a way that is not suitable for this Beam job.  
> E.g. a server may be tuned to accept real-time updates with fast flush times 
> from streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.
> This is related to BEAM-3849, BEAM-3848 and BEAM-3820, and they should be 
> considered together. I understand that the policy of Beam is not to expose 
> parameters for tuning.  For the IOs, which interface with external systems, I 
> recommend this policy be reconsidered.  The IO modules typically wrap clients 
> to target systems ({{CloudSolrClient}} in this case), which all have tunable 
> parameters for good reason. My recommendation would be to keep 
> {{SolrIO.write()}} providing sensible defaults but expose an additional 
> builder, e.g. 
> {{SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()}}.
> Please feel free to assign to me if of interest and I'll provide a PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
