Tim Robertson created BEAM-3862:
-----------------------------------

             Summary: SolrIO: Expose commitWithin to the Solr write
                 Key: BEAM-3862
                 URL: https://issues.apache.org/jira/browse/BEAM-3862
             Project: Beam
          Issue Type: Improvement
          Components: io-java-solr
    Affects Versions: 2.3.0, 2.2.0
            Reporter: Tim Robertson
            Assignee: Ismaël Mejía


A good improvement to SolrIO would be to allow the caller to provide a 
`commitWithin` parameter.  Currently the batch is passed to the underlying 
`solrClient` as-is, so commits default to the configured server behavior.

The justification for exposing this is that the collection in the target Solr 
server might be configured in a way that is not suitable for a given Beam job.  
E.g. a server tuned to accept real-time updates with fast flush times from 
streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.

This is related to BEAM-3849, BEAM-3848 and BEAM-3820, and they should be 
considered together. I understand that the policy of Beam is not to expose 
parameters for tuning.  When it comes to the IOs, which interface with external 
systems, I recommend this policy be reconsidered.  The IO modules typically wrap 
clients to target systems (`CloudSolrClient` in this case), which all have 
tunable parameters for good reason. My recommendation would be to keep 
`SolrIO.write()` providing sensible defaults but expose an additional builder, 
e.g. `SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()`.
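
To make the proposal concrete, here is a rough, dependency-free sketch of what 
such a builder could carry. Everything in it is hypothetical: 
`SolrWriteConfig`, `SERVER_DEFAULT`, and the placeholder default batch size are 
names and values I made up for illustration and are not part of the current 
SolrIO API. It only shows the intended shape, i.e. defaults that defer to the 
server unless the caller opts in to an override.

```java
// Hypothetical sketch only: none of these types exist in Beam's SolrIO.
final class SolrWriteConfig {
    /** Sentinel: send no commitWithin, so the server's configured behavior applies. */
    static final int SERVER_DEFAULT = -1;

    private final int commitWithinMs;
    private final int batchSize;

    private SolrWriteConfig(int commitWithinMs, int batchSize) {
        this.commitWithinMs = commitWithinMs;
        this.batchSize = batchSize;
    }

    int commitWithinMs() { return commitWithinMs; }

    int batchSize() { return batchSize; }

    /** True only when the caller explicitly asked for a commitWithin value. */
    boolean overridesServerCommit() { return commitWithinMs != SERVER_DEFAULT; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private int commitWithinMs = SERVER_DEFAULT;
        private int batchSize = 1000; // placeholder default, for illustration only

        Builder withCommitWithinMs(int ms) {
            if (ms <= 0) {
                throw new IllegalArgumentException("commitWithinMs must be > 0, got " + ms);
            }
            this.commitWithinMs = ms;
            return this;
        }

        Builder withBatchSize(int size) {
            if (size <= 0) {
                throw new IllegalArgumentException("batchSize must be > 0, got " + size);
            }
            this.batchSize = size;
            return this;
        }

        SolrWriteConfig build() {
            return new SolrWriteConfig(commitWithinMs, batchSize);
        }
    }
}

final class SolrWriteConfigDemo {
    public static void main(String[] args) {
        // Job 1 (streaming): keep the defaults and let the server decide when to commit.
        SolrWriteConfig streaming = SolrWriteConfig.builder().build();

        // Job 2 (nightly bulk load): larger batches, commit within five minutes.
        SolrWriteConfig bulk = SolrWriteConfig.builder()
            .withCommitWithinMs(300000)
            .withBatchSize(9000)
            .build();

        System.out.println("streaming overrides commit: " + streaming.overridesServerCommit());
        System.out.println("bulk commitWithinMs: " + bulk.commitWithinMs());
    }
}
```

With something like this, the write step would only pass a `commitWithin` value 
to the client when `overridesServerCommit()` is true, so existing pipelines 
would keep today's behavior.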

Please feel free to assign this to me if it is of interest and I'll provide a PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
