[ https://issues.apache.org/jira/browse/METRON-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356155#comment-16356155 ]
ASF GitHub Bot commented on METRON-1448:
----------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/929
You can set it to not commit manually (set `solr.commitPerBatch` to `false`
and no committing happens), but you risk losing data if a worker dies.
Honestly, you want a durable commit with an fsync before you ack the tuples in
a batch; otherwise you're courting data loss. This is the same strategy we use
for ES and HDFS (though there the commit is an fsync). That being said, I'm
sensitive to the performance concerns people may have around this, so I let
people turn it off, with a strong warning in the docs (not committing per
batch was also the legacy behavior of the SolrWriter).
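The ordering described above (batched add, durable commit, then ack) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SolrWriter's code: `IndexClient` and `Acker` are hypothetical stand-ins for the Solr client and Storm's tuple acking, and only the `commitPerBatch` parameter mirrors the real `solr.commitPerBatch` setting.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitBeforeAck {
    /** Hypothetical index client; a stand-in for a Solr client, not its real API. */
    interface IndexClient {
        void add(String collection, List<String> docs); // one call for the whole batch
        void commit(String collection);                 // assumed to be durable (fsync'd)
    }

    /** Hypothetical stand-in for acking the batch's tuples back to Storm. */
    interface Acker {
        void ack(List<String> tuples);
    }

    /** Writes one batch; commitPerBatch plays the role of solr.commitPerBatch. */
    static void writeBatch(IndexClient client, Acker acker, String collection,
                           List<String> batch, boolean commitPerBatch) {
        client.add(collection, batch);
        if (commitPerBatch) {
            // Durable commit first, so an ack never covers uncommitted data.
            client.commit(collection);
        }
        // With commitPerBatch == false the documents may still be uncommitted
        // when we ack, so a worker crash at this point could lose them.
        acker.ack(batch);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        IndexClient client = new IndexClient() {
            @Override public void add(String c, List<String> docs) { events.add("add:" + c + ":" + docs.size()); }
            @Override public void commit(String c) { events.add("commit:" + c); }
        };
        Acker acker = tuples -> events.add("ack:" + tuples.size());
        writeBatch(client, acker, "bro", List.of("m1", "m2"), true);
        System.out.println(events); // [add:bro:2, commit:bro, ack:2]
    }
}
```

The point of the ordering is that an ack is a promise that the data is safe; committing before acking keeps that promise, while turning the commit off trades it for throughput.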
> Update SolrWriter to conform to new collection strategy
> -------------------------------------------------------
>
> Key: METRON-1448
> URL: https://issues.apache.org/jira/browse/METRON-1448
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Priority: Major
>
> Currently the SolrWriter presumes a single collection to be written to. The
> new collection strategy for Solr implies a collection per sensor. Also,
> there are a few rough edges in the writer which could stand smoothing:
> * By default, we use Solr's implicit commit mechanism, rather than
> committing at the batch granularity. This may result in lost data on worker
> failure.
> * We do not use the batch add API, but rather message-by-message add
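The other two points in the description, a collection per sensor and batched adds, amount to splitting each mixed batch by sensor and issuing one add per collection. The sketch below only does the grouping and assumes, hypothetically, that the collection name equals the sensor type and that messages arrive as already-serialized documents. In SolrJ, each group would then be sent with something like `SolrClient.add(collection, docs)` on a `Collection<SolrInputDocument>`, rather than one `add` call per message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollectionPerSensor {
    /** Hypothetical message shape: a sensor type plus the serialized document. */
    static final class Message {
        final String sensorType;
        final String json;

        Message(String sensorType, String json) {
            this.sensorType = sensorType;
            this.json = json;
        }
    }

    /**
     * Splits a mixed batch into one sub-batch per collection. Assumes the
     * collection name is simply the sensor type.
     */
    static Map<String, List<String>> groupByCollection(List<Message> batch) {
        Map<String, List<String>> byCollection = new LinkedHashMap<>(); // keeps first-seen order
        for (Message m : batch) {
            byCollection.computeIfAbsent(m.sensorType, k -> new ArrayList<>()).add(m.json);
        }
        return byCollection;
    }

    public static void main(String[] args) {
        List<Message> batch = List.of(
                new Message("bro", "{\"id\":1}"),
                new Message("snort", "{\"id\":2}"),
                new Message("bro", "{\"id\":3}"));
        // Each entry would become one batched add (and optionally one commit)
        // against its collection, instead of one add per message.
        groupByCollection(batch).forEach((collection, docs) ->
                System.out.println(collection + " <- " + docs));
    }
}
```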
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)