[ 
https://issues.apache.org/jira/browse/METRON-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892639#comment-15892639
 ] 

ASF GitHub Bot commented on METRON-694:
---------------------------------------

Github user james-sirota commented on the issue:

    https://github.com/apache/incubator-metron/pull/453
  
    Hi guys, this PR is built on one fundamental assumption: Kafka is always 
available.  The source of truth for errors, therefore, is a Kafka topic.  In a 
production setting errors should go into their own topic and the retention 
period (size) of that topic queue should be set very high so that you can 
retain as many errors as you can.  The reason we are making this 
configurable is so that we can more easily test this in Ansible by throwing both 
errors and valid telemetry into the same topic.  In production we would not do 
this and would have a dedicated topic and a dedicated topology for error writing, 
with parallelism tuned way down to prioritize ingest of actual valid telemetry 
over errors.  The writing topology should attempt to write errors from the 
queue to either ES, HDFS, or both exactly once.  If it cannot do that, 
then it should alert whatever infrastructure monitoring component you are 
using that your ES or HDFS is down.  That, however, is a different PR and is 
out of scope here.  I will need to file that as follow-on work.  
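
    As a rough illustration of the "exactly once" goal for the writing 
topology, here is a minimal sketch of one common way to approximate it: make 
the write idempotent by deriving the document ID from the Kafka coordinates of 
the error message, so a redelivered message overwrites itself instead of 
creating a duplicate.  This is not what the PR implements; the topic name 
"errors", the InMemoryIndex class, and the function names are all hypothetical 
stand-ins, and a real writer would target ES or HDFS.

```python
import hashlib


def error_doc_id(topic: str, partition: int, offset: int) -> str:
    """Derive a stable document ID from a message's Kafka coordinates."""
    key = f"{topic}:{partition}:{offset}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


class InMemoryIndex:
    """Hypothetical stand-in for an index that supports upsert by ID."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id: str, doc: dict) -> None:
        # Writing the same ID twice replaces the document rather than adding one.
        self.docs[doc_id] = doc


def write_error(index: InMemoryIndex, topic: str, partition: int,
                offset: int, error: dict) -> None:
    """Write one error keyed by its Kafka coordinates (idempotent)."""
    index.upsert(error_doc_id(topic, partition, offset), error)


index = InMemoryIndex()
write_error(index, "errors", 0, 42, {"message": "parse failure"})
write_error(index, "errors", 0, 42, {"message": "parse failure"})  # redelivery
write_error(index, "errors", 0, 43, {"message": "enrichment failure"})
print(len(index.docs))  # 2
```

    The design choice here is that at-least-once delivery plus an idempotent 
write behaves like exactly-once from the index's point of view, under the 
assumption that the sink supports writes keyed by a caller-supplied ID.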
    
    With that said, I personally see no problem with the way this PR is 
implemented.  It allows for a dedicated topic and writing of errors into ES or 
HDFS exactly once if running in a production setting.  There is an option to 
configure the topic so you can have telemetry and errors in the same topic for 
testing on Ansible.  So +1 from me.



> Index Errors from Topologies
> ----------------------------
>
>                 Key: METRON-694
>                 URL: https://issues.apache.org/jira/browse/METRON-694
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Ryan Merriman
>
> Need to make sure (and review) that all the bolts write into the error queue. 
> Errors should then be consumed from the error queue and indexed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
