[
https://issues.apache.org/jira/browse/MINIFI-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108832#comment-16108832
]
Joseph Niemiec commented on MINIFI-356:
---------------------------------------
1) While I have been using it, I do think we are so early on that my
experiences are not going to be representative. I used a lot of shell scripts;
in time one would hope we use more native processors. I agree we need to remain
skeptical about how it will be used until we have more input from many users.
2) That makes sense. I guess my time in Hadoop leads my IO wait question to be
more about a 'partial failure' that leads to a worse state than a total
failure. But for now it makes sense to limit the scope to a simple, containable
idea, so I like the FS-API coupling to trigger volatile storage.
3) I see you opened MINIFI-360, so we can talk about the OOM behavior there. I
guess the question now is whether each connection gets a percentage of
available memory in this volatile mode. Or, more to your point, do we need to
decide now, or can we wait for more C2 metrics on memory use to roll in?
My concern with watching MiNiFi devices is that we are no doubt going to
venture into many devices with many differing use cases, and will what we
collect be of any use? The rabbit hole starts to run deep when we consider
that one sensor on a device may have more significance than another; in my
mind this leads to having to assign priorities, or more explicit per-connection
percentages of the volatile storage, to ensure that the high-value sensor is
getting saved while we ignore the other sensors. Now, if we wanted to just
call it a limited degraded mode of operation, then perhaps we don't need to
run down this hole.
It's probably time to take this one to an email thread :D Do you want to start
it, since you opened the Jira?
###
Not to change direction, but perhaps we should classify the failure modes, as
I could see wanting to do this for more than just failed storage. Just a quick
brainstorm of possible ones -
* Complete Failure - No Notice
* Partial Unknown Failure - Notice but Stop Processing
* Partial Disk Failure - Notice and Continue with Limited Volatile Storage
* Partial Network Failure - What is this? Failure to send via the API some
number of times. We could have a failure policy that batches more per send,
compresses more aggressively before sending, or sends using Reed-Solomon
encoding...
* Partial CPU Failure - Could reduce the concurrency of processors configured
above 1?
* Partial RAM Failure - Could resize connections, or use swap?
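The brainstormed modes above could be captured as a small policy table. A
minimal sketch (the enum values mirror the list, and the response strings are
placeholders, not a committed design):

```cpp
#include <cassert>
#include <string>

// Placeholder classification of degraded-operation modes, one per
// brainstormed bullet above.
enum class FailureMode {
  CompleteFailure,        // no notice possible
  PartialUnknownFailure,  // notice, but stop processing
  PartialDiskFailure,     // notice, continue on limited volatile storage
  PartialNetworkFailure,  // batch / compress / encode more aggressively
  PartialCpuFailure,      // reduce processor concurrency
  PartialRamFailure       // resize connections, spill to swap
};

// Map each mode to a (hypothetical) named response policy.
std::string responseFor(FailureMode mode) {
  switch (mode) {
    case FailureMode::CompleteFailure:       return "none";
    case FailureMode::PartialUnknownFailure: return "notify-and-stop";
    case FailureMode::PartialDiskFailure:    return "notify-and-volatile";
    case FailureMode::PartialNetworkFailure: return "batch-and-compress";
    case FailureMode::PartialCpuFailure:     return "reduce-concurrency";
    case FailureMode::PartialRamFailure:     return "resize-connections";
  }
  return "unknown";
}
```

Treating the policy as data like this would let each mode's response be
configured independently, rather than hard-coding only the disk-failure case.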
> Create repository failure policy
> --------------------------------
>
> Key: MINIFI-356
> URL: https://issues.apache.org/jira/browse/MINIFI-356
> Project: Apache NiFi MiNiFi
> Issue Type: Improvement
> Components: C++
> Reporter: marco polo
> Assignee: marco polo
> Labels: Durability
> Fix For: cpp-0.3.0
>
>
> Create a failure policy for continuing operations if a repo failure occurs.
> I.e., if writing to disk fails above a threshold (100%, for example), we can
> move to a volatile repo where we can continue operations and report that we
> have a failure.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)