Re: How do you actually run things in production?

Piotr Wargulak via dev Mon, 08 Jun 2026 05:44:48 -0700

Thanks, two follow-ups if you have a second:

   - What draws you to that page in the morning - a habit, an alert,
   something else? And is this for one system or several?
   - And once you see something firing, what's the next step: jump straight
   to the system, check the underlying metric, or ping someone?



PiotrW

On Mon, Jun 8, 2026 at 11:09 AM Edmore Tshuma <[email protected]> wrote:

> I check the Grafana Alert Rules page first. It shows the state of all
> configured alerts - which ones are firing/not firing.
>
> NB: It's not a dashboard, just a menu item in Grafana.
>
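
For anyone who would rather script that morning check than open the page, here is a minimal sketch. It assumes Grafana-managed alert rules are exposed in Prometheus rules format at the path in `RULES_PATH`, and that a service-account token with read access is available; both are assumptions to verify against your own Grafana version. The function names are made up for illustration.

```python
import json
import urllib.request
from collections import Counter

# Assumption: Prometheus-compatible rules endpoint for Grafana-managed rules.
# Check this path against the Grafana version you run.
RULES_PATH = "/api/prometheus/grafana/api/v1/rules"


def summarize_rules(payload):
    """Return (count per state, names of firing rules) from a
    Prometheus-style rules payload: {"data": {"groups": [{"rules": [...]}]}}."""
    counts = Counter()
    firing = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            state = rule.get("state", "unknown")
            counts[state] += 1
            if state == "firing":
                firing.append(rule.get("name", "<unnamed>"))
    return dict(counts), firing


def fetch_rules(base_url, token):
    """Fetch the rules payload from a Grafana instance (hypothetical helper)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + RULES_PATH,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `summarize_rules(fetch_rules(base_url, token))` would give roughly the same firing/not-firing overview as the Alert Rules page, in a form that can be posted to a chat channel or checked from cron.
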
> On Mon 8. 6. 2026 at 10:05, Piotr Wargulak <[email protected]>
> wrote:
>
>> Hello all,
>>
>> @Edmore Tshuma: Thanks for the detailed answer last week, that gave me a
>> good list of what a mature observability setup should cover.
>>
>> I'm still hoping to hear from people who actually run Fineract (or
>> similar OSS) in production. Not how it should be done in theory, but how
>> you actually do it day to day.
>>
>> If a full conversation feels like too much, here's a smaller question:
>> what's the first thing you check in the morning to see if the system is OK?
>> A dashboard, a log, a Slack channel, or do you "wait for someone to
>> complain"?
>>
>> I'm especially interested in hearing from operators who lack a budget for
>> a dedicated SRE team.
>>
>> Best regards,
>> PiotrW
>> SolDevelo
>>
>> On Mon, Jun 1, 2026 at 1:08 PM Edmore Tshuma <[email protected]> wrote:
>>
>>> Observability/SRE expert here :)
>>>
>>> Well, it's quite a lot.
>>>
>>> At a minimum you need:
>>>
>>> 1. A minimal but complete observability stack (can be pure OSS tools)
>>> which covers:
>>>
>>> - which telemetry you want to collect, and from which workloads
>>> - who decides what is worth collecting
>>> - who decides what to drop, and when
>>> - standards for alert rules and runbooks
>>> - standards for dashboards
>>> - SLIs and SLOs
>>> - which alerts to route to which teams
>>> - which alerts should apply to which environment (prod only, staging,
>>> dev)
>>> - data residency and compliance (where your telemetry may be stored,
>>> compliance-wise)
>>> - securing your telemetry
>>> - which alerts need to be suppressed
>>> - cost management for the stack (THIS IS IMPORTANT!)
>>>
>>> 2. An incident management program
>>> - on-call schedules, post-mortems
>>> - notification channels (Slack/Teams/PagerDuty/phone/email?)
>>> - incident escalation policies (Manager on Duty? Engineer on Duty?)
>>> - which incidents to ignore on weekends
>>> - customer relations: who contacts the client during an incident, and
>>> at what point in the incident timeline
>>>
>>> 3. Meta-monitoring
>>> - monitoring your Monitoring/Observability stack
>>> - implementing DR and resiliency
>>>
>>> PS: I've only scratched the surface here, but this should get you
>>> about 80% of the way.
>>>
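
Several items in the list above (routing by team, environment scoping, suppression, weekend handling) amount to a small decision table. A minimal sketch of what writing those decisions down could look like follows; every team name, channel, and rule in it is hypothetical and not taken from any real setup. In practice this logic usually lives in the alerting tool's own routing configuration rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Alert:
    name: str
    team: str
    environment: str  # e.g. "prod", "staging", "dev"
    severity: str     # e.g. "critical", "warning"


# Hypothetical routing decisions, written down explicitly.
TEAM_CHANNELS = {"payments": "#payments-oncall", "platform": "#platform-oncall"}
DEFAULT_CHANNEL = "#ops"
NOTIFY_ENVIRONMENTS = {"prod"}  # staging/dev alerts do not notify anyone


def route(alert, now):
    """Return the channel to notify, or None if the alert is suppressed."""
    if alert.environment not in NOTIFY_ENVIRONMENTS:
        return None
    # Saturday is weekday() == 5, Sunday is 6.
    is_weekend = now.weekday() >= 5
    if is_weekend and alert.severity != "critical":
        return None
    return TEAM_CHANNELS.get(alert.team, DEFAULT_CHANNEL)
```

The point of the sketch is only that each bullet becomes one explicit, reviewable rule; who owns each rule is still the organizational question raised in the list.
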
>>>
>>>
>>> On Mon 1. 6. 2026 at 12:40, Piotr Wargulak via dev <
>>> [email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> This is a slightly different request than usual for this list.
>>>> I want to understand the operations side of running OSS systems in
>>>> production, not the build side. Not necessarily Fineract specifically; any
>>>> OSS stack you're actually keeping alive day to day.
>>>>
>>>> A few things I'm trying to figure out: how do you know your system's OK
>>>> right now? Dashboards, alerts, someone checking every morning, or you find
>>>> out when users complain?
>>>> When something breaks, how do you find out, and how long until you know
>>>> what actually went wrong? Who handles it on the ground: an internal team, a
>>>> vendor, or that one person who built it?
>>>>
>>>> It would be most useful to hear from smaller operators (MFIs, credit
>>>> unions, smaller fintechs running Fineract or similar) where Datadog-tier
>>>> tooling isn't in the budget and there's no dedicated SRE team.
>>>>
>>>> For context, I'm at SolDevelo. We do OSS work, and we're thinking about
>>>> how to better support organizations running these kinds of systems in
>>>> production. I'd rather understand how the day-to-day actually looks first
>>>> than guess.
>>>>
>>>> Reply on the list if it's a short answer, or DM me / schedule a 15
>>>> minute Google Meet if you want to talk it through. If a few people share,
>>>> I'll write up an anonymized summary and send it back here.
>>>>
>>>> Best regards,
>>>> PiotrW
>>>> SolDevelo
>>>>
>>>>
>>>> *SolDevelo* Sp. z o.o. [LLC] / www.soldevelo.com
>>>> Al. Zwycięstwa 96/98
>>>> <https://www.google.com/maps/search/Al.+Zwyci%C4%99stwa+96%2F98?entry=gmail&source=g>,
>>>> 81-451, Gdynia, Poland
>>>> Phone: +48 58 782 45 40 / Fax: +48 58 782 45 41
>>>>
>>>
>>
>

-- 
*SolDevelo* Sp. z o.o. [LLC] / www.soldevelo.com <http://www.soldevelo.com>
Al. Zwycięstwa 96/98, 81-451, Gdynia, Poland
Phone: +48 58 782 45 40 / Fax: +48 58 782 45 41
