Observability/SRE expert here :)

Well… it’s quite a lot.

At a minimum, you need:

1. A minimal but complete observability stack (pure OSS tools are fine),
with clear answers to:

- which telemetry you want to collect, and from which workloads
- who decides what is worth collecting
- who decides what to drop, and when
- standards for alert rules and runbooks
- dashboard standards
- SLIs and SLOs
- which alerts get routed to which teams?
- which alerts should go only to prod / staging / dev?
- data residency and compliance (where your telemetry is allowed to be
stored)
- securing your telemetry
- which alerts need to be suppressed
- cost management for the stack (THIS IS IMPORTANT!!)
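
To make the SLI/SLO point concrete: an SLO target translates directly into an error budget, i.e. how much failure you can tolerate in a given window. A minimal Python sketch, assuming a request-based availability SLI (good requests / total requests); the `Slo` class and the function names are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """Hypothetical SLO definition: target ratio of good events over a rolling window."""
    target: float     # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int  # e.g. 30 for a 30-day rolling window

    def __post_init__(self) -> None:
        # A 100% target leaves no error budget at all (and would divide by zero below).
        if not 0 < self.target < 1:
            raise ValueError("target must be strictly between 0 and 1")


def error_budget_minutes(slo: Slo) -> float:
    """Minutes of full outage the SLO tolerates over its window."""
    return (1 - slo.target) * slo.window_days * 24 * 60


def budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget left (1.0 = untouched, below 0 = exhausted)."""
    if total == 0:
        return 1.0  # no traffic, so nothing consumed
    allowed_bad = (1 - slo.target) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad


if __name__ == "__main__":
    slo = Slo(target=0.999, window_days=30)
    print(f"{error_budget_minutes(slo):.1f} min")            # 43.2 min
    print(f"{budget_remaining(slo, 99_950, 100_000):.2f}")  # 0.50
```

Alerting on how fast this budget is burning, rather than on every individual error, is one way to keep the "which alerts / which teams" list above manageable.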

2. An incident management program
- on-call schedules, post-mortems
- notification channels (Slack/Teams/PagerDuty/phone/email?)
- incident escalation policies (Manager on Duty? Engineer on Duty?)
- which incidents can be ignored on weekends?
- customer relations: who contacts the client during an incident, and at
what point in the incident timeline?
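
Escalation and weekend rules are easier to agree on once they are written down as data rather than living in someone's head. A rough Python sketch of one possible policy; the role names, the 15/30-minute thresholds and the "low severity is not paged on weekends" rule are all assumptions for illustration, not a recommendation:

```python
from datetime import datetime

# Hypothetical escalation ladder: (minutes unacknowledged, who gets paged).
ESCALATION = [
    (0, "engineer-on-duty"),
    (15, "secondary-engineer"),
    (30, "manager-on-duty"),
]


def who_to_page(severity: str, opened_at: datetime, now: datetime) -> list[str]:
    """Return everyone who should have been paged by `now` for an unacknowledged incident."""
    # Assumed rule: low-severity incidents are not paged on Saturday/Sunday.
    if severity == "low" and now.weekday() >= 5:
        return []
    elapsed_min = (now - opened_at).total_seconds() / 60
    return [who for after_min, who in ESCALATION if elapsed_min >= after_min]


if __name__ == "__main__":
    opened = datetime(2026, 6, 1, 12, 40)  # a Monday
    # 20 minutes unacknowledged -> on-duty engineer plus the secondary
    print(who_to_page("high", opened, datetime(2026, 6, 1, 13, 0)))
```

Tools like PagerDuty or Grafana OnCall model the same idea in their own configuration; the point is that the policy is explicit and reviewable.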

3. Meta-monitoring
- monitoring your monitoring/observability stack itself
- implementing DR and resiliency
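
For meta-monitoring, a common pattern is a "dead man's switch": the monitoring stack emits a heartbeat on a schedule, and something running outside that stack raises an alarm when the heartbeat stops, because a dead alerting pipeline can't alert about itself. A minimal Python sketch of the checking side; the class name and the 5-minute threshold are illustrative assumptions:

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Hypothetical external checker: flags the monitoring stack as down when heartbeats stop."""

    def __init__(self, max_silence_s: float) -> None:
        self.max_silence_s = max_silence_s
        self.last_seen: Optional[float] = None

    def beat(self, now: Optional[float] = None) -> None:
        """Record a heartbeat (e.g. when an always-firing test alert arrives)."""
        self.last_seen = time.time() if now is None else now

    def is_stale(self, now: Optional[float] = None) -> bool:
        """True if no heartbeat arrived within max_silence_s, or none ever arrived."""
        now = time.time() if now is None else now
        return self.last_seen is None or now - self.last_seen > self.max_silence_s


if __name__ == "__main__":
    hb = HeartbeatMonitor(max_silence_s=300)
    hb.beat(now=1_000)
    print(hb.is_stale(now=1_200))  # False: 200 s since the last beat
    print(hb.is_stale(now=1_400))  # True: 400 s of silence, so page someone
```

The key design choice is where the checker runs: on a different host (ideally a different provider) than the stack it watches, otherwise both go down together.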

PS: I've only scratched the surface here, but this should get you about 80%
of the way.



On Mon 1. 6. 2026 at 12:40, Piotr Wargulak via dev <[email protected]>
wrote:

> Hi all,
>
> This is a slightly different request than usual for this list.
> I want to understand the operations side of running OSS systems in
> production, not the build side. Not necessarily Fineract specifically; any
> OSS stack you're actually keeping alive day to day.
>
> A few things I'm trying to figure out: how do you know your system's OK
> right now? Dashboards, alerts, someone checking every morning, or you find
> out when users complain?
> When something breaks, how do you find out, and how long until you know
> what actually went wrong? Who handles it on the ground: an internal team, a
> vendor, or that one person who built it?
>
> It would be most useful to hear from smaller operators (MFIs, credit
> unions, smaller fintechs running Fineract or similar) where Datadog-tier
> tooling isn't in the budget and there's no dedicated SRE team.
>
> For context, I'm at SolDevelo. We do OSS work, and we're thinking about
> how to better support organizations running these kinds of systems in
> production. I'd rather first understand what the day-to-day actually looks
> like than guess.
>
> Reply on the list if it's a short answer, or DM me / schedule a 15 minute
> Google Meet if you want to talk it through. If a few people share, I'll
> write up an anonymized summary and send it back here.
>
> Best regards,
> PiotrW
> SolDevelo
>
>
> *SolDevelo* Sp. z o.o. [LLC] / www.soldevelo.com
> Al. Zwycięstwa 96/98
> 81-451, Gdynia, Poland
> Phone: +48 58 782 45 40 / Fax: +48 58 782 45 41
>
