Observability/SRE expert here :) well … it’s quite a lot.
At a minimum you need:

1. A minimal but complete observability stack (can be pure OSS tools) that covers:
   - which telemetry you want to collect, and from which workloads
   - who decides what is worth collecting
   - who decides what to drop, and when
   - alert rule standards, runbook standards
   - dashboard standards
   - SLIs, SLOs
   - which alerts to route to which teams
   - which alerts should go only to the prod environment vs. staging/dev
   - data residency and compliance (where your telemetry can be stored, compliance-wise)
   - securing your telemetry
   - which alerts need to be suppressed
   - cost management for the stack (THIS IS IMPORTANT!!)

2. Incident management program
   - on-call schedules, post-mortems
   - notification channels (Slack/Teams/PagerDuty/phone/email?)
   - incident escalation policies (Manager on Duty? Engineer on Duty?)
   - which incidents to ignore on weekends
   - customer relations: who contacts the client during an incident, and at what point in the incident timeline?

3. Meta-monitoring
   - monitoring your monitoring/observability stack
   - implementing DR and resiliency

PS: I only scratched the surface here, but this will probably get you 80% of the way there.

On Mon 1. 6. 2026 at 12:40, Piotr Wargulak via dev <[email protected]> wrote:

> Hi all,
>
> This is a slightly different request than usual for this list.
> I want to understand the operations side of running OSS systems in
> production, not the build side. Not necessarily Fineract specifically; any
> OSS stack you're actually keeping alive day to day.
>
> A few things I'm trying to figure out: how do you know your system's OK
> right now? Dashboards, alerts, someone checking every morning, or you find
> out when users complain?
> When something breaks, how do you find out, and how long until you know
> what actually went wrong? Who handles it on the ground: an internal team, a
> vendor, or that one person who built it?
>
> It would be most useful to hear from smaller operators (MFIs, credit
> unions, smaller fintechs running Fineract or similar) where Datadog-tier
> tooling isn't in the budget and there's no dedicated SRE team.
>
> For context, I'm at SolDevelo. We do OSS work, and we're thinking about
> how to better support organizations running these kinds of systems in
> production. I'd rather understand how the day-to-day actually looks first
> than guess.
>
> Reply on the list if it's a short answer, or DM me / schedule a 15 minute
> Google Meet if you want to talk it through. If a few people share, I'll
> write up an anonymized summary and send it back here.
>
> Best regards,
> PiotrW
> SolDevelo
>
> *SolDevelo* Sp. z o.o. [LLC] / www.soldevelo.com
> Al. Zwycięstwa 96/98, 81-451, Gdynia, Poland
> Phone: +48 58 782 45 40 / Fax: +48 58 782 45 41
