Adam and all, I’m convinced this is worthy of a write-up as an FSIP (Fineract SIGNIFICANT Improvement Project). We added this process knowing that it is sometimes important to hold a VOTE and allow additional discussion.
I am +1 on the discussion so far as well. As to the approach, I agree with Tom’s comment. To resolve such ambiguity, a POC should be done to provide additional assurance. Is it reasonable to start with that? I still haven’t heard anyone identify a downstream impact to smaller installations and third-party apps, so I am assuming this has been looked at. It may be useful to write this up and provide a link in the usual places so that parties who don’t monitor this list may see the change coming. (Again, another use for the FSIP.) Thanks

On Fri, Sep 5, 2025 at 11:31 AM Tom Daly <tdal...@gmail.com> wrote:

> Hi Victor, Adam et al,
>
> There seems to be some general acknowledgment about the risks of handling
> ID generation at the app level rather than the DB level, again particularly
> for clusters, but I am also, I suppose, hearing that it is necessary. I
> know little about the deployment cases and operational challenges for real
> customers, so I have to defer to you, Victor, and others who say this is
> necessary.
>
> Still, on to my second question, which is related to "risk": how is any
> new ID generation scheme going to be tested under load and made
> bulletproof? What load generation tools exist for testing in both
> clustered and non-clustered scenarios to make sure that this works?
> Because (to say the least) it is not trivial.
>
> regards
> Tom
>
> On 5 Sep 2025, at 9:25 am, VICTOR MANUEL ROMERO RODRIGUEZ <
> victor.rom...@fintecheando.mx> wrote:
>
> Hi Adam,
>
> I prefer to go for this approach.
>
> - *Snowflake BIGINT* → no migration, performance-friendly, but requires
>   care around machine ID/clock handling.
>
> The machine ID and clock handling are items that, I consider, are already
> managed in production environments.
> And if the system is already connected to a payment system, the IT admins
> are already aware of, or at least familiar with, clock synchronization
> while working with distributed systems connected to Apache Fineract.
>
> Regards
>
> Victor
>
> On Fri, Sep 5, 2025 at 2:03 AM, Ádám Sághy <
> adamsa...@gmail.com> wrote:
>
>> Hi Paul, Tom, James, Victor,
>>
>> Thanks again for sharing your experience!
>>
>> I see why UUIDv7 can be attractive: time-ordered, globally unique, and
>> it avoids the pitfalls of machine IDs and clock sync.
>>
>> That said, one major consideration for us is migration effort.
>>
>> - *UUIDv7* would require a *full schema and data migration*, touching
>>   every table, index, and foreign key. For a system like Fineract this is
>>   a massive undertaking, realistically close to a year of work before
>>   things are stable, with high risk and ongoing maintenance cost for
>>   projects that have been using Fineract for a while or are running it
>>   with massive amounts of data.
>> - *Snowflake-style BIGINTs*, on the other hand, can be introduced
>>   without changing the existing PK data type. We can keep the schema
>>   as-is, avoid re-indexing terabytes of data, and stay compatible with
>>   all current integrations.
>>
>> Performance is the other angle: UUIDv7 doubles index size (128-bit vs
>> 64-bit) and tends to fragment B-tree indexes more than sequential or
>> roughly sequential IDs. Snowflake IDs remain compact BIGINTs, time-ordered
>> enough for efficient indexing, and guarantee immediate availability in
>> memory without requiring a flush.
>>
>> There’s another very important angle I’d like to add to this discussion:
>> our current *PK generation strategy*. Today Fineract relies on IDENTITY
>> columns, which means the database only provides an ID during flush/commit.
>> The backend cannot obtain an ID before that point.
>> Unfortunately, the platform depends heavily on having an ID *earlier* in
>> the lifecycle for things like:
>>
>> - Creating and sending business events with the new entity ID
>> - Populating result objects with the resource ID immediately after
>>   creation
>> - Enabling true batch write operations (currently blocked)
>> - Avoiding unnecessary flushes purely to retrieve generated IDs
>>
>> This limitation has a significant impact on performance, testability, and
>> developer experience.
>>
>> That’s why I’ve been leaning toward application-level ID generation
>> (Snowflake-style BIGINTs). They solve the “flush dependency” problem while
>> retaining our current schema and data type, so *no massive migration* is
>> required. UUIDv7 would address uniqueness and ordering, but it would mean
>> replacing all PK columns, indexes, and FKs: realistically a year-long,
>> high-risk migration for a project of this size.
>>
>> Of course, Snowflake IDs aren’t perfect: after ~50 years we’d eventually
>> hit their limits, but realistically no system of this kind goes half a
>> century without a redesign anyway.
>>
>> So to me the trade-off looks like:
>>
>> - *UUIDv7* → conceptually simpler, but a very heavy migration plus a
>>   larger index/storage footprint.
>> - *Snowflake BIGINT* → no migration, performance-friendly, but requires
>>   care around machine ID/clock handling.
>>
>> *In the end, the question is whether we want to fix one problem with a
>> short implementation timeline, or fix two problems with a pretty lengthy
>> implementation timeline and the risk of a very complex data migration!*
>>
>> *As an ending note, I really love this conversation and all the ideas
>> shared; that was exactly my intention in starting the thread. Please keep
>> it going; we should consider all angles and hear all voices in order to
>> select the best option for the community and the project!*
>>
>> Best,
>> Adam
>>
>> On 2025.
Sep 4., at 13:36, Paul <pchristi...@gmail.com> wrote:
>>
>> Hi Adam,
>> I'm not super technical, but I have faced similar issues in the past and
>> will try to regurgitate what we discovered as a "better way", which might
>> be useful.
>>
>> First, let me share that using the Machine ID in the sequence ends up
>> being an ugly pain in the ass and was a near-catastrophic issue for us...
>> so while your solution would be an improvement, it creates new risks
>> which are just as onerous. That was on metal servers. With cloud services,
>> scaling and churn would seem to multiply the risk? While I'm sure there
>> are ways around each challenge, again, each "way around" complicates the
>> solution and, worse, increases time, cost, and cognitive load.
>>
>> I'm going to suggest (PLEASE double-check my sanity) 128-bit IDs using
>> time-ordered UUIDs v7. This provides global uniqueness with the only
>> negative being a slight increase in index size (16 vs 8 bytes). It should
>> avoid several "what ifs" on clock issues (jumps and fall-backs) and the
>> risk of using Machine IDs; you will discover a LOT of extra effort is
>> required to make those work. THEN, after ALL that, you will find that
>> uniqueness is NOT certain. I found this out the hard way. In short, we
>> tried what sounded like your method and failed to enjoy a truly scalable
>> solution. In fact, we made 3 major efforts to get it to "play nice" and
>> failed.
>>
>> This simplified method provides immediate ops benefit and is a low-risk
>> implementation. (Our lead developer suggested it was zero risk, but I
>> don't believe in such a thing.) We were working with Postgres and a Java
>> app, NOT Fineract, but I assume it's generally applicable.
>>
>> Assumptions:
>> 1) You're not trying to remove or replace primary keys and will keep them
>>    for internal use
>> 2) Postgres handles indexing of uuid auto-magically, as long as the
>>    column is marked unique
>> 3) Use uuid as the public ID in APIs
>>
>> If you think this is helpful or a better option, feel free to quiz me.
>>
>> Regards
>> Paul
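For concreteness, the Snowflake-style BIGINT scheme Adam describes could be sketched roughly as below: a 41-bit millisecond timestamp, 10-bit machine ID, and 12-bit per-millisecond sequence packed into one signed 64-bit value. This is a minimal illustrative sketch, not Fineract code; the class name, custom epoch, and bit widths are assumptions, and the clock-backwards handling shown (refusing to generate) is only one of several possible policies.

```java
// Illustrative Snowflake-style 64-bit ID generator (hypothetical, not a
// Fineract API): 41-bit timestamp | 10-bit machine ID | 12-bit sequence.
public final class SnowflakeIdGenerator {

    private static final long CUSTOM_EPOCH = 1_600_000_000_000L; // arbitrary epoch (ms)
    private static final int MACHINE_ID_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId >= (1L << MACHINE_ID_BITS)) {
            throw new IllegalArgumentException("machineId out of range");
        }
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: generating anyway could duplicate IDs,
            // so this sketch simply refuses (one policy among several).
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            // Same millisecond: bump the sequence; if exhausted, spin-wait
            // until the next millisecond tick.
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH) << (MACHINE_ID_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs from a single node are strictly increasing, which is what keeps B-tree inserts roughly append-only.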
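Paul's UUIDv7 suggestion can likewise be sketched without library support, since a v7 UUID (per RFC 9562) is just a 48-bit Unix-millisecond timestamp followed by version/variant bits and random filler. The class name below is made up for illustration; `java.util.UUID` itself does not generate v7 values, it only wraps the two 64-bit halves.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Illustrative UUIDv7 construction (hypothetical helper, not a Fineract or
// JDK API): 48-bit ms timestamp + version 7 nibble + random bits, variant 10.
public final class UuidV7 {

    private static final SecureRandom RANDOM = new SecureRandom();

    public static UUID next() {
        long now = System.currentTimeMillis();
        byte[] rnd = new byte[10];
        RANDOM.nextBytes(rnd);

        // High 64 bits: 48-bit timestamp, 4-bit version (0b0111), 12 random bits.
        long msb = (now & 0xFFFF_FFFF_FFFFL) << 16;
        msb |= 0x7000L; // version 7
        msb |= ((rnd[0] & 0x0FL) << 8) | (rnd[1] & 0xFFL);

        // Low 64 bits: 2-bit variant (0b10) followed by 62 random bits.
        long lsb = 0;
        for (int i = 2; i < 10; i++) {
            lsb = (lsb << 8) | (rnd[i] & 0xFF);
        }
        lsb = (lsb & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L;
        return new UUID(msb, lsb);
    }
}
```

Note that two v7 UUIDs drawn in the same millisecond are not ordered relative to each other (the sub-timestamp bits are random), which is usually acceptable for index locality but worth knowing before relying on strict ordering.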
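On Tom's question about testing under load: whichever scheme is chosen, a first sanity check is a multi-threaded uniqueness test before any full load-generation tooling is brought in. The sketch below is a generic harness (all names are illustrative) that hammers any `Supplier<Long>` ID source from many threads and counts collisions; a plain `AtomicLong` stands in for the real generator in the test.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Minimal concurrency smoke test for an ID source (illustrative harness):
// `threads` workers each draw `perThread` IDs; any ID seen twice counts as
// a collision. This is a sanity check, not a substitute for real load tests.
public final class IdUniquenessHarness {

    public static int duplicates(Supplier<Long> ids, int threads, int perThread) {
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        AtomicLong collisions = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Set.add returns false when the ID was already present.
                    if (!seen.add(ids.get())) {
                        collisions.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (int) collisions.get();
    }
}
```

The same harness could be pointed at a Snowflake generator or a UUIDv7 source (hashed to a long, or with the set typed over UUID) to compare behavior in clustered and non-clustered runs, which is the ambiguity Tom is asking to resolve.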