Re: PROPOSAL: Moving away from DB-enforced auto-increment primary keys

Arnold Galovics Fri, 05 Sep 2025 03:57:03 -0700

Hey guys,

If you remember Adam, I talked to you about this ~2 years ago :))


As I did then and now, I'm still suggesting the usage of UUIDv7 even though
it has serious implications in terms of migration. In my opinion the
long-term gains it'd bring is much more valuable to the project than the
required timeline. We can easily split this work if someone spends some
time on figuring this out architecturally and execution-wise.

Just a quick thought how this could be done without manual migrations
(example is for Loan):
- expand loan table to have a new uuid column (not a pk yet)
- expand the loan repayment table (just one out of the other relationships)
to include a new uuid based fk column (not yet fk)
- generate uuids for the loans
- use the original loan ID to find corresponding repayment rows and use the
respective new uuids to fill the new fk column
- enforce foreign key on repayment uuid fk column, rename the original fk
and drop fk constraint (or maybe leave it as a legacy fk, both can make
sense)
- switch uuid column in loan to be a PK and rename the original one as
legacy_id or something

Something like this could work but this needs to be done step by step.

Best,
Arnold



Arnold Gálovics

*CEO, Co-Founder*

*+36 30 904 1885*

https://docktape.com

Docktape Technologies




On Fri, Sep 5, 2025 at 10:26 AM VICTOR MANUEL ROMERO RODRIGUEZ <
[email protected]> wrote:

> Hi Adam,
>
> I prefer to go for this approach.
>
>
>    -
>
>    *Snowflake BIGINT* → no migration, performance-friendly, but requires
>    care around machine ID/clock handling.
>
> The machine Id and clock handling are items, that I consider, are already
> managed in production environments. And if the system is already connected
> to a payment system, the IT admins are already aware or are more familiar
> about having clock synchronizarion while working with distributed systems
> connected to Apache Fineract.
>
> Regards
>
> Victor
>
>
> El El vie, 5 de sep de 2025 a la(s) 2:03 a.m., Ádám Sághy <
> [email protected]> escribió:
>
>> Hi Paul, Tom, James, Victor,
>>
>> Thanks again for sharing your experience!
>>
>> I see why UUIDv7 can be attractive: time-ordered, globally unique, and
>> avoids the pitfalls of machine IDs and clock sync.
>>
>> That said, one major consideration for us is migration effort.
>>
>>    -
>>
>>    *UUIDv7* would require a *full schema and data migration*, touching
>>    every table, index, and foreign key. For a system like Fineract this is a
>>    massive undertaking, realistically close to a year of work before things
>>    are stable, with high risk and ongoing maintenance cost for projects that
>>    are using Fineract for a while or using it with massive amount of data.
>>    -
>>
>>    *Snowflake-style BIGINTs*, on the other hand, can be introduced
>>    without changing the existing PK data type. We can keep the schema as-is,
>>    avoid re-indexing terabytes of data, and stay compatible with all current
>>    integrations.
>>
>> Performance is the other angle: UUIDv7 doubles index size (128-bit vs
>> 64-bit) and tends to fragment B-tree indexes more than sequential or
>> roughly sequential IDs. Snowflake IDs remain compact BIGINTs, time-ordered
>> enough for efficient indexing, and guarantee immediate availability in
>> memory without requiring a flush.
>>
>> There’s another very important angle I’d like to add to this discussion:
>> our current *PK generation strategy*. Today Fineract relies on IDENTITY 
>> columns,
>> which means the database only provides an ID during flush/commit. The
>> backend cannot obtain an ID before that point. Unfortunately, the platform
>> depends heavily on having an ID *earlier* in the lifecycle for things
>> like:
>>
>>    -
>>
>>    Creating and sending business events with the new entity ID
>>    -
>>
>>    Populating result objects with the resource ID immediately after
>>    creation
>>    -
>>
>>    Enabling true batch write operations (currently blocked)
>>    -
>>
>>    Avoiding unnecessary flushes purely to retrieve generated IDs
>>
>> This limitation has significant impact on performance, testability, and
>> developer experience.
>>
>> That’s why I’ve been leaning toward application-level ID generation
>> (Snowflake-style BIGINTs). They solve the “flush dependency” problem while
>> retaining our current schema and data type, so *no massive migration* is
>> required. UUIDv7 would address uniqueness and ordering, but it would mean
>> replacing all PK columns, indexes, and FKs — realistically a year-long,
>> high-risk migration for a project of this size.
>>
>> Of course, Snowflake IDs aren’t perfect — after ~50 years we’d eventually
>> hit their limits, but realistically no system of this kind goes half a
>> century without a redesign anyway.
>>
>> So to me the trade-off looks like:
>>
>>    -
>>
>>    *UUIDv7* → simpler conceptually, but very heavy migration + larger
>>    index/storage footprint.
>>    -
>>
>>    *Snowflake BIGINT* → no migration, performance-friendly, but requires
>>    care around machine ID/clock handling.
>>
>>
>> *At the end the question is whether we want to fix 1 problem with a short
>> implementation timeline, or fix 2 problems with a pretty lengthy
>> implementation timeline and with the risk of a very complex data migration!*
>>
>>
>> *As an ending note, I really love this conversation and all the ideas
>> shared — that was exactly my intention in starting the thread. Please keep
>> it going; we should consider all angles and hear all voices in order to
>> select the best option for the community and the project!*
>>
>> Best,
>> Adam
>>
>> On 2025. Sep 4., at 13:36, Paul <[email protected]> wrote:
>>
>> Hi Adam,
>> I'm not super technical, but have faced similar issues in the past and
>> will try to regurgitate what we discovered as a "better way" which might be
>> useful.
>>
>> First, let me share using the Machine ID in the sequence ends up being an
>> ugly pain in the ass point and was a near catastrophic issue for us . .
>> . so while your solution would be an improvement, it creates new risks
>> which are just as onerous. That was on metal servers. With cloud services,
>> scaling and churn would seem to multiply the risk? While I'm sure there are
>> ways around each challenge, again, each "way around" complicates the
>> solution and worse, increases time, cost and cognitive load.
>>
>> I going to suggest (PLEASE double check my sanity) 128-bit IDs using
>> time-ordered UUIDs v7. This provides global uniqueness with the only
>> negative being a slight increase in index size (16 vs 8 bit). It should
>> avoid several "what ifs" on clock issues (jumps and fall backs) and risk of
>> using Machin ID, you will discover a LOT of extra effort is required to
>> make those work. THEN after ALL that, will find that uniqueness is NOT
>> certain. I found this out the hard way. In short, we tried what sounded
>> like your method and failed to enjoy a truly scalable solution. In fact, we
>> made 3 major efforts to get it to "play nice" and failed.
>>
>> This simplified method provides immediate ops benefit, is a low risk
>> implementation. (Lead Developer, suggested it was zero risk, but I don't
>> beleive in such a thing)  We were working with Postgres, Java app, NOT
>> Fineract, but assuming its generally applicable.
>>
>> Assumptions:
>> 1) You're not trying to remove or replace primary keys and will keep them
>> for internal use
>> 2) Postgres handles indexing of uuid auto-magically; as long as the
>> column is marked unique
>> 3) Use uuid as the public ID in APIs
>>
>> If you think this is helpful or a better option, feel free to quiz me.
>>
>> Regards
>> Paul
>>
>>
>>

Re: PROPOSAL: Moving away from DB-enforced auto-increment primary keys

Reply via email to