+1 to the approach that Arnold has explained.

On Fri, Sep 5, 2025 at 5:56 AM Arnold Galovics <arnold_galov...@docktape.com>
wrote:

> Hey guys,
>
> If you remember, Adam, I talked to you about this ~2 years ago :))
>
> As I did then, I'm still suggesting the use of UUIDv7 even though it has
> serious implications in terms of migration. In my opinion the long-term
> gains it would bring are much more valuable to the project than the cost of
> the longer timeline. We can easily split up this work if someone spends some
> time figuring it out architecturally and execution-wise.
>
> Just a quick thought on how this could be done without manual migrations
> (the example is for Loan):
> - expand the loan table with a new UUID column (not a PK yet)
> - expand the loan repayment table (just one of the many relationships) to
> include a new UUID-based FK column (not yet an enforced FK)
> - generate UUIDs for the loans
> - use the original loan ID to find the corresponding repayment rows and use
> the respective new UUIDs to fill the new FK column
> - enforce the foreign key on the repayment UUID FK column, rename the
> original FK column and drop its FK constraint (or maybe leave it as a
> legacy FK, both can make sense)
> - switch the UUID column in loan to be the PK and rename the original one
> to legacy_id or something
>
> Something like this could work, but it needs to be done step by step; a
> rough sketch follows below.
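>
> A hand-wavy sketch of those steps, expressed as plain JDBC/SQL (the table,
> column, and constraint names are illustrative rather than the real Fineract
> schema, and in practice each step would likely be its own Liquibase
> changeset shipped in a separate release):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.Statement;
>
> public class LoanUuidExpandContractSketch {
>     public static void main(String[] args) throws Exception {
>         try (Connection conn = DriverManager.getConnection(
>                 "jdbc:postgresql://localhost/fineract", "user", "secret");
>              Statement st = conn.createStatement()) {
>             // 1. expand: new UUID column on loan (not a PK yet)
>             st.execute("ALTER TABLE loan ADD COLUMN uuid_id uuid");
>             // 2. expand: new UUID-based column on loan_repayment (no FK constraint yet)
>             st.execute("ALTER TABLE loan_repayment ADD COLUMN loan_uuid_id uuid");
>             // 3. generate UUIDs for existing loans (gen_random_uuid() is a v4 stand-in;
>             //    a UUIDv7 backfill would come from the application or an extension)
>             st.execute("UPDATE loan SET uuid_id = gen_random_uuid() WHERE uuid_id IS NULL");
>             // 4. backfill repayment rows through the original BIGINT FK
>             st.execute("UPDATE loan_repayment r SET loan_uuid_id = l.uuid_id "
>                     + "FROM loan l WHERE r.loan_id = l.id");
>             // 5. make the new column referenceable and enforce the new FK; the old
>             //    loan_id column gets renamed and kept as a legacy reference
>             st.execute("ALTER TABLE loan ADD CONSTRAINT uq_loan_uuid UNIQUE (uuid_id)");
>             st.execute("ALTER TABLE loan_repayment ADD CONSTRAINT fk_repayment_loan_uuid "
>                     + "FOREIGN KEY (loan_uuid_id) REFERENCES loan (uuid_id)");
>             st.execute("ALTER TABLE loan_repayment DROP CONSTRAINT fk_repayment_loan");
>             st.execute("ALTER TABLE loan_repayment RENAME COLUMN loan_id TO legacy_loan_id");
>             // 6. contract: promote uuid_id to the PK and keep the old id as legacy_id
>             st.execute("ALTER TABLE loan DROP CONSTRAINT loan_pkey");
>             st.execute("ALTER TABLE loan ADD PRIMARY KEY (uuid_id)");
>             st.execute("ALTER TABLE loan RENAME COLUMN id TO legacy_id");
>         }
>     }
> }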
>
> Best,
> Arnold
>
>
>
> Arnold Gálovics
>
> *CEO, Co-Founder*
>
> *+36 30 904 1885*
>
> https://docktape.com
>
> Docktape Technologies
>
>
>
>
> On Fri, Sep 5, 2025 at 10:26 AM VICTOR MANUEL ROMERO RODRIGUEZ <
> victor.rom...@fintecheando.mx> wrote:
>
>> Hi Adam,
>>
>> I prefer to go for this approach.
>>
>>
>>    - *Snowflake BIGINT* → no migration, performance-friendly, but requires
>>    care around machine ID/clock handling.
>>
>> The machine ID and clock handling are items that, in my view, are already
>> managed in production environments. And if the system is already connected
>> to a payment system, the IT admins are already aware of, or at least
>> familiar with, maintaining clock synchronization while working with
>> distributed systems connected to Apache Fineract.
>>
>> Regards
>>
>> Victor
>>
>>
>> On Fri, Sep 5, 2025 at 2:03 a.m., Ádám Sághy <
>> adamsa...@gmail.com> wrote:
>>
>>> Hi Paul, Tom, James, Victor,
>>>
>>> Thanks again for sharing your experience!
>>>
>>> I see why UUIDv7 can be attractive: time-ordered, globally unique, and
>>> avoids the pitfalls of machine IDs and clock sync.
>>>
>>> That said, one major consideration for us is migration effort.
>>>
>>>    - *UUIDv7* would require a *full schema and data migration*, touching
>>>    every table, index, and foreign key. For a system like Fineract this is a
>>>    massive undertaking, realistically close to a year of work before things
>>>    are stable, with high risk and ongoing maintenance cost for projects that
>>>    have been using Fineract for a while or are running it with massive
>>>    amounts of data.
>>>    - *Snowflake-style BIGINTs*, on the other hand, can be introduced
>>>    without changing the existing PK data type. We can keep the schema as-is,
>>>    avoid re-indexing terabytes of data, and stay compatible with all current
>>>    integrations.
>>>
>>> Performance is the other angle: UUIDv7 doubles index size (128-bit vs
>>> 64-bit) and tends to fragment B-tree indexes more than sequential or
>>> roughly sequential IDs. Snowflake IDs remain compact BIGINTs, time-ordered
>>> enough for efficient indexing, and guarantee immediate availability in
>>> memory without requiring a flush.
>>>
>>> There’s another very important angle I’d like to add to this discussion:
>>> our current *PK generation strategy*. Today Fineract relies on IDENTITY 
>>> columns,
>>> which means the database only provides an ID during flush/commit. The
>>> backend cannot obtain an ID before that point. Unfortunately, the platform
>>> depends heavily on having an ID *earlier* in the lifecycle for things
>>> like:
>>>
>>>    - Creating and sending business events with the new entity ID
>>>    - Populating result objects with the resource ID immediately after
>>>    creation
>>>    - Enabling true batch write operations (currently blocked)
>>>    - Avoiding unnecessary flushes purely to retrieve generated IDs
>>>
>>> This limitation has significant impact on performance, testability, and
>>> developer experience.
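>>>
>>> To make the "flush dependency" point concrete, here is a minimal,
>>> hypothetical JPA sketch (not actual Fineract code): with an
>>> application-assigned @Id instead of GenerationType.IDENTITY, the identifier
>>> exists the moment the object is constructed, so events, result objects and
>>> batched writes can use it before anything is flushed.
>>>
>>> import jakarta.persistence.Entity;
>>> import jakarta.persistence.Id;
>>>
>>> @Entity
>>> public class LoanDemo {
>>>
>>>     @Id // assigned by the application, no @GeneratedValue(strategy = IDENTITY)
>>>     private Long id;
>>>
>>>     protected LoanDemo() {
>>>         // no-arg constructor required by JPA
>>>     }
>>>
>>>     public LoanDemo(long preGeneratedId) {
>>>         this.id = preGeneratedId; // e.g. from an application-level generator
>>>     }
>>>
>>>     public Long getId() {
>>>         return id; // usable immediately, before persist()/flush()
>>>     }
>>> }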
>>>
>>> That’s why I’ve been leaning toward application-level ID generation
>>> (Snowflake-style BIGINTs). They solve the “flush dependency” problem while
>>> retaining our current schema and data type, so *no massive migration* is
>>> required. UUIDv7 would address uniqueness and ordering, but it would mean
>>> replacing all PK columns, indexes, and FKs — realistically a year-long,
>>> high-risk migration for a project of this size.
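>>>
>>> For illustration only, a Snowflake-style generator could look roughly like
>>> the sketch below (the bit layout, custom epoch, and node-id handling are my
>>> assumptions, not a proposal for the actual Fineract implementation):
>>>
>>> public final class SnowflakeIdGenerator {
>>>
>>>     private static final long CUSTOM_EPOCH = 1704067200000L; // 2024-01-01T00:00:00Z
>>>     private static final long NODE_BITS = 10;                // up to 1024 nodes
>>>     private static final long SEQUENCE_BITS = 12;            // 4096 ids per ms per node
>>>     private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;
>>>
>>>     private final long nodeId;
>>>     private long lastTimestamp = -1L;
>>>     private long sequence = 0L;
>>>
>>>     public SnowflakeIdGenerator(long nodeId) {
>>>         if (nodeId < 0 || nodeId >= (1L << NODE_BITS)) {
>>>             throw new IllegalArgumentException("nodeId out of range");
>>>         }
>>>         this.nodeId = nodeId;
>>>     }
>>>
>>>     public synchronized long nextId() {
>>>         long now = System.currentTimeMillis();
>>>         if (now < lastTimestamp) {
>>>             // the "clock handling" caveat: the clock moved backwards
>>>             throw new IllegalStateException("Clock moved backwards, refusing to generate id");
>>>         }
>>>         if (now == lastTimestamp) {
>>>             sequence = (sequence + 1) & MAX_SEQUENCE;
>>>             if (sequence == 0) {
>>>                 // 4096 ids already handed out in this millisecond: wait for the next one
>>>                 while (now <= lastTimestamp) {
>>>                     now = System.currentTimeMillis();
>>>                 }
>>>             }
>>>         } else {
>>>             sequence = 0;
>>>         }
>>>         lastTimestamp = now;
>>>         // timestamp | node id | sequence, packed into a single BIGINT
>>>         return ((now - CUSTOM_EPOCH) << (NODE_BITS + SEQUENCE_BITS))
>>>                 | (nodeId << SEQUENCE_BITS)
>>>                 | sequence;
>>>     }
>>> }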
>>>
>>> Of course, Snowflake IDs aren’t perfect — after ~50 years we’d
>>> eventually hit their limits, but realistically no system of this kind goes
>>> half a century without a redesign anyway.
>>>
>>> So to me the trade-off looks like:
>>>
>>>    - *UUIDv7* → simpler conceptually, but very heavy migration + larger
>>>    index/storage footprint.
>>>    - *Snowflake BIGINT* → no migration, performance-friendly, but
>>>    requires care around machine ID/clock handling.
>>>
>>>
>>> *In the end the question is whether we want to fix one problem with a
>>> short implementation timeline, or fix two problems with a much longer
>>> timeline and the risk of a very complex data migration!*
>>>
>>>
>>> *As an ending note, I really love this conversation and all the ideas
>>> shared — that was exactly my intention in starting the thread. Please keep
>>> it going; we should consider all angles and hear all voices in order to
>>> select the best option for the community and the project!*
>>>
>>> Best,
>>> Adam
>>>
>>> On 2025. Sep 4., at 13:36, Paul <pchristi...@gmail.com> wrote:
>>>
>>> Hi Adam,
>>> I'm not super technical, but I have faced similar issues in the past and
>>> will try to regurgitate what we discovered as a "better way", which might
>>> be useful.
>>>
>>> First, let me share that using the machine ID in the sequence ends up
>>> being an ugly pain in the ass and was a nearly catastrophic issue for us
>>> . . . so while your solution would be an improvement, it creates new risks
>>> which are just as onerous. That was on bare-metal servers; with cloud
>>> services, scaling and churn would seem to multiply the risk? While I'm sure
>>> there are ways around each challenge, again, each "way around" complicates
>>> the solution and, worse, increases time, cost and cognitive load.
>>>
>>> I'm going to suggest (PLEASE double-check my sanity) 128-bit IDs using
>>> time-ordered UUIDv7. This provides global uniqueness, with the only
>>> negative being a slight increase in index size (16 vs 8 bytes). It should
>>> avoid several "what ifs" around clock issues (jumps and fall-backs) and
>>> the risks of using machine IDs; you will discover a LOT of extra effort is
>>> required to make those work, and THEN, after ALL that, you will find that
>>> uniqueness is still NOT certain. I found this out the hard way. In short,
>>> we tried what sounded like your method and failed to end up with a truly
>>> scalable solution. In fact, we made 3 major efforts to get it to "play
>>> nice" and failed.
>>>
>>> This simplified method provides an immediate ops benefit and is a low-risk
>>> implementation. (Our lead developer suggested it was zero risk, but I don't
>>> believe in such a thing.) We were working with Postgres and a Java app, NOT
>>> Fineract, but I assume it's generally applicable.
>>>
>>> Assumptions:
>>> 1) You're not trying to remove or replace primary keys and will keep
>>> them for internal use
>>> 2) Postgres handles indexing of the uuid column auto-magically, as long as
>>> the column is marked unique
>>> 3) Use the uuid as the public ID in APIs (rough generator sketch below)
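>>>
>>> One practical note for the Java side: java.util.UUID.randomUUID() produces
>>> v4, not v7, so a UUIDv7 would come either from a library or from a small
>>> hand-rolled generator. A rough, hypothetical sketch (class and method names
>>> are mine) of the 48-bit-timestamp + version + random layout:
>>>
>>> import java.security.SecureRandom;
>>> import java.util.UUID;
>>>
>>> public final class UuidV7 {
>>>
>>>     private static final SecureRandom RANDOM = new SecureRandom();
>>>
>>>     public static UUID next() {
>>>         long nowMillis = System.currentTimeMillis();
>>>         byte[] rnd = new byte[10];
>>>         RANDOM.nextBytes(rnd);
>>>
>>>         // high 64 bits: 48-bit unix timestamp in ms, version nibble = 7, 12 random bits
>>>         long msb = (nowMillis & 0xFFFFFFFFFFFFL) << 16;
>>>         msb |= 0x7000L;
>>>         msb |= ((rnd[0] & 0x0FL) << 8) | (rnd[1] & 0xFFL);
>>>
>>>         // low 64 bits: IETF variant bits (10) followed by 62 random bits
>>>         long lsb = 0L;
>>>         for (int i = 2; i < 10; i++) {
>>>             lsb = (lsb << 8) | (rnd[i] & 0xFF);
>>>         }
>>>         lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
>>>
>>>         return new UUID(msb, lsb);
>>>     }
>>> }
>>>
>>> The resulting value would live in a uuid column marked UNIQUE (your
>>> assumption 2) and be exposed as the public ID in the APIs (assumption 3),
>>> while the existing BIGINT primary keys stay internal.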
>>>
>>> If you think this is helpful or a better option, feel free to quiz me.
>>>
>>> Regards
>>> Paul
>>>
>>>
>>>
