airajena opened a new pull request, #14:
URL: https://github.com/apache/fineract-business-intelligence/pull/14

   # Summary
   
   This PR introduces the complete dbt transformation layer for the Apache 
Fineract Business Intelligence project. It transforms raw Fineract loan data 
loaded by the Python ELT extractor into analytics-ready tables that power the 
**Delinquency and Portfolio at Risk (PAR) Dashboard**.
   
   The transformation layer follows a 4-layer architecture: **raw → staging → 
intermediate → mart**, producing metrics segmented by branch and loan product.
   
   ---
   
   ## Architecture
   
   ```
   PostgreSQL Warehouse (analytics DB)
   │
   ├── raw.*              ← Python ELT extractor writes here (11 tables)
   │
   ├── staging.*          ← dbt views: rename columns, handle PII, filter noise
   │
   ├── intermediate.*     ← ephemeral: date spine × active loans × delinquency 
events
   │
   └── analytics.*        ← dbt tables: dimensions, facts, mart (queried by 
Superset)
       ├── dim_client
       ├── dim_office
       ├── dim_product
       ├── dim_date
       ├── dim_delinquency_bucket
       ├── dim_currency
       ├── fact_loan_snapshot      ← incremental, one row per loan per day
       ├── fact_delinquency_event  ← one row per delinquency event
       └── mart_delinquency_par    ← PAR 30/60/90 ratios, NPA, bucket 
distribution
   ```
   
   ---
   
   ## Files Added
   
   ### Configuration
   
   | File | Description |
   |------|-------------|
   | `dbt/dbt_project.yml` | Project config. Defines schema routing 
(`staging.*`, `analytics.*`), materialization defaults (views for staging, 
incremental for facts, tables for marts), and runtime vars 
(`historical_start_date`, `snapshot_incremental_lookback_days`). |
   | `dbt/profiles.yml` | PostgreSQL connection via environment variables 
(`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`). No credentials 
hardcoded. |
   | `dbt/requirements.txt` | `dbt-postgres>=1.7,<2.0`  Apache 2.0 licensed. |
   
   ### Macros
   
   | File | Description |
   |------|-------------|
   | `dbt/macros/generate_schema_name.sql` | Overrides dbt default that 
prefixes target schema onto custom schema names. Without this, `+schema: 
staging` would create `analytics_staging` instead of `staging`. |
   | `dbt/macros/safe_divide.sql` | NULL/zero-safe division macro. Used in all 
PAR ratio calculations to return NULL instead of a runtime division-by-zero 
error when portfolio is empty. |
   
   ### Source Declaration
   
   | File | Description |
   |------|-------------|
   | `dbt/models/sources.yml` | Declares all 11 raw tables as dbt sources with 
freshness rules (warn after 24h, error after 72h). Enables `{{ source('raw', 
'table_name') }}` references and `dbt source freshness` checks. |
   
   ### Staging Layer (views)
   
   | File | What it does |
   |------|-------------|
   | `dbt/models/staging/stg_m_loan.sql` | Renames `id → loan_id`. Passes 
through all balance columns including `principal_outstanding_derived`, 
`is_npa`. No filtering staging is neutral. |
   | `dbt/models/staging/stg_m_client.sql` | **PII handling**: derives 
`client_hash = md5(tenant_id \|\| '::' \|\| id)` using built-in `md5()` (no 
pgcrypto extension). Derives `age_band` from `date_of_birth`. **Excludes 
`date_of_birth`** from output. |
   | `dbt/models/staging/stg_m_loan_transaction.sql` | **Filters `WHERE NOT 
is_reversed`**  reversed transactions are cancelled payments that must not be 
double-counted. COALESCEs all `_portion_derived` columns to 0. |
   | `dbt/models/staging/stg_m_delinquency.sql` | Joins 4 raw tables into a 
single denormalised delinquency event view: 
`raw_m_loan_delinquency_tag_history` + `raw_m_delinquency_range` + 
`raw_m_delinquency_bucket_mappings` + `raw_m_delinquency_bucket`. |
   | `dbt/models/staging/stg_m_office.sql` | Renames `id → office_id`, `name → 
office_name`, `parent_id → parent_office_id`. |
   | `dbt/models/staging/stg_m_product_loan.sql` | Renames `id → product_id`, 
`name → product_name`. Passes through loan product configuration. |
   | `dbt/models/staging/stg_m_currency.sql` | Renames `code → currency_code`, 
`name → currency_name`. |
   | `dbt/models/staging/_staging.yml` | Column-level tests: `not_null`, 
`unique`, `accepted_values` for loan status codes, transaction type codes, age 
bands. |
   
   ### Intermediate Layer (ephemeral — no table created)
   
   | File | What it does |
   |------|-------------|
   | `dbt/models/intermediate/int_loan_delinquency_status.sql` | **Core PAR 
logic.** Creates one row per active loan per calendar day. Algorithm: (1) 
select all active loans (`loan_status_id = 300`) with `disbursedon_date IS NOT 
NULL`; (2) generate date spine from `historical_start_date` to `current_date`; 
(3) cross-join loan × date where date is within loan's active period; (4) LEFT 
JOIN delinquency events that were active on each date; (5) use `ROW_NUMBER()` 
to pick the most recent event per loan-day; (6) derive all PAR flags and 
`standard_par_band`. Materialized as `ephemeral` so it is inlined as a CTE 
inside `fact_loan_snapshot` at compile time. |
   
   ### Dimension Models (tables rebuilt on each run)
   
   | File | What it does |
   |------|-------------|
   | `dbt/models/marts/dimensions/dim_client.sql` | Pseudonymised client 
dimension. No PII. `client_hash` is the surrogate key. |
   | `dbt/models/marts/dimensions/dim_office.sql` | Branch hierarchy. Derives 
`office_level` from Fineract's dot-separated hierarchy string (e.g. `.1.2.` → 
level 2). |
   | `dbt/models/marts/dimensions/dim_product.sql` | Loan product configuration 
including interest rates, repayment frequency, NPA threshold. |
   | `dbt/models/marts/dimensions/dim_date.sql` | Full date spine 2010-01-01 to 
2035-12-31 with year, quarter, month, week, day-of-week, weekend flag, 
first/last day of month, first day of quarter. |
   | `dbt/models/marts/dimensions/dim_delinquency_bucket.sql` | Maps configured 
Fineract delinquency buckets to standard PAR bands. Always includes synthetic 
`bucket_key = 0` for `Current` (non-delinquent) loans. |
   | `dbt/models/marts/dimensions/dim_currency.sql` | ISO 4217 currency 
reference. |
   
   ### Fact Models (incremental tables)
   
   | File | What it does |
   |------|-------------|
   | `dbt/models/marts/facts/fact_loan_snapshot.sql` | **Primary fact table.** 
One row per active loan per day. Incremental with `delete+insert`, 120-day 
lookback. `snapshot_key = md5(tenant_id::loan_id::snapshot_date)`. Contains PAR 
flags (`is_par_30`, `is_par_60`, `is_par_90`, `is_watch_list`, `is_npa`) and 
`standard_par_band`. |
   | `dbt/models/marts/facts/fact_delinquency_event.sql` | One row per 
delinquency classification event on a loan. Captures `previous_bucket_key` and 
`previous_standard_par_band` via `LAG()` window function for the **delinquency 
migration matrix**. Tracks `event_duration_days` and `is_active_event`. |
   
   ### Mart (presentation table)
   
   | File | What it does |
   |------|-------------|
   | `dbt/models/marts/presentations/mart_delinquency_par.sql` | **Dashboard 
mart.** Aggregates `fact_loan_snapshot` by `snapshot_date`, `office_id`, 
`product_id`. Produces two row types per group: (1) `bucket_key = -1` ("All 
Portfolio") rollup row for KPI cards; (2) per-bucket rows for distribution 
charts. Contains all PAR metrics, NPA metrics, exclusive and cumulative band 
amounts, and ratios. |
   | `dbt/models/marts/_marts.yml` | FK relationship tests, uniqueness tests, 
PAR ratio bounds test. |
   
   
   
   ---
   
   
   
   ## Screenshots 
   
   ### 1. All schemas and tables created
   
   The complete warehouse schema is initialised  raw landing zone (11 tables), 
analytics layer (9 tables), and metadata layer (3 tables). Staging is correctly 
materialised as views (not tables).
   
   
   <img width="764" height="562" alt="image" 
src="https://github.com/user-attachments/assets/f2cf2d53-3351-4a58-b76c-21514fb9028a";
 />
   
   
   
   ---
   
   ### 2. ELT pipeline ran successfully
   The Python extractor completed a full backfill, loading 1,268 rows across 
all 11 source tables with `status = success`. The `meta.pipeline_state` table 
tracks every run with timestamps and row counts.
   
   <img width="1653" height="181" alt="image" 
src="https://github.com/user-attachments/assets/6e6d34b4-f4e6-4f4d-9dad-69f8e6c1eaee";
 />
   
   
   ---
   
   ### 3.  PAR ratios by office and product
   
   `mart_delinquency_par` correctly aggregates PAR 30/60/90 ratios segmented by 
branch and loan product . South Branch shows 100% PAR 30 (all 3 loans 
delinquent) and 80.4% NPA.
   
   <img width="1584" height="179" alt="image" 
src="https://github.com/user-attachments/assets/7395eb09-5798-421e-afc0-005e048db988";
 />
   
   
   ---
   
   ### 4. All 6 delinquency scenarios classified correctly
   
    `fact_loan_snapshot` correctly classifies all 13 active loans into their 
exact PAR band on today's date. Every scenario is present: Current, Watch-list, 
PAR 30-59, PAR 60-89, PAR 90+ (normal), PAR 90+ (NPA), PAR 90+ (extreme/180+). 
Loans 13 (written-off) and 14 (closed) are correctly excluded.
   
   <img width="1639" height="358" alt="image" 
src="https://github.com/user-attachments/assets/2b96b057-5eb0-4a3f-ae9d-9a8c26f5df30";
 />
   
   
   ---
   
   ### 5. Delinquency bucket distribution
   
   `mart_delinquency_par` correctly produces per-bucket rows showing how the 
portfolio is distributed across delinquency bands. Uses the 
`dim_delinquency_bucket` dimension joined into the mart. Used to power the 
bucket distribution chart in Superset.
   
   <img width="833" height="123" alt="image" 
src="https://github.com/user-attachments/assets/7f970a40-c8b0-4ec3-a7c2-bc56c2e84597";
 />
   
   
   ---
   
   ### 6. Delinquency migration matrix (PAR 30 → PAR 60 → PAR 90)
   
    `fact_delinquency_event` captures the complete delinquency history of a 
loan with `previous_standard_par_band` derived via `LAG()`. Loan 10 shows the 
full journey: entered PAR 30 on 2025-11-29 (61 days), escalated to PAR 60 on 
2026-01-29 (30 days), escalated to PAR 90+ on 2026-02-28 (still active, 94+ 
days). 
   
   <img width="1270" height="140" alt="image" 
src="https://github.com/user-attachments/assets/db30383c-008e-42f1-aab0-08ff8ef82b7e";
 />
   
   
   
   ---
   
   ### 7. Recovered loan : delinquency lifted, now Current
   
   The date-spine logic correctly handles loan recovery. Loan 15 was in PAR 30 
from 2026-01-15 to 2026-03-01 (`is_active_event = false`). On `CURRENT_DATE` it 
correctly shows `standard_par_band = 'Current'` and `is_par_30 = false`  the 
loan is no longer delinquent.
   
   <img width="1048" height="358" alt="image" 
src="https://github.com/user-attachments/assets/eacc82ab-cfc7-4a87-81d7-a621f3328d04";
 />
   
   
   
   ---
   
   ### 8. Reversed transaction filter in staging
   
    `stg_m_loan_transaction` correctly applies `WHERE NOT is_reversed`. The raw 
table has 76 transactions (including 1 reversed payment). Staging exposes only 
75 — the reversed transaction is excluded from all downstream calculations.
   
   <img width="400" height="114" alt="image" 
src="https://github.com/user-attachments/assets/f4e4b1c2-75aa-418f-b978-2c1727c6048b";
 />
   
   
   ---
   
   ### 9. PAR trend over time date spine is working
   
    The date spine in `int_loan_delinquency_status` correctly generates one 
snapshot per loan per day. The mart aggregation produces a daily row going back 
610 days to the earliest loan disbursement . 
   
   <img width="930" height="388" alt="image" 
src="https://github.com/user-attachments/assets/0d4abcf9-cfb5-4431-992b-f9bc28bffed9";
 />
   
   
   
   ---
   
   ### 10. PII handling : date_of_birth removed, client pseudonymised
   
   The staging layer correctly removes `date_of_birth` from the analytics layer 
and replaces it with `age_band` (bucketed). The `client_id` is replaced with 
`client_hash = md5(tenant_id || '::' || id)`. Raw data contains PII; analytics 
contains none.
   
   <img width="1077" height="327" alt="image" 
src="https://github.com/user-attachments/assets/5ec235da-4d93-4c20-81fe-8942e1bd94f4";
 />
   
   
   ---
   
   ### 11. dbt tests : all 57 pass
   
   All column-level data quality tests pass: primary key uniqueness, not-null 
constraints, accepted values for enum columns, and FK integrity between fact 
and dimension tables.
   
   <img width="1404" height="867" alt="image" 
src="https://github.com/user-attachments/assets/308faf3c-3960-42fb-950a-657624345506";
 />
   
   
   ---
   
   ### 12. dbt run  all 16 models built
   
   The full transformation DAG builds successfully in the correct dependency 
order. 7 staging views → 6 dimension tables → 2 fact tables → 1 mart.
   
   <img width="1460" height="772" alt="image" 
src="https://github.com/user-attachments/assets/78fcec7e-d7f5-4876-9877-9405ee9fefab";
 />
   
   
   ---
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to