airajena opened a new pull request, #17:
URL: https://github.com/apache/fineract-business-intelligence/pull/17

   ## Summary
   
   This PR introduces the complete **Apache Superset visualization layer** for 
the Apache Fineract Business Intelligence project. It connects directly to the 
analytics warehouse populated by the dbt transformation layer and exposes 
dashboard with row-level security (RLS), Jinja2 template processing, and a 
fully automated bootstrap system.
   
   This is the **third and final layer** of the BI stack  sitting downstream of 
the Python ELT extractor (PR 1) and the dbt transformation layer (PR 2).
   
   ---
   
   ## Architecture
   
   ```
   PostgreSQL Warehouse (analytics DB)
   │
   ├── analytics.*           ← dbt marts (queried by Superset virtual datasets)
   │   ├── mart_delinquency_par
   │   ├── mart_portfolio_health
   │   └── mart_repayment_behavior
   │
   ├── meta.*                ← RLS enforcement
   │   └── user_office_mapping   ← maps Superset users → allowed office_ids
   │
   └── Superset (Port 8088)
       ├── Datasets (6)       ← Virtual datasets with Jinja2 RLS SQL
       │   ├── delinquency_par_secure_all_dates
       │   ├── delinquency_par_secure_latest
       │   ├── portfolio_health_secure_all_dates
       │   ├── portfolio_health_secure_latest
       │   ├── repayment_behavior_secure_all_dates
       │   └── repayment_behavior_secure_latest
       │
       └── Dashboards (3)
           ├── Delinquency and PAR Dashboard   ← 11 charts
           ├── Portfolio Health Dashboard       ← 12 charts (future PR)
           └── Repayment Behavior Dashboard    ← 11 charts (future PR)
   ```
   
   
   
   ---
   
   ## Files Added
   
   ### Docker & Configuration
   
   | File | Description |
   |------|-------------|
   | `docker/superset/Dockerfile` | Extends `apache/superset:4.1.1`. Installs 
`psycopg2-binary==2.9.9` for PostgreSQL connectivity. |
   | `docker/superset/superset_config.py` | Superset runtime config. Enables 
`ENABLE_TEMPLATE_PROCESSING: True`  the critical flag that activates Jinja2 
template processing in virtual dataset SQL (required for RLS). |
   | `docker/superset/init_superset.sh` | Container entrypoint. Runs `superset 
db upgrade`, creates admin user, runs `superset init`, bootstraps branch 
manager users and dashboard assets, then starts the web server on `:8088`. |
   | `docker/superset/refresh_superset_assets.sh` | Utility script to re-run 
`bootstrap_superset_assets.py` on a live container  used when assets need 
updating without restarting the container. |
   | `docker/superset/bootstrap_superset_assets.py` | Core bootstrap script . 
Programmatically creates all 6 datasets, all chart configurations, and all 3 
dashboards via the Superset SQLAlchemy session. Idempotent  safe to re-run. |
   | `docker/superset/bootstrap_superset_security.py` | Creates `north_manager` 
and `south_manager` Superset users, assigns them the `Alpha` role, and inserts 
their office mappings into `meta.user_office_mapping`. |
   | `docker/dbt/Dockerfile` | dbt container. Extends `python:3.11-slim`, 
installs `dbt-postgres>=1.7,<2.0`. Used in the `dbt` service in `compose.yaml`. 
|
   
   ### Virtual Dataset SQL (`superset/datasets/`)
   
   | File | Dataset | Grain | Purpose |
   |------|---------|-------|---------|
   | `delinquency_par_secure_all_dates.sql` | 
`delinquency_par_secure_all_dates` | All dates | PAR trend charts (full 
history). RLS-filtered by `current_username()`. |
   | `delinquency_par_secure_latest.sql` | `delinquency_par_secure_latest` | 
Latest date only | KPI cards, pie charts, bar charts, summary table. |
   | `portfolio_health_secure_all_dates.sql` | 
`portfolio_health_secure_all_dates` | All dates | Portfolio trend charts. |
   | `portfolio_health_secure_latest.sql` | `portfolio_health_secure_latest` | 
Latest date only | Portfolio KPI cards and breakdown charts. |
   | `repayment_behavior_secure_all_dates.sql` | 
`repayment_behavior_secure_all_dates` | All dates | Collection efficiency and 
repayment trend charts. |
   | `repayment_behavior_secure_latest.sql` | 
`repayment_behavior_secure_latest` | Latest date only | Repayment KPI cards and 
summary table. |
   
   ### Dashboard Metadata (`superset/dashboards/`)
   
   | File | Description |
   |------|-------------|
   | `superset/dashboards/delinquency_par_dashboard.json` | Dashboard title, 
slug, and published state for the Delinquency and PAR dashboard. |
   | `superset/dashboards/portfolio_health_dashboard.json` | Dashboard title, 
slug, and published state for the Portfolio Health dashboard. |
   | `superset/dashboards/repayment_behavior_dashboard.json` | Dashboard title, 
slug, and published state for the Repayment Behavior dashboard. |
   
   ### Infrastructure Updates
   
   | File | Change |
   |------|--------|
   | `compose.yaml` | Adds `superset` service (port 8088), `dbt` service, and 
`superset_home` Docker volume. Both new services depend on `warehouse: 
healthy`. |
   | `.env.example` | Adds `SUPERSET_SECRET_KEY`, `SUPERSET_ADMIN_*`, 
`SUPERSET_NORTH_MANAGER_PASSWORD`, `SUPERSET_SOUTH_MANAGER_PASSWORD`, 
`SUPERSET_METADATA_URI`, `SUPERSET_WAREHOUSE_URI`. |
   | `.gitignore` | Adds `docker/superset/__pycache__/` exclusion. |
   | `.rat-excludes` | Adds `**/docker/superset/__pycache__/**` and 
`**/superset/dashboards/*.json` exclusions. Dashboard JSON files have no 
standard comment syntax for Apache license headers. |
   
   ---
   
   ## Row-Level Security (RLS)
   
   Every virtual dataset enforces row-level security at the SQL layer using 
Jinja2 template processing. No separate Superset RLS rules are needed  the 
filter lives inside the dataset SQL itself.
   
   
   ### User → Office mapping
   
   | Superset Username | Role | Offices Visible |
   |---|---|---|
   | `admin` | `ADMIN` | All branches |
   | `north_manager` | Branch Manager | North Branch only |
   | `south_manager` | Branch Manager | South Branch only |
   
   The `ENABLE_TEMPLATE_PROCESSING: True` flag in `superset_config.py` 
activates `current_username()` resolution. Without this flag, the Jinja2 
template is treated as a literal string and the RLS WHERE clause never executes.
   
   ---
   
   ## Bootstrap System
   
   The bootstrap is fully automated and idempotent. Every entity uses upsert 
semantics  running it twice does not create duplicates.
   
   ### Startup sequence
   
   ```
   docker compose up -d
           │
           ▼
   init_superset.sh
     ├── superset db upgrade          (apply Superset metadata migrations)
     ├── superset fab create-admin    (create admin user)
     ├── superset init                (seed default roles and permissions)
     ├── bootstrap_superset_security.py
     │     ├── Create north_manager user (Alpha role)
     │     ├── Create south_manager user (Alpha role)
     │     └── INSERT meta.user_office_mapping rows
     └── bootstrap_superset_assets.py
           ├── ensure_database()      (register warehouse DB connection)
           ├── ensure_dataset() × 6   (create/update virtual datasets + columns 
+ metrics)
           ├── ensure_chart() × 34    (create/update all chart configurations)
           └── ensure_dashboard() × 3 (create/update dashboards with layout)
   ```
   
   ### Key bootstrap functions
   
   **`ensure_dataset(database, name, sql, owner, dttm_col, columns, metrics)`**
   Creates or updates a virtual dataset. Rebuilds all `TableColumn` and 
`SqlMetric` rows on every run  guarantees column definitions stay in sync with 
the bootstrap code.
   
   **`ensure_chart(name, viz_type, datasource, params, owner)`**
   Creates or updates a chart. Stores the full form data as `params` JSON 
including `datasource`, `viz_type`, `number_format`, `metric`, `adhoc_filters`, 
and `subheader`. Clears `query_context` on update to prevent stale cache.
   
   **`ensure_dashboard(meta_filename, owner, charts, row_specs)`**
   Creates or updates a dashboard. Calls `build_layout()` to generate the 
Superset `position_json` grid layout from the `row_specs` list, then assigns 
all charts to the dashboard.
   
   **`mart_exists(table_name)`**
   Queries `information_schema.tables` to check if a mart table exists before 
bootstrapping that dashboard. Allows `docker compose up` to succeed even when 
only the delinquency mart is populated  Portfolio Health and Repayment Behavior 
dashboards are skipped gracefully with an INFO message.
   
   ---
   
   ## Delinquency and PAR Dashboard : Chart Reference
   
   This is the primary dashboard. It contains 11 charts across 5 rows, all 
sourcing from `mart_delinquency_par` via the two RLS-secured datasets.
   
   ### Row 1 :  KPI Cards
   
   The four summary cards show the portfolio health state as of the latest COB 
snapshot date.
   <img width="1891" height="278" alt="image" 
src="https://github.com/user-attachments/assets/4a209b08-c0c6-4dcf-a860-2aaba95dc7bf";
 />
   
   ---
   
   #### Chart 1 : PAR 30 KPI
   
   **Type**: Big Number  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key = -1` (All Portfolio rollup row)
   
   **Formula**:
   ```
   PAR 30 = SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount), 0)
   ```
   
   `par_30_amount` = total outstanding principal of all loans where 
`min_age_days >= 30` (i.e., loans overdue by 30 or more days). This is 
cumulative it includes loans in PAR 60 and PAR 90 buckets. 
`total_portfolio_amount` = SUM of principal outstanding across all active loans.
   
   **Subheader**: "Portfolio at Risk > 30 days"  
   **Format**: `.2%` → displays as `58.14%`
   
   <img width="394" height="160" alt="par-30-kpi-2026-06-10T08-07-01 182Z" 
src="https://github.com/user-attachments/assets/7d43d8cf-2182-48e2-8226-517d8cdd0465";
 />
   
   
   ---
   
   #### Chart 2 : PAR 60 KPI
   
   **Type**: Big Number  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key = -1`
   
   **Formula**:
   ```
   PAR 60 = SUM(par_60_amount) / NULLIF(SUM(total_portfolio_amount), 0)
   ```
   
   `par_60_amount` = outstanding principal of loans overdue 60 or more days. 
Always ≤ PAR 30 (cumulative definition).
   
   **Subheader**: "Portfolio at Risk > 60 days"  
   **Format**: `.2%`
   
   <img width="394" height="160" alt="par-60-kpi-2026-06-10T08-07-07 034Z" 
src="https://github.com/user-attachments/assets/7455ac03-e048-467d-8e07-08f76d89cf90";
 />
   
   
   ---
   
   #### Chart 3 : PAR 90 KPI
   
   **Type**: Big Number  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key = -1`
   
   **Formula**:
   ```
   PAR 90 = SUM(par_90_amount) / NULLIF(SUM(total_portfolio_amount), 0)
   ```
   
   `par_90_amount` = outstanding principal of loans overdue 90 or more days. 
The most severe tier. Always ≤ PAR 60 ≤ PAR 30.
   
   **Subheader**: "Portfolio at Risk > 90 days"  
   **Format**: `.2%`
   
   <img width="394" height="160" alt="par-90-kpi-2026-06-10T08-07-11 215Z" 
src="https://github.com/user-attachments/assets/e7855091-c199-4911-84df-33c595ad982f";
 />
   
   
   ---
   
   #### Chart 4 : NPA Exposure KPI
   
   **Type**: Big Number  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key = -1`
   
   **Formula**:
   ```
   NPA Exposure = SUM(npa_amount)
   ```
   
   `npa_amount` = total outstanding principal of loans where `is_npa = true`. 
Unlike PAR, this is an **absolute currency amount**, not a ratio. `is_npa` is 
set by Fineract's COB when a loan exceeds the `overdue_days_for_npa` threshold 
on the loan product configuration.
   
   **Subheader**: "Non-Performing Asset Exposure"  
   **Format**: `$,.0f` → displays as `$36,252`
   
   <img width="394" height="160" alt="npa-exposure-kpi-2026-06-10T08-07-15 
299Z" 
src="https://github.com/user-attachments/assets/29d55452-5b55-4117-9b1b-6b0c09710e86";
 />
   
   
   
   ---
   
   ### Row 2 : Trend Charts
   
   Both trend charts use `delinquency_par_secure_all_dates` (full history 
dataset) with `bucket_key = -1` filter, plotting one data point per 
`snapshot_date`.
   <img width="1874" height="397" alt="image" 
src="https://github.com/user-attachments/assets/cdc65fbc-4d3b-4c83-823a-fab33b6b596f";
 />
   
   
   ---
   
   #### Chart 5 : PAR Trend Line
   
   **Type**: Line chart  
   **Dataset**: `delinquency_par_secure_all_dates`  
   **Filter**: `bucket_key = -1`  
   **X-axis**: `snapshot_date` (grain: `P1D` — daily)
   
   **Three metrics plotted**:
   
   | Line | Formula | Colour |
   |------|---------|--------|
   | PAR 30 | `SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount), 0)` | 
Blue |
   | PAR 60 | `SUM(par_60_amount) / NULLIF(SUM(total_portfolio_amount), 0)` | 
Red |
   | PAR 90 | `SUM(par_90_amount) / NULLIF(SUM(total_portfolio_amount), 0)` | 
Orange |
   
   Shows how the **ratio** of at-risk portfolio has evolved over time. A rising 
PAR 30 line with a lagging PAR 90 line indicates loans entering delinquency 
faster than they are reaching the severe tier the gap between lines is the 
"aging pipeline". When PAR 90 approaches PAR 30, most at-risk loans are 
severely delinquent.
   
   **Y-axis format**: `.1%`  
   **Row limit**: 5000
   
   <img width="804" height="336" alt="par-trend-line-2026-06-10T08-14-45 365Z" 
src="https://github.com/user-attachments/assets/eb4b3755-c426-4b75-a75c-7409f290470e";
 />
   
   
   ---
   
   #### Chart 6 : At-Risk Outstanding Trend
   
   **Type**: Stacked area chart  
   **Dataset**: `delinquency_par_secure_all_dates`  
   **Filter**: `bucket_key = -1`  
   **X-axis**: `snapshot_date`
   
   **Three metrics stacked**:
   
   | Band | Formula |
   |------|---------|
   | PAR 30 Amt | `SUM(par_30_amount)` |
   | PAR 60 Amt | `SUM(par_60_amount)` |
   | PAR 90 Amt | `SUM(par_90_amount)` |
   
   Shows the **absolute currency exposure** at risk over time — no division. 
Complements Chart 5: if the portfolio is growing fast and delinquency stays 
proportionally flat, Chart 5 (ratio) is flat while Chart 6 (absolute) grows. 
The area fill makes it easy to see the raw money at stake regardless of 
portfolio size changes.
   
   **Y-axis format**: `$,.0f`  
   **Stacked style**: `stack`
   
   <img width="804" height="336" 
alt="at-risk-outstanding-trend-2026-06-10T08-14-52 460Z" 
src="https://github.com/user-attachments/assets/f4e5a98f-2ed1-4fac-b246-255ab997b9a6";
 />
   
   
   ---
   
   ### Row 3 : Distribution Pie Charts
   
   Both pie charts use `delinquency_par_secure_latest` with `bucket_key != -1` 
to get the per-bucket rows (one row per delinquency band), not the portfolio 
rollup row.
   <img width="1867" height="359" alt="image" 
src="https://github.com/user-attachments/assets/cf3799f3-edb2-4b7a-82b7-efa12635ce47";
 />
   
   
   ---
   
   #### Chart 7 : Delinquency Bucket Distribution
   
   **Type**: Pie chart  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key != -1`
   
   **Formula**:
   ```
   Slice size = SUM(bucket_outstanding_amount)
   Grouped by: bucket_name
   ```
   
   `bucket_outstanding_amount` = SUM of `principal_outstanding` for loans in 
that specific bucket on the latest snapshot date. Shows what **share of 
outstanding principal** sits in each delinquency band. The "Current" slice 
(`bucket_key = 0`) represents loans with no active delinquency tag — fully 
on-schedule borrowers.
   
   **Format**: `$,.0f`
   
   | Bucket | bucket_key | Meaning |
   |--------|-----------|---------|
   | Current | 0 | No active delinquency tag — on-schedule |
   | Watch-list | 1 | 1–29 days overdue |
   | PAR 30-59 | 2 | 30–59 days overdue |
   | PAR 60-89 | 3 | 60–89 days overdue |
   | PAR 90+ | 4 | 90+ days overdue |
   
   <img width="804" height="288" 
alt="delinquency-bucket-distribution-2026-06-10T08-51-51 375Z" 
src="https://github.com/user-attachments/assets/206267fd-306c-4440-8801-3dac610c46fc";
 />
   
   
   ---
   
   #### Chart 8 :  At-Risk vs Current Loans
   
   **Type**: Pie chart  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key != -1`
   
   **Formula**:
   ```
   Slice size = SUM(bucket_loan_count)
   Grouped by: bucket_name
   ```
   
   Same breakdown as Chart 7 but by **loan count** not outstanding amount. 
Answers "how many accounts are at risk?" rather than "how much money is at 
risk?". The two pies tell different stories  a portfolio with two large 
delinquent loans shows a large amount slice but a small count slice. A 
portfolio with many small delinquent loans shows the opposite.
   
   **Format**: `,d`
   
   <img width="804" height="288" 
alt="at-risk-vs-current-loans-2026-06-10T08-52-00 152Z" 
src="https://github.com/user-attachments/assets/2f2bd8c5-297c-49d1-87c5-899b5b2af6cf";
 />
   
   
   ---
   
   ### Row 4 : Branch and Product Analysis
   
   Both bar charts use `delinquency_par_secure_latest` with `bucket_key = -1`, 
grouped by a dimension column. Shows how delinquency is distributed across 
business dimensions.
   <img width="1872" height="331" alt="image" 
src="https://github.com/user-attachments/assets/4278ad9d-3759-49f8-9cab-cac3cc9309e3";
 />
   
   
   ---
   
   #### Chart 9 : PAR by Branch
   
   **Type**: Bar chart  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key = -1`  
   **Group by**: `office_name`
   
   **Two metrics per branch**:
   
   | Bar | Formula |
   |-----|---------|
   | PAR 30 (blue) | `SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount), 
0)` |
   | PAR 90 (orange) | `SUM(par_90_amount) / 
NULLIF(SUM(total_portfolio_amount), 0)` |
   
   `total_portfolio_amount` is aggregated per office, so each branch's PAR 
ratio is self-contained. A branch with PAR 30 ≈ PAR 90 has most of its 
delinquent portfolio severely overdue (aged delinquency). A branch with large 
PAR 30 but small PAR 90 has fresh delinquency that has not yet aged.
   
   **Y-axis format**: `.1%`
   
   <img width="804" height="288" alt="par-by-branch-2026-06-10T08-52-23 029Z" 
src="https://github.com/user-attachments/assets/c8919bcf-7163-4b87-92d9-aa0462364508";
 />
   
   
   ---
   
   #### Chart 10 :  PAR by Product
   
   **Type**: Bar chart  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key = -1`  
   **Group by**: `product_name`
   
   **Two metrics per product** (same formulas as Chart 9).
   
   Identifies which loan products are generating the most delinquency. A 
product with consistently high PAR may have poor underwriting criteria, 
aggressive repayment terms, or a structural borrower cohort risk. Enables 
product-level portfolio management decisions.
   
   **Y-axis format**: `.1%`
   
   <img width="804" height="288" alt="par-by-product-2026-06-10T08-52-27 816Z" 
src="https://github.com/user-attachments/assets/37d06ff9-d6e0-4120-9290-65791ab707f3";
 />
   
   
   ---
   
   ### Row 5 :  Summary Table
   
   ---
   
   #### Chart 11 :  PAR Summary Table
   
   **Type**: Table  
   **Dataset**: `delinquency_par_secure_latest`  
   **Filter**: `bucket_key = -1`  
   **Group by**: `office_name`, `product_name`
   
   **Seven columns**:
   
   | Column | Formula | Description |
   |--------|---------|-------------|
   | Portfolio | `SUM(total_portfolio_amount)` | Total outstanding principal 
for this office × product |
   | PAR 30 | `SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount), 0)` | 
PAR 30 ratio |
   | PAR 60 | `SUM(par_60_amount) / NULLIF(SUM(total_portfolio_amount), 0)` | 
PAR 60 ratio |
   | PAR 90 | `SUM(par_90_amount) / NULLIF(SUM(total_portfolio_amount), 0)` | 
PAR 90 ratio |
   | NPA Loans | `SUM(npa_loan_count)` | Count of NPA loan accounts |
   | NPA Ratio | `SUM(npa_amount) / NULLIF(SUM(total_portfolio_amount), 0)` | 
NPA exposure ratio |
   
   The finest-grain diagnostic chart. Identifies the exact office × product 
combination driving delinquency — the starting point for branch manager 
investigation.
   
   <img width="1625" height="288" alt="par-summary-table-2026-06-10T08-54-16 
768Z" 
src="https://github.com/user-attachments/assets/3bec0047-c8f9-4580-a73e-60cd3d613e9d";
 />
   
   
   ---
   
   ## Delinquency and PAR Dashboard
   <img width="1657" height="1472" 
alt="delinquency-and-par-dashboard-2026-06-10T08-54-48 617Z" 
src="https://github.com/user-attachments/assets/3a4d8ddd-58bc-4ad4-afed-45376c4aade0";
 />
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to