airajena opened a new pull request, #17:
URL: https://github.com/apache/fineract-business-intelligence/pull/17
## Summary
This PR introduces the complete **Apache Superset visualization layer** for
the Apache Fineract Business Intelligence project. It connects directly to the
analytics warehouse populated by the dbt transformation layer and exposes
dashboard with row-level security (RLS), Jinja2 template processing, and a
fully automated bootstrap system.
This is the **third and final layer** of the BI stack sitting downstream of
the Python ELT extractor (PR 1) and the dbt transformation layer (PR 2).
---
## Architecture
```
PostgreSQL Warehouse (analytics DB)
│
├── analytics.* ← dbt marts (queried by Superset virtual datasets)
│ ├── mart_delinquency_par
│ ├── mart_portfolio_health
│ └── mart_repayment_behavior
│
├── meta.* ← RLS enforcement
│ └── user_office_mapping ← maps Superset users → allowed office_ids
│
└── Superset (Port 8088)
├── Datasets (6) ← Virtual datasets with Jinja2 RLS SQL
│ ├── delinquency_par_secure_all_dates
│ ├── delinquency_par_secure_latest
│ ├── portfolio_health_secure_all_dates
│ ├── portfolio_health_secure_latest
│ ├── repayment_behavior_secure_all_dates
│ └── repayment_behavior_secure_latest
│
└── Dashboards (3)
├── Delinquency and PAR Dashboard ← 11 charts
├── Portfolio Health Dashboard ← 12 charts (future PR)
└── Repayment Behavior Dashboard ← 11 charts (future PR)
```
---
## Files Added
### Docker & Configuration
| File | Description |
|------|-------------|
| `docker/superset/Dockerfile` | Extends `apache/superset:4.1.1`. Installs
`psycopg2-binary==2.9.9` for PostgreSQL connectivity. |
| `docker/superset/superset_config.py` | Superset runtime config. Enables
`ENABLE_TEMPLATE_PROCESSING: True` the critical flag that activates Jinja2
template processing in virtual dataset SQL (required for RLS). |
| `docker/superset/init_superset.sh` | Container entrypoint. Runs `superset
db upgrade`, creates admin user, runs `superset init`, bootstraps branch
manager users and dashboard assets, then starts the web server on `:8088`. |
| `docker/superset/refresh_superset_assets.sh` | Utility script to re-run
`bootstrap_superset_assets.py` on a live container used when assets need
updating without restarting the container. |
| `docker/superset/bootstrap_superset_assets.py` | Core bootstrap script .
Programmatically creates all 6 datasets, all chart configurations, and all 3
dashboards via the Superset SQLAlchemy session. Idempotent safe to re-run. |
| `docker/superset/bootstrap_superset_security.py` | Creates `north_manager`
and `south_manager` Superset users, assigns them the `Alpha` role, and inserts
their office mappings into `meta.user_office_mapping`. |
| `docker/dbt/Dockerfile` | dbt container. Extends `python:3.11-slim`,
installs `dbt-postgres>=1.7,<2.0`. Used in the `dbt` service in `compose.yaml`.
|
### Virtual Dataset SQL (`superset/datasets/`)
| File | Dataset | Grain | Purpose |
|------|---------|-------|---------|
| `delinquency_par_secure_all_dates.sql` |
`delinquency_par_secure_all_dates` | All dates | PAR trend charts (full
history). RLS-filtered by `current_username()`. |
| `delinquency_par_secure_latest.sql` | `delinquency_par_secure_latest` |
Latest date only | KPI cards, pie charts, bar charts, summary table. |
| `portfolio_health_secure_all_dates.sql` |
`portfolio_health_secure_all_dates` | All dates | Portfolio trend charts. |
| `portfolio_health_secure_latest.sql` | `portfolio_health_secure_latest` |
Latest date only | Portfolio KPI cards and breakdown charts. |
| `repayment_behavior_secure_all_dates.sql` |
`repayment_behavior_secure_all_dates` | All dates | Collection efficiency and
repayment trend charts. |
| `repayment_behavior_secure_latest.sql` |
`repayment_behavior_secure_latest` | Latest date only | Repayment KPI cards and
summary table. |
### Dashboard Metadata (`superset/dashboards/`)
| File | Description |
|------|-------------|
| `superset/dashboards/delinquency_par_dashboard.json` | Dashboard title,
slug, and published state for the Delinquency and PAR dashboard. |
| `superset/dashboards/portfolio_health_dashboard.json` | Dashboard title,
slug, and published state for the Portfolio Health dashboard. |
| `superset/dashboards/repayment_behavior_dashboard.json` | Dashboard title,
slug, and published state for the Repayment Behavior dashboard. |
### Infrastructure Updates
| File | Change |
|------|--------|
| `compose.yaml` | Adds `superset` service (port 8088), `dbt` service, and
`superset_home` Docker volume. Both new services depend on `warehouse:
healthy`. |
| `.env.example` | Adds `SUPERSET_SECRET_KEY`, `SUPERSET_ADMIN_*`,
`SUPERSET_NORTH_MANAGER_PASSWORD`, `SUPERSET_SOUTH_MANAGER_PASSWORD`,
`SUPERSET_METADATA_URI`, `SUPERSET_WAREHOUSE_URI`. |
| `.gitignore` | Adds `docker/superset/__pycache__/` exclusion. |
| `.rat-excludes` | Adds `**/docker/superset/__pycache__/**` and
`**/superset/dashboards/*.json` exclusions. Dashboard JSON files have no
standard comment syntax for Apache license headers. |
---
## Row-Level Security (RLS)
Every virtual dataset enforces row-level security at the SQL layer using
Jinja2 template processing. No separate Superset RLS rules are needed the
filter lives inside the dataset SQL itself.
### User → Office mapping
| Superset Username | Role | Offices Visible |
|---|---|---|
| `admin` | `ADMIN` | All branches |
| `north_manager` | Branch Manager | North Branch only |
| `south_manager` | Branch Manager | South Branch only |
The `ENABLE_TEMPLATE_PROCESSING: True` flag in `superset_config.py`
activates `current_username()` resolution. Without this flag, the Jinja2
template is treated as a literal string and the RLS WHERE clause never executes.
---
## Bootstrap System
The bootstrap is fully automated and idempotent. Every entity uses upsert
semantics running it twice does not create duplicates.
### Startup sequence
```
docker compose up -d
│
▼
init_superset.sh
├── superset db upgrade (apply Superset metadata migrations)
├── superset fab create-admin (create admin user)
├── superset init (seed default roles and permissions)
├── bootstrap_superset_security.py
│ ├── Create north_manager user (Alpha role)
│ ├── Create south_manager user (Alpha role)
│ └── INSERT meta.user_office_mapping rows
└── bootstrap_superset_assets.py
├── ensure_database() (register warehouse DB connection)
├── ensure_dataset() × 6 (create/update virtual datasets + columns
+ metrics)
├── ensure_chart() × 34 (create/update all chart configurations)
└── ensure_dashboard() × 3 (create/update dashboards with layout)
```
### Key bootstrap functions
**`ensure_dataset(database, name, sql, owner, dttm_col, columns, metrics)`**
Creates or updates a virtual dataset. Rebuilds all `TableColumn` and
`SqlMetric` rows on every run guarantees column definitions stay in sync with
the bootstrap code.
**`ensure_chart(name, viz_type, datasource, params, owner)`**
Creates or updates a chart. Stores the full form data as `params` JSON
including `datasource`, `viz_type`, `number_format`, `metric`, `adhoc_filters`,
and `subheader`. Clears `query_context` on update to prevent stale cache.
**`ensure_dashboard(meta_filename, owner, charts, row_specs)`**
Creates or updates a dashboard. Calls `build_layout()` to generate the
Superset `position_json` grid layout from the `row_specs` list, then assigns
all charts to the dashboard.
**`mart_exists(table_name)`**
Queries `information_schema.tables` to check if a mart table exists before
bootstrapping that dashboard. Allows `docker compose up` to succeed even when
only the delinquency mart is populated Portfolio Health and Repayment Behavior
dashboards are skipped gracefully with an INFO message.
---
## Delinquency and PAR Dashboard : Chart Reference
This is the primary dashboard. It contains 11 charts across 5 rows, all
sourcing from `mart_delinquency_par` via the two RLS-secured datasets.
### Row 1 : KPI Cards
The four summary cards show the portfolio health state as of the latest COB
snapshot date.
<img width="1891" height="278" alt="image"
src="https://github.com/user-attachments/assets/4a209b08-c0c6-4dcf-a860-2aaba95dc7bf"
/>
---
#### Chart 1 : PAR 30 KPI
**Type**: Big Number
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key = -1` (All Portfolio rollup row)
**Formula**:
```
PAR 30 = SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount), 0)
```
`par_30_amount` = total outstanding principal of all loans where
`min_age_days >= 30` (i.e., loans overdue by 30 or more days). This is
cumulative it includes loans in PAR 60 and PAR 90 buckets.
`total_portfolio_amount` = SUM of principal outstanding across all active loans.
**Subheader**: "Portfolio at Risk > 30 days"
**Format**: `.2%` → displays as `58.14%`
<img width="394" height="160" alt="par-30-kpi-2026-06-10T08-07-01 182Z"
src="https://github.com/user-attachments/assets/7d43d8cf-2182-48e2-8226-517d8cdd0465"
/>
---
#### Chart 2 : PAR 60 KPI
**Type**: Big Number
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key = -1`
**Formula**:
```
PAR 60 = SUM(par_60_amount) / NULLIF(SUM(total_portfolio_amount), 0)
```
`par_60_amount` = outstanding principal of loans overdue 60 or more days.
Always ≤ PAR 30 (cumulative definition).
**Subheader**: "Portfolio at Risk > 60 days"
**Format**: `.2%`
<img width="394" height="160" alt="par-60-kpi-2026-06-10T08-07-07 034Z"
src="https://github.com/user-attachments/assets/7455ac03-e048-467d-8e07-08f76d89cf90"
/>
---
#### Chart 3 : PAR 90 KPI
**Type**: Big Number
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key = -1`
**Formula**:
```
PAR 90 = SUM(par_90_amount) / NULLIF(SUM(total_portfolio_amount), 0)
```
`par_90_amount` = outstanding principal of loans overdue 90 or more days.
The most severe tier. Always ≤ PAR 60 ≤ PAR 30.
**Subheader**: "Portfolio at Risk > 90 days"
**Format**: `.2%`
<img width="394" height="160" alt="par-90-kpi-2026-06-10T08-07-11 215Z"
src="https://github.com/user-attachments/assets/e7855091-c199-4911-84df-33c595ad982f"
/>
---
#### Chart 4 : NPA Exposure KPI
**Type**: Big Number
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key = -1`
**Formula**:
```
NPA Exposure = SUM(npa_amount)
```
`npa_amount` = total outstanding principal of loans where `is_npa = true`.
Unlike PAR, this is an **absolute currency amount**, not a ratio. `is_npa` is
set by Fineract's COB when a loan exceeds the `overdue_days_for_npa` threshold
on the loan product configuration.
**Subheader**: "Non-Performing Asset Exposure"
**Format**: `$,.0f` → displays as `$36,252`
<img width="394" height="160" alt="npa-exposure-kpi-2026-06-10T08-07-15
299Z"
src="https://github.com/user-attachments/assets/29d55452-5b55-4117-9b1b-6b0c09710e86"
/>
---
### Row 2 : Trend Charts
Both trend charts use `delinquency_par_secure_all_dates` (full history
dataset) with `bucket_key = -1` filter, plotting one data point per
`snapshot_date`.
<img width="1874" height="397" alt="image"
src="https://github.com/user-attachments/assets/cdc65fbc-4d3b-4c83-823a-fab33b6b596f"
/>
---
#### Chart 5 : PAR Trend Line
**Type**: Line chart
**Dataset**: `delinquency_par_secure_all_dates`
**Filter**: `bucket_key = -1`
**X-axis**: `snapshot_date` (grain: `P1D` — daily)
**Three metrics plotted**:
| Line | Formula | Colour |
|------|---------|--------|
| PAR 30 | `SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount), 0)` |
Blue |
| PAR 60 | `SUM(par_60_amount) / NULLIF(SUM(total_portfolio_amount), 0)` |
Red |
| PAR 90 | `SUM(par_90_amount) / NULLIF(SUM(total_portfolio_amount), 0)` |
Orange |
Shows how the **ratio** of at-risk portfolio has evolved over time. A rising
PAR 30 line with a lagging PAR 90 line indicates loans entering delinquency
faster than they are reaching the severe tier the gap between lines is the
"aging pipeline". When PAR 90 approaches PAR 30, most at-risk loans are
severely delinquent.
**Y-axis format**: `.1%`
**Row limit**: 5000
<img width="804" height="336" alt="par-trend-line-2026-06-10T08-14-45 365Z"
src="https://github.com/user-attachments/assets/eb4b3755-c426-4b75-a75c-7409f290470e"
/>
---
#### Chart 6 : At-Risk Outstanding Trend
**Type**: Stacked area chart
**Dataset**: `delinquency_par_secure_all_dates`
**Filter**: `bucket_key = -1`
**X-axis**: `snapshot_date`
**Three metrics stacked**:
| Band | Formula |
|------|---------|
| PAR 30 Amt | `SUM(par_30_amount)` |
| PAR 60 Amt | `SUM(par_60_amount)` |
| PAR 90 Amt | `SUM(par_90_amount)` |
Shows the **absolute currency exposure** at risk over time — no division.
Complements Chart 5: if the portfolio is growing fast and delinquency stays
proportionally flat, Chart 5 (ratio) is flat while Chart 6 (absolute) grows.
The area fill makes it easy to see the raw money at stake regardless of
portfolio size changes.
**Y-axis format**: `$,.0f`
**Stacked style**: `stack`
<img width="804" height="336"
alt="at-risk-outstanding-trend-2026-06-10T08-14-52 460Z"
src="https://github.com/user-attachments/assets/f4e5a98f-2ed1-4fac-b246-255ab997b9a6"
/>
---
### Row 3 : Distribution Pie Charts
Both pie charts use `delinquency_par_secure_latest` with `bucket_key != -1`
to get the per-bucket rows (one row per delinquency band), not the portfolio
rollup row.
<img width="1867" height="359" alt="image"
src="https://github.com/user-attachments/assets/cf3799f3-edb2-4b7a-82b7-efa12635ce47"
/>
---
#### Chart 7 : Delinquency Bucket Distribution
**Type**: Pie chart
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key != -1`
**Formula**:
```
Slice size = SUM(bucket_outstanding_amount)
Grouped by: bucket_name
```
`bucket_outstanding_amount` = SUM of `principal_outstanding` for loans in
that specific bucket on the latest snapshot date. Shows what **share of
outstanding principal** sits in each delinquency band. The "Current" slice
(`bucket_key = 0`) represents loans with no active delinquency tag — fully
on-schedule borrowers.
**Format**: `$,.0f`
| Bucket | bucket_key | Meaning |
|--------|-----------|---------|
| Current | 0 | No active delinquency tag — on-schedule |
| Watch-list | 1 | 1–29 days overdue |
| PAR 30-59 | 2 | 30–59 days overdue |
| PAR 60-89 | 3 | 60–89 days overdue |
| PAR 90+ | 4 | 90+ days overdue |
<img width="804" height="288"
alt="delinquency-bucket-distribution-2026-06-10T08-51-51 375Z"
src="https://github.com/user-attachments/assets/206267fd-306c-4440-8801-3dac610c46fc"
/>
---
#### Chart 8 : At-Risk vs Current Loans
**Type**: Pie chart
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key != -1`
**Formula**:
```
Slice size = SUM(bucket_loan_count)
Grouped by: bucket_name
```
Same breakdown as Chart 7 but by **loan count** not outstanding amount.
Answers "how many accounts are at risk?" rather than "how much money is at
risk?". The two pies tell different stories a portfolio with two large
delinquent loans shows a large amount slice but a small count slice. A
portfolio with many small delinquent loans shows the opposite.
**Format**: `,d`
<img width="804" height="288"
alt="at-risk-vs-current-loans-2026-06-10T08-52-00 152Z"
src="https://github.com/user-attachments/assets/2f2bd8c5-297c-49d1-87c5-899b5b2af6cf"
/>
---
### Row 4 : Branch and Product Analysis
Both bar charts use `delinquency_par_secure_latest` with `bucket_key = -1`,
grouped by a dimension column. Shows how delinquency is distributed across
business dimensions.
<img width="1872" height="331" alt="image"
src="https://github.com/user-attachments/assets/4278ad9d-3759-49f8-9cab-cac3cc9309e3"
/>
---
#### Chart 9 : PAR by Branch
**Type**: Bar chart
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key = -1`
**Group by**: `office_name`
**Two metrics per branch**:
| Bar | Formula |
|-----|---------|
| PAR 30 (blue) | `SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount),
0)` |
| PAR 90 (orange) | `SUM(par_90_amount) /
NULLIF(SUM(total_portfolio_amount), 0)` |
`total_portfolio_amount` is aggregated per office, so each branch's PAR
ratio is self-contained. A branch with PAR 30 ≈ PAR 90 has most of its
delinquent portfolio severely overdue (aged delinquency). A branch with large
PAR 30 but small PAR 90 has fresh delinquency that has not yet aged.
**Y-axis format**: `.1%`
<img width="804" height="288" alt="par-by-branch-2026-06-10T08-52-23 029Z"
src="https://github.com/user-attachments/assets/c8919bcf-7163-4b87-92d9-aa0462364508"
/>
---
#### Chart 10 : PAR by Product
**Type**: Bar chart
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key = -1`
**Group by**: `product_name`
**Two metrics per product** (same formulas as Chart 9).
Identifies which loan products are generating the most delinquency. A
product with consistently high PAR may have poor underwriting criteria,
aggressive repayment terms, or a structural borrower cohort risk. Enables
product-level portfolio management decisions.
**Y-axis format**: `.1%`
<img width="804" height="288" alt="par-by-product-2026-06-10T08-52-27 816Z"
src="https://github.com/user-attachments/assets/37d06ff9-d6e0-4120-9290-65791ab707f3"
/>
---
### Row 5 : Summary Table
---
#### Chart 11 : PAR Summary Table
**Type**: Table
**Dataset**: `delinquency_par_secure_latest`
**Filter**: `bucket_key = -1`
**Group by**: `office_name`, `product_name`
**Seven columns**:
| Column | Formula | Description |
|--------|---------|-------------|
| Portfolio | `SUM(total_portfolio_amount)` | Total outstanding principal
for this office × product |
| PAR 30 | `SUM(par_30_amount) / NULLIF(SUM(total_portfolio_amount), 0)` |
PAR 30 ratio |
| PAR 60 | `SUM(par_60_amount) / NULLIF(SUM(total_portfolio_amount), 0)` |
PAR 60 ratio |
| PAR 90 | `SUM(par_90_amount) / NULLIF(SUM(total_portfolio_amount), 0)` |
PAR 90 ratio |
| NPA Loans | `SUM(npa_loan_count)` | Count of NPA loan accounts |
| NPA Ratio | `SUM(npa_amount) / NULLIF(SUM(total_portfolio_amount), 0)` |
NPA exposure ratio |
The finest-grain diagnostic chart. Identifies the exact office × product
combination driving delinquency — the starting point for branch manager
investigation.
<img width="1625" height="288" alt="par-summary-table-2026-06-10T08-54-16
768Z"
src="https://github.com/user-attachments/assets/3bec0047-c8f9-4580-a73e-60cd3d613e9d"
/>
---
## Delinquency and PAR Dashboard
<img width="1657" height="1472"
alt="delinquency-and-par-dashboard-2026-06-10T08-54-48 617Z"
src="https://github.com/user-attachments/assets/3a4d8ddd-58bc-4ad4-afed-45376c4aade0"
/>
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]