hanahmily commented on code in PR #892:
URL: https://github.com/apache/skywalking-banyandb/pull/892#discussion_r2602257915
##########
docs/design/fodc/watchdog-and-flight-recoder.md:
##########
@@ -0,0 +1,383 @@
+# Watchdog And Flight Recorder Development Design
+
+## Table of Contents
+1. [Overview](#overview)
+2. [Component Design](#component-design)
+3. [Data Flow](#data-flow)
+4. [Testing Strategy](#testing-strategy)
+5. [Appendix](#appendix)
+
+## Overview
+
+The First Occurrence Data Collection (FODC) infrastructure consists of two main components working together to ensure metrics data survivability in BanyanDB:
+
+**Watchdog**: Periodically polls metrics from the BanyanDB container and forwards them to the Flight Recorder for buffering.
+
+**Flight Recorder**: Buffers metrics data using fixed-size circular buffers (RingBuffer) with in-memory storage, ensuring data persists even when the main BanyanDB process crashes.
+
+Together, these components capture and preserve metrics data to ensure that critical observability data is not lost during process crashes.
+
+### Responsibilities
+
+**Watchdog Component**
+- Polls metrics from BanyanDB at configurable intervals
+- Parses Prometheus text format metrics efficiently
+- Forwards collected metrics to Flight Recorder for buffering
+- Handles connection failures and retries gracefully
+- Monitors BanyanDB process health
+
+**Flight Recorder Component**
+- Maintains a fixed-size circular buffer (RingBuffer) per metric
+- Stores metrics in-memory to ensure fast access and persistence across process crashes
+- Manages buffer capacity and handles overflow scenarios using circular overwrite behavior
+- Ensures data integrity and prevents data loss during crashes
+
+### Component Interaction Flow
+
+```
+BanyanDB Metrics Endpoint
+ │
+ │ (HTTP GET /metrics)
+ ▼
+ Watchdog Component
+ │
+ │ (Poll at interval)
+ │
+ │ Parse Prometheus Format
+ │
+ │ Forward Metrics
+ ▼
+ Flight Recorder Component
+ │
+ │ Write to RingBuffer
+ │
+ │ (Per-metric buffers)
+ ▼
+ In-Memory Storage
+```
+
+## Component Design
+
+### 1. Watchdog Component
+
+**Purpose**: Periodically polls metrics from BanyanDB and forwards them to Flight Recorder
+
+#### Core Responsibilities
+
+- **Metrics Polling**: Polls metrics from BanyanDB metrics endpoint at configurable intervals
+- **Metrics Parsing**: Uses metrics package to parse Prometheus text format efficiently
+- **Error Handling**: Implements exponential backoff for transient failures
+- **Health Monitoring**: Tracks BanyanDB process health and reports status
+
+#### Core Types
+
+**`Watchdog`**
+```go
+type Watchdog struct {
+ client *http.Client
+ url string
+ interval time.Duration
+}
+```
+
+#### Key Functions
+
+**`Start(ctx context.Context) error`**
+- Initializes the watchdog component
+- Starts polling loop with configurable interval
+- Sets up HTTP client with connection reuse
+- Begins periodic metrics collection
+
+**`Stop(ctx context.Context) error`**
+- Gracefully stops the polling loop
+- Closes HTTP connections
+- Ensures in-flight requests complete
+
+**`pollMetrics(ctx context.Context) ([]metrics.RawMetric, error)`**
+- Fetches raw metrics text from endpoint
+- Uses metrics package to parse Prometheus text format
+- Returns parsed metrics or error
+- Implements retry logic with exponential backoff
+
+#### Configuration Flags
+
+**`--poll-interval`**
+- **Type**: `duration`
+- **Default**: `10s`
+- **Description**: Interval at which the Watchdog polls metrics from the BanyanDB container
+
+**`--metrics-endpoint`**
+- **Type**: `string`
+- **Default**: `http://localhost:2121/metrics`
+- **Description**: URL of the BanyanDB metrics endpoint to poll from
+
+### 2. Flight Recorder Component
+
+**Purpose**: Buffers metrics data using fixed-size circular buffers with in-memory storage
+
+#### Core Responsibilities
+
+- **Metrics Buffering**: Maintains a fixed-size RingBuffer per metric
+- **Data Persistence**: Ensures metrics survive process crashes
+- **Overflow Handling**: Implements circular overwrite behavior when buffers are full
+
+#### Core Types
+
+**`MetricID`**
+```go
+type MetricID uint32
+```
+- Unique identifier for each metric stored in the FlightRecorder
+- Auto-incremented for new metrics
+
+**`RingBuffer`**
+```go
+type RingBuffer struct {
+ next int // Next write position in the circular buffer
+ values []float64 // Fixed-size buffer for metric values
+ n uint64 // Total number of values written (wraps around)
+}
+```
+- Stores metric values in a circular buffer
+- Implements circular overwrite behavior when buffer is full
+
+**`FlightRecorder`**
+```go
+type FlightRecorder struct {
+ nextMetricID MetricID // Next available metric ID
+ index map[string]MetricID // Map from metric key string to MetricID
+ metrics map[MetricID]*RingBuffer // Map from MetricID to RingBuffer
+ histograms map[string]metric.Histogram // Map for histogram metrics
Review Comment:
Each histogram stores only a single data point, not a series of points.
Histograms should use a ring buffer, like the other metrics.
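
A minimal sketch of what that could look like, assuming Go 1.18+ generics and a capacity greater than zero. `HistogramSnapshot`, `Ring`, `NewRing`, and `Snapshot` are hypothetical names used only for illustration; they are not part of the PR or of any existing BanyanDB package.

```go
package main

import "fmt"

// HistogramSnapshot is a hypothetical point-in-time copy of one histogram:
// cumulative bucket counts plus the running sum and count.
type HistogramSnapshot struct {
	Buckets []uint64 // cumulative counts, one per upper bound
	Sum     float64
	Count   uint64
}

// Ring is a fixed-capacity circular buffer that overwrites its oldest entry.
type Ring[T any] struct {
	values []T
	next   int    // next write position
	n      uint64 // total number of writes
}

// NewRing allocates a ring with the given capacity (assumed to be > 0).
func NewRing[T any](capacity int) *Ring[T] {
	return &Ring[T]{values: make([]T, capacity)}
}

// Add stores v, overwriting the oldest entry once the buffer is full.
func (r *Ring[T]) Add(v T) {
	r.values[r.next] = v
	r.next = (r.next + 1) % len(r.values)
	r.n++
}

// Snapshot returns the retained entries ordered oldest to newest.
func (r *Ring[T]) Snapshot() []T {
	if r.n < uint64(len(r.values)) {
		return append([]T(nil), r.values[:int(r.n)]...)
	}
	out := append([]T(nil), r.values[r.next:]...)
	return append(out, r.values[:r.next]...)
}

func main() {
	// Keep the two most recent histogram observations.
	h := NewRing[HistogramSnapshot](2)
	for i := uint64(1); i <= 3; i++ {
		h.Add(HistogramSnapshot{Buckets: []uint64{i, 2 * i}, Sum: float64(i), Count: 2 * i})
	}
	for _, s := range h.Snapshot() {
		fmt.Println(s.Count, s.Sum, s.Buckets)
	}
}
```

With this shape, a histogram keeps a series of snapshots under the same overwrite policy as scalar metrics, at the cost of one bucket slice per retained snapshot.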
##########
docs/design/fodc/watchdog-and-flight-recoder.md:
##########
@@ -0,0 +1,383 @@
+**`RingBuffer`**
+```go
+type RingBuffer struct {
+ next int // Next write position in the circular buffer
+ values []float64 // Fixed-size buffer for metric values
Review Comment:
How is a histogram stored?
##########
docs/design/fodc/watchdog-and-flight-recoder.md:
##########
@@ -0,0 +1,383 @@
+**`FlightRecorder`**
+```go
+type FlightRecorder struct {
Review Comment:
FlightRecorder should expand or shrink the ring buffer to comply with the
memory limits set by the flag.
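
One possible way to do this, sketched under assumptions: `Resize`, `Ordered`, and `CapacityFor` are hypothetical helpers, the budget is split evenly across metrics, and `n` is reset to the number of retained values on resize (a simplification of the "total written" meaning in the doc).

```go
package main

import "fmt"

// RingBuffer mirrors the struct quoted in the design doc.
type RingBuffer struct {
	next   int       // next write position
	values []float64 // fixed-size storage
	n      uint64    // number of valid values (reset by Resize in this sketch)
}

// NewRingBuffer allocates a buffer with the given capacity (assumed > 0).
func NewRingBuffer(capacity int) *RingBuffer {
	return &RingBuffer{values: make([]float64, capacity)}
}

// Add writes v, overwriting the oldest value once the buffer is full.
func (rb *RingBuffer) Add(v float64) {
	rb.values[rb.next] = v
	rb.next = (rb.next + 1) % len(rb.values)
	rb.n++
}

// Ordered returns the retained values from oldest to newest.
func (rb *RingBuffer) Ordered() []float64 {
	if rb.n < uint64(len(rb.values)) {
		return append([]float64(nil), rb.values[:int(rb.n)]...)
	}
	out := append([]float64(nil), rb.values[rb.next:]...)
	return append(out, rb.values[:rb.next]...)
}

// Resize changes the capacity (must be > 0) and keeps the newest values that fit.
func (rb *RingBuffer) Resize(capacity int) {
	kept := rb.Ordered()
	if len(kept) > capacity {
		kept = kept[len(kept)-capacity:]
	}
	values := make([]float64, capacity)
	copy(values, kept)
	rb.values = values
	rb.next = len(kept) % capacity
	rb.n = uint64(len(kept))
}

// CapacityFor splits a byte budget evenly across metrics at 8 bytes per value,
// never returning less than one slot. metrics must be > 0.
func CapacityFor(budgetBytes, metrics int) int {
	c := budgetBytes / (metrics * 8)
	if c < 1 {
		return 1
	}
	return c
}

func main() {
	rb := NewRingBuffer(4)
	for i := 1; i <= 6; i++ {
		rb.Add(float64(i))
	}
	rb.Resize(CapacityFor(64, 4)) // 64 bytes over 4 metrics leaves 2 slots each
	fmt.Println(rb.Ordered())
	rb.Resize(4) // the budget grew again
	rb.Add(7)
	fmt.Println(rb.Ordered())
}
```

Shrinking drops the oldest values first, so the data closest to a crash is the last to be discarded.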
##########
docs/design/fodc/watchdog-and-flight-recoder.md:
##########
@@ -0,0 +1,383 @@
+**`FlightRecorder`**
+```go
+type FlightRecorder struct {
Review Comment:
The `FlightRecorder` calculates total memory usage including all components:

#### Components Included:

1. **Index Map Overhead**
   - Map header: 48 bytes
   - Per entry: key string size + 4 bytes (MetricID)
2. **Metrics Map Overhead**
   - Map header: 48 bytes
   - Per entry: RingBuffer size + 8 bytes (pointer overhead)
3. **Metadata Map Overhead**
   - Map header: 48 bytes
   - Per entry: metric name string + description string + 4 bytes (MetricID)
4. **String Storage**
   - Each string: 16 bytes (Go string header) + actual string length
   - Includes: metric names, descriptions, label keys and values
5. **Float64 Values**
   - All values stored in RingBuffers: `capacity × 8 bytes` per metric
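
The accounting above can be sketched as follows. This is an illustration only: `metricInfo` and `estimateBytes` are hypothetical names, the byte constants are the ones listed in this comment, and each string is counted once (header plus length) inside the map entry that holds it, which is one reading of how items 1 to 4 combine.

```go
package main

import "fmt"

// Byte costs taken from the list above.
const (
	mapHeaderBytes    = 48
	stringHeaderBytes = 16
	metricIDBytes     = 4
	pointerBytes      = 8
	float64Bytes      = 8
)

// metricInfo is a hypothetical description of one stored metric.
type metricInfo struct {
	key         string // index key, including any label keys and values
	name        string
	description string
	capacity    int // RingBuffer capacity in values
}

// stringBytes is the Go string header plus the string's own bytes.
func stringBytes(s string) int { return stringHeaderBytes + len(s) }

// estimateBytes sums the three map headers and the per-entry costs.
func estimateBytes(metrics []metricInfo) int {
	total := 3 * mapHeaderBytes // index, metrics, and metadata maps
	for _, m := range metrics {
		total += stringBytes(m.key) + metricIDBytes                               // index entry
		total += m.capacity*float64Bytes + pointerBytes                           // metrics entry
		total += stringBytes(m.name) + stringBytes(m.description) + metricIDBytes // metadata entry
	}
	return total
}

func main() {
	metrics := []metricInfo{{key: "up", name: "up", description: "ok", capacity: 10}}
	fmt.Println(estimateBytes(metrics))
}
```

An estimate like this is what the recorder would compare against its configured memory budget before deciding whether to resize the buffers.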
##########
docs/design/fodc/watchdog-and-flight-recoder.md:
##########
@@ -0,0 +1,383 @@
+**`FlightRecorder`**
+```go
+type FlightRecorder struct {
+ nextMetricID MetricID // Next available metric ID
Review Comment:
What is the metric ID? Why can't the name be used as the key instead?
##########
docs/design/fodc/watchdog-and-flight-recoder.md:
##########
@@ -0,0 +1,383 @@
+#### Configuration Flags
Review Comment:
We need a flag: `--max-metrics-memory-usage-percentage`
https://github.com/apache/skywalking-banyandb/blob/main/pkg/cgroups/memory.go#L36
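
A rough sketch of how the flag could feed a byte budget, using only the standard `flag` package. The default of 10, the help text, the `budgetBytes` helper, and the hard-coded limit are assumptions for illustration; in practice the limit would come from the cgroups package linked above, whose API is not reproduced here.

```go
package main

import (
	"flag"
	"fmt"
)

// budgetBytes converts a percentage of the container memory limit into bytes.
// It divides first so the multiplication cannot overflow.
func budgetBytes(limitBytes uint64, percentage int) (uint64, error) {
	if percentage < 0 || percentage > 100 {
		return 0, fmt.Errorf("percentage %d out of range [0, 100]", percentage)
	}
	return limitBytes / 100 * uint64(percentage), nil
}

func main() {
	fs := flag.NewFlagSet("fodc", flag.ContinueOnError)
	// Default value and description are assumptions, not taken from the PR.
	pct := fs.Int("max-metrics-memory-usage-percentage", 10,
		"maximum share of the memory limit that buffered metrics may use")
	if err := fs.Parse([]string{"--max-metrics-memory-usage-percentage=5"}); err != nil {
		panic(err)
	}

	const limit = 1 << 30 // placeholder for the cgroup memory limit
	budget, err := budgetBytes(limit, *pct)
	if err != nil {
		panic(err)
	}
	fmt.Println(budget)
}
```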
##########
docs/design/fodc/watchdog-and-flight-recoder.md:
##########
@@ -0,0 +1,383 @@
+**`RingBuffer`**
+```go
+type RingBuffer struct {
Review Comment:
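The `RingBuffer` component calculates its memory usage as `capacity × 8 bytes`:
```go
// Size reports the bytes held by the values slice: one 8-byte float64 per slot.
func (rb *RingBuffer) Size() int { return cap(rb.values) * 8 }
```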