Re: [PR] Add document for column-based storage [skywalking-banyandb]

via GitHub Mon, 22 Apr 2024 22:01:33 -0700


Superskyyy commented on code in PR #435:
URL: 
https://github.com/apache/skywalking-banyandb/pull/435#discussion_r1575638980



##########
docs/concept/tsdb.md:
##########
@@ -2,37 +2,58 @@
 
 TSDB is a time-series storage engine designed to store and query large volumes 
of time-series data. One of the key features of TSDB is its ability to 
automatically manage data storage over time, optimize performance and ensure 
that the system can scale to handle large workloads. TSDB empowers `Measure` 
and `Stream` relevant data.
 
-
 ## Shard
 
 In TSDB, the data in a group is partitioned into shards based on a 
configurable sharding scheme. Each shard is assigned to a specific set of 
storage nodes, and those nodes store and process the data within that shard. 
This allows BanyanDB to scale horizontally by adding more storage nodes to the 
cluster as needed.
 
+Within each shard, data is stored in different segments based on time ranges. 
The series index generated based on entities, and the indexes generated based 
on indexing rules of the `Measure` types are also stored under the shard.

Review Comment:
   ```suggestion
   Within each shard, data is stored in different segments based on time 
ranges. The series indexes are generated based on entities, and the indexes 
generated based on indexing rules of the `Measure` types are also stored under 
the shard.
   ```



##########
docs/concept/tsdb.md:
##########
@@ -2,37 +2,58 @@
 
 TSDB is a time-series storage engine designed to store and query large volumes 
of time-series data. One of the key features of TSDB is its ability to 
automatically manage data storage over time, optimize performance and ensure 
that the system can scale to handle large workloads. TSDB empowers `Measure` 
and `Stream` relevant data.
 
-
 ## Shard
 
 In TSDB, the data in a group is partitioned into shards based on a 
configurable sharding scheme. Each shard is assigned to a specific set of 
storage nodes, and those nodes store and process the data within that shard. 
This allows BanyanDB to scale horizontally by adding more storage nodes to the 
cluster as needed.
 
+Within each shard, data is stored in different segments based on time ranges. 
The series index generated based on entities, and the indexes generated based 
on indexing rules of the `Measure` types are also stored under the shard.
+
+![shard](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/shard.png)
+
+## Segment
+
+Each segment is composed of multiple parts. Whenever SkyWalking sends a batch 
of data, BanyanDB writes this batch of data into a new Part. For data of the 
`Stream` type, the inverted indexes generated based on the indexing rules are 
also stored in the segment. Since BanyanDB adopts a snapshot approach for data 
read and write operations, the segment also needs to maintain additional 
snapshot information to record the validity of the parts.
+
+![segment](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/segment.png)
+
+## Part
+
+Within a part, data is split into multiple files in a columnar manner. The 
timestamps are stored in the `timestamps.bin` file, tags are organized in 
persistent tag families as various files with the `.tf` suffix, and fields are 
stored separately in the `fields.bin` file. 
 
-[shard](https://skywalking.apache.org/doc-graph/banyandb/v0.4.0/tsdb-shard.png)
+In addition, each part maintains several metadata files. Among them, 
`metadata.json` is the metadata file for the part, storing descriptive 
information, such as start and end times, part size, etc. 
 
-* Buffer: It is typically implemented as an in-memory queue managed by a 
shard. When new time-series data is ingested into the system, it is added to 
the end of the queue, and when the buffer reaches a specific size, the data is 
flushed to disk in batches.
-* SST: When a bucket of buffer becomes full or reaches a certain size 
threshold, it is flushed to disk as a new Sorted String Table (SST) file. This 
process is known as compaction.
-* Segments and Blocks: Time-series data is stored in data segments/blocks 
within each shard. Blocks contain a fixed number of data points and are 
organized into time windows. Each data segment includes an index that 
efficiently retrieves data within the block.
-* Block Cache: It manages the in-memory cache of data blocks, improving query 
performance by caching frequently accessed data blocks in memory.
+The `meta.bin` is a skipping index file serves as the entry file for the 
entire part, helping to index the `primary.bin` file. 
+
+The `primary.bin` file contains the index of each block. Through it, the 
actual data files or the tagFamily metadata files ending with `.tfm` can be 
indexed, which in turn helps locate the data in blocks. 

Review Comment:
   ```suggestion
   The `primary.bin` file contains the index of each block. Through it, the 
actual data files or the tagFamily metadata files ending with `.tfm` can be 
indexed, which in turn helps in locating the data in blocks. 
   ```
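
To make the lookup chain in this paragraph easier to picture, here is a minimal sketch of how a reader might go from the skipping index (`meta.bin`) to the per-block index (`primary.bin`) and then to offsets in the column files. It is only an illustration under assumptions: `MetaEntry`, `BlockEntry`, `find_blocks`, and every field name are hypothetical and do not reflect BanyanDB's actual on-disk layout or code.

```python
from dataclasses import dataclass


@dataclass
class BlockEntry:
    """Hypothetical per-block index record, as primary.bin might hold it."""
    series_id: int
    min_ts: int
    max_ts: int
    offset: int  # illustrative byte offset into the column files


@dataclass
class MetaEntry:
    """Hypothetical skipping-index record, as meta.bin might hold it."""
    first_series_id: int
    entries: list  # the sorted group of BlockEntry records it points at


def find_blocks(meta, series_id, start, end):
    """Return block entries of series_id that overlap [start, end].

    Assumes meta is sorted by first_series_id, so whole groups can be
    skipped without reading their block entries.
    """
    hits = []
    for i, group in enumerate(meta):
        if group.first_series_id > series_id:
            break  # sorted: no later group can contain the series
        nxt = meta[i + 1].first_series_id if i + 1 < len(meta) else None
        if nxt is not None and nxt < series_id:
            continue  # every series in this group precedes the target
        hits.extend(
            b for b in group.entries
            if b.series_id == series_id and b.min_ts <= end and b.max_ts >= start
        )
    return hits
```

The point of the two levels is that only the small skipping index has to be scanned in full; the block index is consulted just for the groups that can contain the requested series.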



##########
docs/concept/tsdb.md:
##########
@@ -2,37 +2,58 @@
 
 TSDB is a time-series storage engine designed to store and query large volumes 
of time-series data. One of the key features of TSDB is its ability to 
automatically manage data storage over time, optimize performance and ensure 
that the system can scale to handle large workloads. TSDB empowers `Measure` 
and `Stream` relevant data.
 
-
 ## Shard
 
 In TSDB, the data in a group is partitioned into shards based on a 
configurable sharding scheme. Each shard is assigned to a specific set of 
storage nodes, and those nodes store and process the data within that shard. 
This allows BanyanDB to scale horizontally by adding more storage nodes to the 
cluster as needed.
 
+Within each shard, data is stored in different segments based on time ranges. 
The series index generated based on entities, and the indexes generated based 
on indexing rules of the `Measure` types are also stored under the shard.
+
+![shard](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/shard.png)
+
+## Segment
+
+Each segment is composed of multiple parts. Whenever SkyWalking sends a batch 
of data, BanyanDB writes this batch of data into a new Part. For data of the 
`Stream` type, the inverted indexes generated based on the indexing rules are 
also stored in the segment. Since BanyanDB adopts a snapshot approach for data 
read and write operations, the segment also needs to maintain additional 
snapshot information to record the validity of the parts.
+
+![segment](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/segment.png)
+
+## Part
+
+Within a part, data is split into multiple files in a columnar manner. The 
timestamps are stored in the `timestamps.bin` file, tags are organized in 
persistent tag families as various files with the `.tf` suffix, and fields are 
stored separately in the `fields.bin` file. 
 
-[shard](https://skywalking.apache.org/doc-graph/banyandb/v0.4.0/tsdb-shard.png)
+In addition, each part maintains several metadata files. Among them, 
`metadata.json` is the metadata file for the part, storing descriptive 
information, such as start and end times, part size, etc. 
 
-* Buffer: It is typically implemented as an in-memory queue managed by a 
shard. When new time-series data is ingested into the system, it is added to 
the end of the queue, and when the buffer reaches a specific size, the data is 
flushed to disk in batches.
-* SST: When a bucket of buffer becomes full or reaches a certain size 
threshold, it is flushed to disk as a new Sorted String Table (SST) file. This 
process is known as compaction.
-* Segments and Blocks: Time-series data is stored in data segments/blocks 
within each shard. Blocks contain a fixed number of data points and are 
organized into time windows. Each data segment includes an index that 
efficiently retrieves data within the block.
-* Block Cache: It manages the in-memory cache of data blocks, improving query 
performance by caching frequently accessed data blocks in memory.
+The `meta.bin` is a skipping index file serves as the entry file for the 
entire part, helping to index the `primary.bin` file. 
+
+The `primary.bin` file contains the index of each block. Through it, the 
actual data files or the tagFamily metadata files ending with `.tfm` can be 
indexed, which in turn helps locate the data in blocks. 
+
+Notably, for data of the `Stream` type, since there are no field columns, the 
`fields.bin` file does not exist, while the rest of the structure is entirely 
consistent with the `Measure` type.
+
+![measure-part](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/measure-part.png)
+![stream-part](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/stream-part.png)
+
+## Block
+
+The diagram below shows the detailed fields within each block. The block is 
the minimal unit of tsdb, which contains several rows of data. Due to the 
column-based design, each block is spread over several files.
+
+![measure-block](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/measure-block.png)
+![stream-block](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/stream-block.png)
 
 ## Write Path
 
 The write path of TSDB begins when time-series data is ingested into the 
system. TSDB will consult the schema repository to check if the group exists, 
and if it does, then it will hash the SeriesID to determine which shard it 
belongs to.
 
-Each shard in TSDB is responsible for storing a subset of the time-series 
data, and it uses a write-ahead log to record incoming writes in a durable and 
fault-tolerant manner. The shard also holds an in-memory index allowing fast 
lookups of time-series data.
+Each shard in TSDB is responsible for storing a subset of the time-series 
data. The shard also holds an in-memory index allowing fast lookups of 
time-series data.
 
-When a shard receives a write request, the data is written to the buffer as a 
series of buckets. Each bucket is a fixed-size chunk of time-series data 
typically configured to be several minutes or hours long. As new data is 
written to the buffer, it is appended to the current bucket until it is full. 
Once the bucket is full, it is closed, and a new bucket is created to continue 
buffering writes.
+When a shard receives a write request, the data is written to the buffer as a 
memory part and the series index and inverted index will also be updated. The 
worker in the background periodically flushes data, writing the memory part to 
the disk. After the flush operation is completed, it triggers a merge operation 
to combine the parts and remove invalid data. 
 
-Once a bucket is closed, it is stored as a single SST in a shard. The file is 
indexed and added to the index for the corresponding time range and resolution.
+Whenever a new memory part is generated or a flush and merge operation is 
triggered, it initiates an update of the snapshot and deletes outdated 
snapshots.
 
 ## Read Path
 
-The read path in TSDB retrieves time-series data from disk or memory and 
returns it to the query engine. The read path comprises several components: the 
buffer, cache, and SST file. The following is a high-level overview of how 
these components work together to retrieve time-series data in TSDB.
-
-The first step in the read path is to perform an index lookup to determine 
which blocks contain the desired time range. The index contains metadata about 
each data block, including its start and end time and its location on disk.
+The read path in TSDB retrieves time-series data from disk or memory and 
returns it to the query engine. The read path comprises several components: the 
buffer and parts. The following is a high-level overview of how these 
components work together to retrieve time-series data in TSDB.

Review Comment:
   ```suggestion
   The read path in TSDB retrieves time-series data from disk or memory, and 
returns it to the query engine. The read path comprises several components: the 
buffer and parts. The following is a high-level overview of how these 
components work together to retrieve time-series data in TSDB.
   ```



##########
docs/concept/tsdb.md:
##########
@@ -2,37 +2,58 @@
 
 TSDB is a time-series storage engine designed to store and query large volumes 
of time-series data. One of the key features of TSDB is its ability to 
automatically manage data storage over time, optimize performance and ensure 
that the system can scale to handle large workloads. TSDB empowers `Measure` 
and `Stream` relevant data.
 
-
 ## Shard
 
 In TSDB, the data in a group is partitioned into shards based on a 
configurable sharding scheme. Each shard is assigned to a specific set of 
storage nodes, and those nodes store and process the data within that shard. 
This allows BanyanDB to scale horizontally by adding more storage nodes to the 
cluster as needed.
 
+Within each shard, data is stored in different segments based on time ranges. 
The series index generated based on entities, and the indexes generated based 
on indexing rules of the `Measure` types are also stored under the shard.
+
+![shard](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/shard.png)
+
+## Segment
+
+Each segment is composed of multiple parts. Whenever SkyWalking sends a batch 
of data, BanyanDB writes this batch of data into a new Part. For data of the 
`Stream` type, the inverted indexes generated based on the indexing rules are 
also stored in the segment. Since BanyanDB adopts a snapshot approach for data 
read and write operations, the segment also needs to maintain additional 
snapshot information to record the validity of the parts.

Review Comment:
   Let's keep the capitalization of "Part" consistent (this paragraph uses 
both "parts" and "Part"), or define the term before the Segment section.



##########
docs/concept/tsdb.md:
##########
@@ -2,37 +2,58 @@
 
 TSDB is a time-series storage engine designed to store and query large volumes 
of time-series data. One of the key features of TSDB is its ability to 
automatically manage data storage over time, optimize performance and ensure 
that the system can scale to handle large workloads. TSDB empowers `Measure` 
and `Stream` relevant data.
 
-
 ## Shard
 
 In TSDB, the data in a group is partitioned into shards based on a 
configurable sharding scheme. Each shard is assigned to a specific set of 
storage nodes, and those nodes store and process the data within that shard. 
This allows BanyanDB to scale horizontally by adding more storage nodes to the 
cluster as needed.
 
+Within each shard, data is stored in different segments based on time ranges. 
The series index generated based on entities, and the indexes generated based 
on indexing rules of the `Measure` types are also stored under the shard.
+
+![shard](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/shard.png)
+
+## Segment
+
+Each segment is composed of multiple parts. Whenever SkyWalking sends a batch 
of data, BanyanDB writes this batch of data into a new Part. For data of the 
`Stream` type, the inverted indexes generated based on the indexing rules are 
also stored in the segment. Since BanyanDB adopts a snapshot approach for data 
read and write operations, the segment also needs to maintain additional 
snapshot information to record the validity of the parts.
+
+![segment](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/segment.png)
+
+## Part
+
+Within a part, data is split into multiple files in a columnar manner. The 
timestamps are stored in the `timestamps.bin` file, tags are organized in 
persistent tag families as various files with the `.tf` suffix, and fields are 
stored separately in the `fields.bin` file. 
 
-[shard](https://skywalking.apache.org/doc-graph/banyandb/v0.4.0/tsdb-shard.png)
+In addition, each part maintains several metadata files. Among them, 
`metadata.json` is the metadata file for the part, storing descriptive 
information, such as start and end times, part size, etc. 
 
-* Buffer: It is typically implemented as an in-memory queue managed by a 
shard. When new time-series data is ingested into the system, it is added to 
the end of the queue, and when the buffer reaches a specific size, the data is 
flushed to disk in batches.
-* SST: When a bucket of buffer becomes full or reaches a certain size 
threshold, it is flushed to disk as a new Sorted String Table (SST) file. This 
process is known as compaction.
-* Segments and Blocks: Time-series data is stored in data segments/blocks 
within each shard. Blocks contain a fixed number of data points and are 
organized into time windows. Each data segment includes an index that 
efficiently retrieves data within the block.
-* Block Cache: It manages the in-memory cache of data blocks, improving query 
performance by caching frequently accessed data blocks in memory.
+The `meta.bin` is a skipping index file serves as the entry file for the 
entire part, helping to index the `primary.bin` file. 
+
+The `primary.bin` file contains the index of each block. Through it, the 
actual data files or the tagFamily metadata files ending with `.tfm` can be 
indexed, which in turn helps locate the data in blocks. 
+
+Notably, for data of the `Stream` type, since there are no field columns, the 
`fields.bin` file does not exist, while the rest of the structure is entirely 
consistent with the `Measure` type.
+
+![measure-part](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/measure-part.png)
+![stream-part](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/stream-part.png)
+
+## Block
+
+The diagram below shows the detailed fields within each block. The block is 
the minimal unit of tsdb, which contains several rows of data. Due to the 
column-based design, each block is spread over several files.
+
+![measure-block](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/measure-block.png)
+![stream-block](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/stream-block.png)
 
 ## Write Path
 
 The write path of TSDB begins when time-series data is ingested into the 
system. TSDB will consult the schema repository to check if the group exists, 
and if it does, then it will hash the SeriesID to determine which shard it 
belongs to.
 
-Each shard in TSDB is responsible for storing a subset of the time-series 
data, and it uses a write-ahead log to record incoming writes in a durable and 
fault-tolerant manner. The shard also holds an in-memory index allowing fast 
lookups of time-series data.
+Each shard in TSDB is responsible for storing a subset of the time-series 
data. The shard also holds an in-memory index allowing fast lookups of 
time-series data.
 
-When a shard receives a write request, the data is written to the buffer as a 
series of buckets. Each bucket is a fixed-size chunk of time-series data 
typically configured to be several minutes or hours long. As new data is 
written to the buffer, it is appended to the current bucket until it is full. 
Once the bucket is full, it is closed, and a new bucket is created to continue 
buffering writes.
+When a shard receives a write request, the data is written to the buffer as a 
memory part and the series index and inverted index will also be updated. The 
worker in the background periodically flushes data, writing the memory part to 
the disk. After the flush operation is completed, it triggers a merge operation 
to combine the parts and remove invalid data. 

Review Comment:
   ```suggestion
   When a shard receives a write request, the data is written to the buffer as 
a memory part. Meanwhile, the series index and inverted index will also be 
updated. The worker in the background periodically flushes data, writing the 
memory part to the disk. After the flush operation is completed, it triggers a 
merge operation to combine the parts and remove invalid data. 
   ```
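
For readers of this thread, the write, flush, merge, and snapshot sequence described here could be sketched roughly as follows. This is a toy model under stated assumptions, not BanyanDB's implementation: `Shard`, `Part`, `Snapshot`, and their methods are hypothetical names, rows are modeled as `(key, value)` pairs, and "invalid data" is assumed to mean rows overwritten by a later part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    part_id: int
    rows: tuple       # (key, value) pairs; hypothetical row model
    in_memory: bool


@dataclass(frozen=True)
class Snapshot:
    epoch: int
    parts: tuple      # the parts that are valid at this epoch


class Shard:
    def __init__(self):
        self.snapshot = Snapshot(0, ())
        self._next_id = 1

    def _publish(self, parts):
        # Replace rather than mutate: a reader holding the previous
        # snapshot keeps a consistent view; the old one is simply dropped.
        self.snapshot = Snapshot(self.snapshot.epoch + 1, tuple(parts))

    def write(self, rows):
        # Each incoming batch becomes a new memory part.
        part = Part(self._next_id, tuple(rows), True)
        self._next_id += 1
        self._publish(self.snapshot.parts + (part,))

    def flush(self):
        # Stand-in for writing memory parts to disk, followed by a merge.
        flushed = [Part(p.part_id, p.rows, False) if p.in_memory else p
                   for p in self.snapshot.parts]
        self._publish(flushed)
        self.merge()

    def merge(self):
        disk = [p for p in self.snapshot.parts if not p.in_memory]
        if len(disk) < 2:
            return
        mem = [p for p in self.snapshot.parts if p.in_memory]
        latest = {}
        for p in disk:                # oldest first; later parts win
            for key, value in p.rows:
                latest[key] = value
        merged = Part(self._next_id, tuple(sorted(latest.items())), False)
        self._next_id += 1
        self._publish([merged] + mem)
```

Every step that changes the set of parts publishes a new snapshot, which is the behavior the following paragraph of the document describes.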



##########
docs/concept/tsdb.md:
##########
@@ -2,37 +2,58 @@
 
 TSDB is a time-series storage engine designed to store and query large volumes 
of time-series data. One of the key features of TSDB is its ability to 
automatically manage data storage over time, optimize performance and ensure 
that the system can scale to handle large workloads. TSDB empowers `Measure` 
and `Stream` relevant data.
 
-
 ## Shard
 
 In TSDB, the data in a group is partitioned into shards based on a 
configurable sharding scheme. Each shard is assigned to a specific set of 
storage nodes, and those nodes store and process the data within that shard. 
This allows BanyanDB to scale horizontally by adding more storage nodes to the 
cluster as needed.
 
+Within each shard, data is stored in different segments based on time ranges. 
The series index generated based on entities, and the indexes generated based 
on indexing rules of the `Measure` types are also stored under the shard.
+
+![shard](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/shard.png)
+
+## Segment
+
+Each segment is composed of multiple parts. Whenever SkyWalking sends a batch 
of data, BanyanDB writes this batch of data into a new Part. For data of the 
`Stream` type, the inverted indexes generated based on the indexing rules are 
also stored in the segment. Since BanyanDB adopts a snapshot approach for data 
read and write operations, the segment also needs to maintain additional 
snapshot information to record the validity of the parts.
+
+![segment](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/segment.png)
+
+## Part
+
+Within a part, data is split into multiple files in a columnar manner. The 
timestamps are stored in the `timestamps.bin` file, tags are organized in 
persistent tag families as various files with the `.tf` suffix, and fields are 
stored separately in the `fields.bin` file. 
 
-[shard](https://skywalking.apache.org/doc-graph/banyandb/v0.4.0/tsdb-shard.png)
+In addition, each part maintains several metadata files. Among them, 
`metadata.json` is the metadata file for the part, storing descriptive 
information, such as start and end times, part size, etc. 
 
-* Buffer: It is typically implemented as an in-memory queue managed by a 
shard. When new time-series data is ingested into the system, it is added to 
the end of the queue, and when the buffer reaches a specific size, the data is 
flushed to disk in batches.
-* SST: When a bucket of buffer becomes full or reaches a certain size 
threshold, it is flushed to disk as a new Sorted String Table (SST) file. This 
process is known as compaction.
-* Segments and Blocks: Time-series data is stored in data segments/blocks 
within each shard. Blocks contain a fixed number of data points and are 
organized into time windows. Each data segment includes an index that 
efficiently retrieves data within the block.
-* Block Cache: It manages the in-memory cache of data blocks, improving query 
performance by caching frequently accessed data blocks in memory.
+The `meta.bin` is a skipping index file serves as the entry file for the 
entire part, helping to index the `primary.bin` file. 
+
+The `primary.bin` file contains the index of each block. Through it, the 
actual data files or the tagFamily metadata files ending with `.tfm` can be 
indexed, which in turn helps locate the data in blocks. 
+
+Notably, for data of the `Stream` type, since there are no field columns, the 
`fields.bin` file does not exist, while the rest of the structure is entirely 
consistent with the `Measure` type.
+
+![measure-part](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/measure-part.png)
+![stream-part](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/stream-part.png)
+
+## Block
+
+The diagram below shows the detailed fields within each block. The block is 
the minimal unit of tsdb, which contains several rows of data. Due to the 
column-based design, each block is spread over several files.
+
+![measure-block](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/measure-block.png)
+![stream-block](https://skywalking.apache.org/doc-graph/banyandb/v0.6.0/stream-block.png)
 
 ## Write Path
 
 The write path of TSDB begins when time-series data is ingested into the 
system. TSDB will consult the schema repository to check if the group exists, 
and if it does, then it will hash the SeriesID to determine which shard it 
belongs to.
 
-Each shard in TSDB is responsible for storing a subset of the time-series 
data, and it uses a write-ahead log to record incoming writes in a durable and 
fault-tolerant manner. The shard also holds an in-memory index allowing fast 
lookups of time-series data.
+Each shard in TSDB is responsible for storing a subset of the time-series 
data. The shard also holds an in-memory index allowing fast lookups of 
time-series data.
 
-When a shard receives a write request, the data is written to the buffer as a 
series of buckets. Each bucket is a fixed-size chunk of time-series data 
typically configured to be several minutes or hours long. As new data is 
written to the buffer, it is appended to the current bucket until it is full. 
Once the bucket is full, it is closed, and a new bucket is created to continue 
buffering writes.
+When a shard receives a write request, the data is written to the buffer as a 
memory part and the series index and inverted index will also be updated. The 
worker in the background periodically flushes data, writing the memory part to 
the disk. After the flush operation is completed, it triggers a merge operation 
to combine the parts and remove invalid data. 
 
-Once a bucket is closed, it is stored as a single SST in a shard. The file is 
indexed and added to the index for the corresponding time range and resolution.
+Whenever a new memory part is generated or a flush and merge operation is 
triggered, it initiates an update of the snapshot and deletes outdated 
snapshots.

Review Comment:
   ```suggestion
   Whenever a new memory part is generated, or when a flush and merge operation 
is triggered, the snapshot is updated and outdated snapshots are deleted.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
