[GitHub] [skywalking] kezhenxu94 commented on a change in pull request #8705: Add `Column.shardingKeyIdx` for column definition for BanyanDB

GitBox Fri, 18 Mar 2022 08:20:26 -0700


kezhenxu94 commented on a change in pull request #8705:
URL: https://github.com/apache/skywalking/pull/8705#discussion_r830085837




##########
File path: CHANGES.md
##########
@@ -112,14 +113,28 @@ Release Notes.
   , `SW_CORE_REST_JETTY_DELTA`).
 * [Breaking Change] Remove configuration `graphql/path` (env var: `SW_QUERY_GRAPHQL_PATH`).
 * Add storage column attribute `indexOnly`, support ElasticSearch only index and not store some fields.
-* Add `indexOnly=true` to `SegmentRecord.tags`, `AlarmRecord.tags`, `AbstractLogRecord.tags`, to reduce unnecessary storage.
+* Add `indexOnly=true` to `SegmentRecord.tags`, `AlarmRecord.tags`, `AbstractLogRecord.tags`, to reduce unnecessary
+  storage.
 * [Breaking Change] Remove configuration `restMinThreads` (env var: `SW_CORE_REST_JETTY_MIN_THREADS`
   , `SW_RECEIVER_SHARING_JETTY_MIN_THREADS`).
 * Refactor the core Builder mechanism, new storage plugin could implement their own converter and get rid of hard
   requirement of using HashMap to communicate between data object and database native structure.
 * [Breaking Change] Break all existing 3rd-party storage extensions.
 * Remove hard requirement of BASE64 encoding for binary field.
 * Add complexity limitation for GraphQL query to avoid malicious query.
+* Add `Column.shardingKeyIdx` for column definition for BanyanDB.
+
+```
+Sharding key is used to group time series data per metric of one entity in one place (same sharding or same 
+column for column-oriented database).
+For example,
+ServiceA's traffic gauge, service call per minute, includes following timestamp values, then it should be sharded by service ID
+[ServiceA(encoded ID): 01-28 18:30 values-1, 01-28 18:31 values-2, 01-28 18:32 values-3, 01-28 18:32 values-4]
+
+BanyanDB is the 1st storage implementation supporting this. It would make continuous time series metrics stored closely and compressed better.
+
+NOTICE, this sharding concept is NOT just for splitting data into different database instances or physical files.

Review comment:
       You keep using the term `shard`, but you explain that it is actually for grouping. Why not just use a name like `Column.groupKeyIdx`? This is really confusing.
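
       To make the "grouping" reading concrete, here is a minimal sketch of what the changelog text seems to describe: rows that share the same key value are collected into one contiguous, time-ordered series. It is not BanyanDB or SkyWalking code; the row layout, the `group_by_key` helper and the sample values are all hypothetical, used only for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (entity ID, timestamp, value). Not real SkyWalking data.
rows = [
    ("ServiceA", "01-28 18:30", 10),
    ("ServiceB", "01-28 18:30", 7),
    ("ServiceA", "01-28 18:31", 12),
    ("ServiceB", "01-28 18:31", 9),
    ("ServiceA", "01-28 18:32", 11),
]


def group_by_key(rows, key_idx=0):
    """Collect rows sharing the value at key_idx into one time-ordered series."""
    groups = defaultdict(list)
    for row in rows:
        # Keep every column except the key column itself.
        rest = row[:key_idx] + row[key_idx + 1:]
        groups[row[key_idx]].append(rest)
    # Sorting the (timestamp, value) tuples orders each series by timestamp.
    return {key: sorted(points) for key, points in groups.items()}


series = group_by_key(rows)
for entity, points in series.items():
    print(entity, points)
```

       In this sketch the key only decides which values end up next to each other. Nothing is split across database instances or files, which is why a name built on "group" may describe it more directly than one built on "shard".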




