hanahmily commented on issue #13811: URL: https://github.com/apache/skywalking/issues/13811#issuecomment-4231777757
> 1. Shard tag and entity tag can differ, scattering the same entity across nodes.
>
> Entity and ShardingKey are independent fields in Measure. When ShardingKey is set, it overrides the shard routing from Entity. For example, with entity.tag_names=["service_id"] and sharding_key.tag_names=["instance_id"], data for the same service_id lands on different shards/nodes under different instance_id values. Each node only sees a partial view of that entity.

No, ShardingKey and Entity are not independent. ShardingKey aims to enhance TopN streaming performance and must adhere to the rule that the same entity always maps to the same node. Refer to the example I mentioned at https://github.com/apache/skywalking/issues/12526. The OAP server follows this rule when it sets up the ShardingKey. Your insight inspired me to add a validation step to enforce this implicit rule. If an end user configures them as in your example, the result will be unexpected.

> 2. Even on a single node, agg=UNSPECIFIED still truncates incorrectly.
>
> The coordinator sends agg=AGGREGATION_FUNCTION_UNSPECIFIED to data nodes, which prevents proper aggregation. For a COUNT TopN with TopN=2, a node holding entity-A(5 points), entity-B(3 points), entity-C(1 point) cannot compute COUNT(entity-A)=5. It simply truncates raw results by the TopN limit, returning incorrect partial data.

<h2>TopN Query Distribution and Sharding Logic</h2>
<p>In BanyanDB, the current <b>TopN query</b> implementation pushes the aggregation functions directly to the data nodes rather than pruning them.</p>
<h3>1. Ad-hoc TopN Queries</h3>
<p>During distributed analysis, the system determines whether to push down the logic based on the presence of an aggregate function:</p>
<pre><code>// DistributedAnalyze converts logical expressions into an executable
// operation tree represented by a Plan.
func DistributedAnalyze(criteria *measurev1.QueryRequest, ss []logical.Schema) (logical.Plan, error) {
	// ...
	pushDownAgg := criteria.GetAgg() != nil
	plan := newUnresolvedDistributed(criteria, pushDownAgg)
	// ...
}
</code></pre>
<p>If <code>criteria.GetAgg()</code> is not nil, the aggregation function is pushed down to the data nodes for execution.</p>
<h3>2. Pre-calculated TopN Streaming</h3>
<p>If you are referring to <b>pre-calculated TopN streaming</b> rather than ad-hoc queries, the behavior relies on the <code>ShardingKey</code>. To maintain high performance, BanyanDB ensures that all data for a specific entity resides on the same node.</p>
<h4>Comparison: Sharding Scenarios</h4>
<p>Suppose we want to calculate <b>Top 2</b> by <b>Count</b> for the entity set <code>Service + Instance</code>.</p>

Scenario | Configuration | Data Distribution & Merging
-- | -- | --
A | No ShardingKey | Node A returns ServiceA(Inst1:5, Inst3:3). Node B returns ServiceA(Inst2:6, Inst4:1). The Liaison node must merge these results to output: ServiceA(Inst2:6, Inst1:5).
B | ShardingKey = Service | Node A contains all data for ServiceA and returns ServiceA(Inst2:6, Inst1:5) directly. Node B contains no data for ServiceA.

<h3>Design Principle</h3>
<p>A core design principle of BanyanDB is to <b>avoid distributing the same aggregation entity across different data nodes.</b> By ensuring an entity's data is localized to a single node via the <code>ShardingKey</code>, we eliminate unnecessary network overhead and coordinator-side merging, significantly improving performance.</p>

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
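
As a rough illustration of the Scenario A merge in the sharding comparison above, the following Go sketch combines per-node partial Top 2 lists on the liaison side. The `item` type and the `mergeTopN` function are hypothetical names used only for illustration and are not BanyanDB APIs. The sketch also assumes each instance is reported by exactly one node, so counts never have to be summed across nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a hypothetical (instance, count) pair; it is not a BanyanDB type.
type item struct {
	Instance string
	Count    int64
}

// mergeTopN concatenates the partial lists returned by each data node,
// sorts them by count in descending order, and keeps the first n entries.
// Assumption: an instance appears in at most one partial list.
func mergeTopN(n int, partials ...[]item) []item {
	var all []item
	for _, p := range partials {
		all = append(all, p...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		return all[i].Count > all[j].Count
	})
	if len(all) > n {
		all = all[:n]
	}
	return all
}

func main() {
	// Scenario A: ServiceA's instances are spread over two nodes.
	nodeA := []item{{"Inst1", 5}, {"Inst3", 3}}
	nodeB := []item{{"Inst2", 6}, {"Inst4", 1}}
	fmt.Println(mergeTopN(2, nodeA, nodeB)) // prints [{Inst2 6} {Inst1 5}]
}
```

In Scenario B, node A already holds every instance of ServiceA, so its local list is the final answer for that service and this merge step has nothing to combine for it.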
