Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1851604&r1=1851603&r2=1851604&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Jan 18 07:20:49 2019
@@ -19,11 +19,283 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 17 Jan 2019 18:33:24 -0800</pubDate>
-    <lastBuildDate>Thu, 17 Jan 2019 18:33:24 -0800</lastBuildDate>
+    <pubDate>Thu, 17 Jan 2019 23:11:23 -0800</pubDate>
+    <lastBuildDate>Thu, 17 Jan 2019 23:11:23 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
+        <title>How Cisco&#39;s Big Data Team Improved the High Concurrent 
Throughput of Apache Kylin by 5x</title>
+        <description>&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
+
&lt;p&gt;As part of the development group in Cisco’s Big Data team, we are responsible, among other things, for providing BI reports to our stakeholders. Stakeholders rely on the reporting system to check the usage of Cisco’s business offerings. These reports also serve as a reference for billing, so they are critical to our stakeholders and to the business overall.&lt;/p&gt;
+
&lt;p&gt;The raw data for these reports comes from multiple tables in our Oracle database. The monthly data volume of a single table is in the billions of rows, and a report covering one year must aggregate, or otherwise process, at least one to two billion rows. All results also need to be returned within a short time. In the course of our research, we discovered Apache Kylin, a distributed preprocessing engine for massive datasets that is built on pre-calculation and lets you query those datasets with sub-second latency.&lt;/p&gt;
+
&lt;p&gt;In a simulation test using our production data, we found that Kylin was ideal for our needs: it could indeed return aggregated results over one billion rows within one second. However, we still needed to test another use case. For one stakeholder, we display 15 charts on a single page, and the BI system sends an asynchronous REST API request to Kylin to query the data for each chart. Based on our production data volume, if 20 stakeholders view the report on one node at the same time, they trigger 15 * 20 = 300 requests. What we needed to test, therefore, was Kylin’s performance under highly concurrent queries.&lt;/p&gt;
+
+&lt;h2 id=&quot;the-testing-stage&quot;&gt;The Testing Stage&lt;/h2&gt;
+
+&lt;p&gt;&lt;strong&gt;Precondition&lt;/strong&gt;: To minimize the impact of network overhead, we deployed the testing tools in the same network environment as Kylin. We also turned off Kylin’s query cache to make sure that every request was actually executed by the underlying layers rather than answered from the cache.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Testing Tools&lt;/strong&gt;: Besides our usual testing tool, Apache JMeter, we ran the same case with another open source tool, Gatling (&lt;a href=&quot;https://gatling.io/&quot;&gt;https://gatling.io/&lt;/a&gt;), to rule out any influence of the tools themselves on the results.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Testing Strategy&lt;/strong&gt;: We simulated growing user load by increasing the number of concurrent threads step by step. At each step we counted the queries handled within 60 seconds and recorded the mean response time, looking for the point where Kylin’s query throughput stopped improving. We also observed the maximum response time and the success rate.&lt;/p&gt;
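+&lt;p&gt;The testing strategy above is a closed-loop workload: each thread sends a query, waits for the answer, and immediately sends the next one. The Java sketch below shows that measurement under stated assumptions. It is not JMeter or Gatling code. The class ClosedLoopLoadTest and its run method are hypothetical, and the query is passed in as a Runnable that stands in for one Kylin REST call.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical closed-loop load generator; not part of Kylin, JMeter, or Gatling.
final class ClosedLoopLoadTest {
    private ClosedLoopLoadTest() {}

    // Summary of one measurement window, mirroring the table columns.
    static final class Result {
        final long handled;       // queries completed in the window
        final double perSecond;   // handled divided by the window length in seconds
        final double meanMillis;  // mean response time per query

        Result(long handled, double perSecond, double meanMillis) {
            this.handled = handled;
            this.perSecond = perSecond;
            this.meanMillis = meanMillis;
        }
    }

    // Runs `threads` workers for `windowMillis`. Each worker sends a query, waits for it
    // to finish, then immediately sends the next one (a closed-loop workload).
    static Result run(int threads, long windowMillis, Runnable query) throws InterruptedException {
        AtomicLong handled = new AtomicLong();
        AtomicLong totalNanos = new AtomicLong();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMillis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = threads; t > 0; t--) {
            pool.execute(() -> {
                // Compare by subtraction, the overflow-safe way to use nanoTime.
                while (deadline - System.nanoTime() > 0) {
                    long start = System.nanoTime();
                    query.run();
                    totalNanos.addAndGet(System.nanoTime() - start);
                    handled.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        // Queries still in flight at the deadline are allowed to finish and are counted.
        pool.awaitTermination(windowMillis + 60_000, TimeUnit.MILLISECONDS);
        long n = handled.get();
        double mean = n == 0 ? 0.0 : totalNanos.get() / 1e6 / n;
        return new Result(n, n * 1000.0 / windowMillis, mean);
    }
}
```

+&lt;p&gt;For example, run(75, 60_000, query) returns fields that correspond to the columns of the tables in this post: handled is the count of queries handled in 60 seconds, perSecond is that count divided by 60, and meanMillis is the mean response time.&lt;/p&gt;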
+
+&lt;p&gt;&lt;strong&gt;Testing Results&lt;/strong&gt;:&lt;/p&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Concurrent Threads&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Handled Queries (in 60 
seconds)&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Handled Queries (per 
second)&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Mean Response Time 
(ms)&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;773&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;13&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;77&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;15&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;3245&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;54&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;279&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;25&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;3844&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;64&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;390&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;50&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;4912&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;82&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;612&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;75&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;5405&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;90&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;841&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;100&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;5436&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;91&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;1108&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;150&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;5434&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;91&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;1688&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;p&gt;The results plotted as a line chart:&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/cisco_throughput_5x/handled_queries_1.png&quot; 
alt=&quot;&quot; width=&quot;500px&quot; height=&quot;300px&quot; 
/&gt;&lt;/p&gt;
+
+
+&lt;p&gt;&lt;strong&gt;Finding&lt;/strong&gt;: When the number of concurrent threads reaches 75, throughput peaks at about 90 queries per second, and it does not improve as more threads are added. Since each report page issues 15 queries, 90 queries per second allows only 90/15 = 6 users to view a report at the same time. Even with 3 Kylin query nodes, a capacity of 18 users per second would fall far short of our business demands.&lt;/p&gt;
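+&lt;p&gt;The capacity arithmetic in this finding can be written down explicitly. This is a small Java sketch with hypothetical names (ReportCapacity, perSecond, concurrentUsers), assuming each page view issues all 15 chart queries once and that a user counts as served when those queries complete within the same second.&lt;/p&gt;

```java
// Hypothetical helper for the capacity arithmetic; not part of Kylin.
final class ReportCapacity {
    private ReportCapacity() {}

    // Converts a count of queries handled in a measurement window into queries
    // per second, rounded to the nearest integer.
    static long perSecond(long handledInWindow, long windowSeconds) {
        return Math.round((double) handledInWindow / windowSeconds);
    }

    // One report page fires `chartsPerPage` queries, so a node sustaining `qps`
    // queries per second serves qps / chartsPerPage page views per second.
    // Integer division rounds down.
    static long concurrentUsers(long qps, int chartsPerPage) {
        return qps / chartsPerPage;
    }
}
```

+&lt;p&gt;With the measured peak, perSecond(5405, 60) gives 90, concurrentUsers(90, 15) gives 6, and three nodes give 3 * 6 = 18. With the 467 queries per second measured after the fix, the same function gives 31. Rounding down is the conservative choice for capacity planning.&lt;/p&gt;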
+
+&lt;h2 id=&quot;root-cause-analysis&quot;&gt;Root Cause Analysis&lt;/h2&gt;
+
+&lt;p&gt;After reading and analyzing the code of Kylin’s query engine, we learned that Kylin performs filtering and calculation in parallel on the HBase region servers by launching an HBase Coprocessor. Based on this, we checked the resource usage of the HBase cluster. Under highly concurrent load, the number of RPC tasks processed on the region servers did not increase linearly with the number of Kylin query requests, so we concluded that threads were being blocked on the Kylin side.&lt;/p&gt;
+
+&lt;p&gt;We used flame graphs and JProfiler to collect and analyze data from the Kylin query node, but could not find the root cause. We then captured thread dumps of Kylin with jstack, and analyzing the jstack output revealed the root cause of the bottleneck behind this concurrency issue. An example (Kylin version 2.5.0):&lt;/p&gt;
+
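+&lt;p&gt;One thread holds the lock on monitor &amp;lt;0x000000048007a180&amp;gt; (a sun.misc.URLClassPath) inside sun.misc.URLClassPath.getNextLoader; the thread’s own ID (tid) is 0x000000000472b000:&lt;/p&gt;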
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&quot;Query 
e9c44a2d-6226-ff3b-f984-ce8489107d79-3425&quot; #3425 daemon prio=5 os_prio=0 
tid=0x000000000472b000 nid=0x1433 waiting ]
+   java.lang.Thread.State: BLOCKED (on object monitor)
+    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
+    - locked &amp;lt;0x000000048007a180&amp;gt; (a sun.misc.URLClassPath)
+    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
+    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
+    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
+    at java.security.AccessController.doPrivileged(Native Method)
+    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
+    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
+    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
+    at 
org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
+    at 
org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;At the same time, 43 other threads were waiting to lock the same monitor, &amp;lt;0x000000048007a180&amp;gt;. One of them:&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&quot;Query 
f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002&quot; #4002 daemon prio=5 os_prio=0 
tid=0x00007f27e71e7800 nid=0x1676 waiting ]
+   java.lang.Thread.State: BLOCKED (on object monitor)
+    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
+    - waiting to lock &amp;lt;0x000000048007a180&amp;gt; (a 
sun.misc.URLClassPath)
+    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
+    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
+    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
+    at java.security.AccessController.doPrivileged(Native Method)
+    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
+    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
+    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
+    at 
org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
+    at 
org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
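+&lt;p&gt;These waiters were found by reading jstack output. The Java sketch below performs the same check from inside a JVM using the standard java.lang.management API (ThreadMXBean.dumpAllThreads and ThreadInfo.getLockInfo). It counts the threads that are BLOCKED on one particular monitor, which is the in-process equivalent of counting “waiting to lock” lines. The class name BlockedThreadCounter is hypothetical. It only sees threads of the JVM it runs in, so for a separate Kylin process jstack remains the practical tool.&lt;/p&gt;

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical diagnostic helper; not part of Kylin.
final class BlockedThreadCounter {
    private BlockedThreadCounter() {}

    // Counts live threads in this JVM that are BLOCKED waiting to enter the
    // monitor of `lock`, i.e. the threads jstack would list as "waiting to lock" it.
    static int countBlockedOn(Object lock) {
        int targetHash = System.identityHashCode(lock);
        String targetClass = lock.getClass().getName();
        int count = 0;
        // Locked monitors and synchronizers are not needed for this check.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            LockInfo waitingFor = info.getLockInfo();
            if (waitingFor == null || waitingFor.getIdentityHashCode() != targetHash) {
                continue;
            }
            if (waitingFor.getClassName().equals(targetClass)) {
                count++;
            }
        }
        return count;
    }
}
```

+&lt;p&gt;LockInfo identifies a monitor by class name and identity hash code rather than by the memory address that jstack prints. Identity hash codes can in principle collide, so treat the result as a diagnostic count.&lt;/p&gt;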
+
+&lt;p&gt;The first frame in these stacks that belongs to Kylin’s own code was:&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;Analyzing the Kylin source code behind this frame brought us close to the resolution.&lt;/p&gt;
+
+&lt;h2 id=&quot;code-analysis&quot;&gt;Code Analysis&lt;/h2&gt;
+
+&lt;p&gt;When the Kylin query engine builds a request for the HBase Coprocessor, it exports the Kylin properties (the various configuration properties used by Kylin) as strings. The issue lies in the code that builds these properties:&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;private static OrderedProperties buildSiteOrderedProps()
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Each time it runs, it calls getResource to load “kylin-defaults.properties” (the default properties file, which users cannot modify).&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;// 1. load default configurations 
from classpath.
+// we have a kylin-defaults.properties in kylin/core-common/src/main/resources
+URL resource = 
Thread.currentThread().getContextClassLoader().getResource(&quot;kylin-defaults.properties&quot;);
+Preconditions.checkNotNull(resource);
+logger.info(&quot;Loading kylin-defaults.properties from {}&quot;, 
resource.getPath());
+OrderedProperties orderedProperties = new OrderedProperties();
+loadPropertiesFromInputStream(resource.openStream(), orderedProperties);
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;It then loops 10 times, calling getResource for “kylin-defaults” + (i) + “.properties”. This is where the threads were BLOCKED.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// probes kylin-defaults0.properties through kylin-defaults9.properties on every call
+for (int i = 0; i &amp;lt; 10; i++) {
+    String fileName = &quot;kylin-defaults&quot; + (i) + &quot;.properties&quot;;
+    URL additionalResource = Thread.currentThread().getContextClassLoader().getResource(fileName);
+    if (additionalResource != null) {
+        logger.info(&quot;Loading {} from {} &quot;, fileName, additionalResource.getPath());
+        loadPropertiesFromInputStream(additionalResource.openStream(), orderedProperties);
+    }
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;This logic was introduced on 2017-06-07 by JIRA KYLIN-2659, &lt;em&gt;Refactor KylinConfig so that all the default configurations are hidden in kylin-defaults.properties&lt;/em&gt;, reported by Hongbin Ma.&lt;/p&gt;
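+&lt;p&gt;The export described at the start of this section happens for every query. The Java sketch below shows one way to turn properties into a string and back with the standard Properties.store and Properties.load methods. ConfigExport, exportToString, and importFromString are hypothetical names, and Kylin’s real export format may differ. The round trip itself involves no class-path lookups. In Kylin 2.5.0, the contention came from the getResource calls made while building the properties for each request.&lt;/p&gt;

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper illustrating a properties-to-string round trip; not Kylin's code.
final class ConfigExport {
    private ConfigExport() {}

    // Serializes properties in the text format of Properties.store so the string
    // can travel inside a request, for example to a coprocessor.
    static String exportToString(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null); // also writes a date comment line, ignored by load
        } catch (IOException e) {
            throw new UncheckedIOException(e); // a StringWriter does not actually fail
        }
        return out.toString();
    }

    // Rebuilds the properties on the receiving side.
    static Properties importFromString(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```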
+
+&lt;h2 id=&quot;issue-fixing&quot;&gt;Issue Fixing&lt;/h2&gt;
+
+&lt;p&gt;For the first part of the logic: because kylin-defaults.properties is packaged in kylin-core-common-xxxx.jar, there is no need to call getResource for it every time. We moved this logic into getInstanceFromEnv(), which is called only once, when the service starts.&lt;/p&gt;
+
+&lt;p&gt;While fixing this bug, we found a possible regression. CubeVisitService is a Coprocessor that uses KylinConfig as a utility class to create KylinConfig objects. There is no kylin.properties file in the Coprocessor, so it is dangerous to introduce any properties-loading logic on that path.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;buildDefaultOrderedProperties();
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;The second part was designed to be future-proof, allowing users to define up to 10 additional default properties files, with later files overriding earlier ones. After a year and a half, however, this feature appeared never to have been used. To reduce risk, we kept this logic anyway: it now runs only once, during service startup, so the extra time it costs is insignificant.&lt;/p&gt;
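+&lt;p&gt;Both parts of the fix boil down to scanning the class path once and reusing the result. The following minimal Java sketch illustrates that idea using a plain java.util.Properties in place of Kylin’s OrderedProperties. CachedDefaults is a hypothetical class, not the code merged for KYLIN-3672. It uses the initialization-on-demand holder idiom, so the scan runs exactly once without an explicit lock on the query path.&lt;/p&gt;

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical "load defaults once" holder; a sketch, not Kylin's implementation.
final class CachedDefaults {
    // Number of class-path scans performed; exists only to make "load once" observable.
    static final AtomicInteger SCANS = new AtomicInteger();

    private CachedDefaults() {}

    // Initialization-on-demand holder: the JVM runs Holder's static initializer once,
    // on first access, and class initialization is thread-safe. Later calls just read
    // the field, so getResource is never called on the query path.
    private static final class Holder {
        static final Properties DEFAULTS = scanClassPath();
    }

    private static Properties scanClassPath() {
        SCANS.incrementAndGet();
        Properties props = new Properties();
        mergeIfPresent(props, "kylin-defaults.properties");
        // Optional kylin-defaults0..9.properties; later files override earlier keys.
        IntStream.range(0, 10)
                .forEach(i -> mergeIfPresent(props, "kylin-defaults" + i + ".properties"));
        return props;
    }

    private static void mergeIfPresent(Properties target, String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = CachedDefaults.class.getClassLoader();
        }
        URL url = loader.getResource(name);
        if (url == null) {
            return; // Kylin requires the base file; this sketch simply skips missing files
        }
        try (InputStream in = url.openStream()) {
            target.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each caller gets its own copy, so per-request overrides cannot leak into the defaults.
    static Properties defaults() {
        Properties copy = new Properties();
        copy.putAll(Holder.DEFAULTS);
        return copy;
    }
}
```

+&lt;p&gt;Handing out a copy costs one map copy per call, but it keeps the shared defaults safe from per-request changes. The actual fix instead moved the loading into getInstanceFromEnv(). Also note that java.util.Properties does not preserve insertion order, so this sketch would need an ordered map wherever key order matters.&lt;/p&gt;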
+
+&lt;h2 id=&quot;performance-testing-after-bug-fixes&quot;&gt;Performance 
Testing After Bug Fixes&lt;/h2&gt;
+
+&lt;p&gt;Based on the same data volume and testing environment, results were 
as follows:&lt;/p&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Concurrent Threads&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Handled Queries (in 60 
seconds)&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Handled Queries (per 
second)&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Mean Response Time 
(ms)&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;2451&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;41&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;12&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;15&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;12422&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;207&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;37&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;25&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;15600&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;260&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;56&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;50&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;18481&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;308&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;129&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;75&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;21055&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;351&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;136&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;100&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;24036&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;400&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;251&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;150&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;28014&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;467&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;277&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;p&gt;And the resulting line chart:&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/cisco_throughput_5x/handled_queries_2.png&quot; 
alt=&quot;&quot; width=&quot;500px&quot; height=&quot;300px&quot; 
/&gt;&lt;/p&gt;
+
+&lt;p&gt;With 150 concurrent threads, Kylin processed 467 requests per second, about five times the previous peak of 91, and throughput kept rising as threads were added instead of leveling off. We concluded that the bottleneck had been eliminated. We did not raise the thread count further: because of the Kylin query engine’s cluster load-balancing settings, adding more concurrent connections to a single node would only increase the workload on its Tomcat server (Kylin allows at most 150 connections by default). A fresh jstack capture and analysis confirmed that the thread blocking had disappeared.&lt;/p&gt;
+
+&lt;p&gt;After the fix, each Kylin node can handle requests for 467/15 ≈ 31 users, which meets our business requirements. In addition, enabling the query cache can improve Kylin’s concurrent query capability several times further, so it is more than sufficient for our needs.&lt;/p&gt;
+
+&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
+
+&lt;p&gt;Apache Kylin lets you query massive datasets with sub-second latency, thanks to the pre-calculation design of cubes, optimizations of Apache Calcite operators in queries, and the “Prepared Statement Cache” introduced to reduce the cost of Calcite SQL parsing. Query performance optimization is not easy. Because even a minor code change can be disastrous, we need to pay close attention to the impact on the Kylin query engine whenever new features are introduced or bugs are fixed. Issues like this one, which appear only under high concurrency, are often hard to reproduce and analyze.&lt;/p&gt;
+
+&lt;p&gt;Lastly, query performance testing should not be limited to a single query or a small set of queries. High-concurrency performance testing should be designed around actual business requirements. For enterprise reporting systems, 3 seconds is the longest users will tolerate for loading a new page, including page rendering and network time, which leaves the backend data service about 1 second to respond. This is a real challenge in business scenarios with big datasets. Fortunately, Kylin easily meets this requirement.&lt;/p&gt;
+
+&lt;p&gt;This issue has been filed in JIRA as &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-3672&quot;&gt;KYLIN-3672&lt;/a&gt;, and the fix was released in Kylin v2.5.2. Thanks to Shaofeng Shi of Kyligence Inc. for the help.&lt;/p&gt;
+
+&lt;p&gt;&lt;em&gt;The author, Zongwie Li, is a Cisco engineer and a member of the company’s Big Data architecture team, currently responsible for building the OLAP platform and customer business reporting systems.&lt;/em&gt;&lt;/p&gt;
+
+</description>
+        <pubDate>Thu, 17 Jan 2019 09:30:00 -0800</pubDate>
+        
<link>http://kylin.apache.org/blog/2019/01/17/cisco-throughput-5x/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2019/01/17/cisco-throughput-5x/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Introduce data source SDK</title>
         <description>&lt;h2 id=&quot;data-source-sdk&quot;&gt;Data source 
SDK&lt;/h2&gt;
 
@@ -1165,47 +1437,6 @@ kylin.engine.spark.rdd-partition-cut-mb=
         
         
         <category>blog</category>
-        
-      </item>
-    
-      <item>
-        <title>A new measure for Percentile precalculation</title>
-        <description>&lt;h2 
id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
-
-&lt;p&gt;Since Apache Kylin 2.0, there’s a new measure for percentile 
precalculation, which aims at (sub-)second latency for 
&lt;strong&gt;approximate&lt;/strong&gt; percentile analytics SQL queries. The 
implementation is based on &lt;a 
href=&quot;https://github.com/tdunning/t-digest&quot;&gt;t-digest&lt;/a&gt; 
library under Apachee 2.0 license, which provides a high-effecient data 
structure to save aggregation counters and algorithm to calculate approximate 
result of percentile.&lt;/p&gt;
-
-&lt;h3 id=&quot;percentile&quot;&gt;Percentile&lt;/h3&gt;
-&lt;p&gt;&lt;em&gt;From &lt;a 
href=&quot;https://en.wikipedia.org/wiki/Percentile&quot;&gt;wikipedia&lt;/a&gt;&lt;/em&gt;:
 A &lt;strong&gt;percentile&lt;/strong&gt; (or a 
&lt;strong&gt;centile&lt;/strong&gt;) is a measure used in statistics 
indicating the value below which a given percentage of observations in a 
group of observations fall. For example, the 20th percentile is the value (or 
score) below which 20% of the observations may be found.&lt;/p&gt;
-
-&lt;p&gt;In Apache Kylin, we support the similar SQL sytanx like Apache Hive, 
with a aggregation function called &lt;strong&gt;percentile(&amp;lt;Number 
Column&amp;gt;, &amp;lt;Double&amp;gt;)&lt;/strong&gt;:&lt;/p&gt;
-
-&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;seller_id&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;percentile&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span 
class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;test_kylin_fact&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span 
class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;seller_id&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;
-&lt;/div&gt;
-
-&lt;h3 id=&quot;how-to-use&quot;&gt;How to use&lt;/h3&gt;
-&lt;p&gt;If you know little about &lt;em&gt;Cubes&lt;/em&gt;, please go to 
&lt;a 
href=&quot;http://kylin.apache.org/docs20/tutorial/kylin_sample.html&quot;&gt;QuickStart&lt;/a&gt;
 first to learn basic knowledge.&lt;/p&gt;
-
-&lt;p&gt;Firstly, you need to add this column as measure in data 
model.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/percentile_1.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Secondly, create a cube and add a PERCENTILE measure.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/percentile_2.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Finally, build the cube and try some query.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/percentile_3.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-</description>
-        <pubDate>Sat, 01 Apr 2017 15:22:22 -0700</pubDate>
-        
<link>http://kylin.apache.org/blog/2017/04/01/percentile-measure/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2017/04/01/percentile-measure/</guid>
-        
-        
-        <category>blog</category>
         
       </item>
     

Added: kylin/site/images/blog/cisco_throughput_5x/handled_queries_1.png
URL: 
http://svn.apache.org/viewvc/kylin/site/images/blog/cisco_throughput_5x/handled_queries_1.png?rev=1851604&view=auto
==============================================================================
Binary file - no diff available.

Propchange: kylin/site/images/blog/cisco_throughput_5x/handled_queries_1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: kylin/site/images/blog/cisco_throughput_5x/handled_queries_2.png
URL: 
http://svn.apache.org/viewvc/kylin/site/images/blog/cisco_throughput_5x/handled_queries_2.png?rev=1851604&view=auto
==============================================================================
Binary file - no diff available.

Propchange: kylin/site/images/blog/cisco_throughput_5x/handled_queries_2.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

