Modified: kylin/site/feed.xml URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1851725&r1=1851724&r2=1851725&view=diff ============================================================================== --- kylin/site/feed.xml (original) +++ kylin/site/feed.xml Mon Jan 21 04:20:52 2019 @@ -19,11 +19,168 @@ <description>Apache Kylin Home</description> <link>http://kylin.apache.org/</link> <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/> - <pubDate>Thu, 17 Jan 2019 23:11:23 -0800</pubDate> - <lastBuildDate>Thu, 17 Jan 2019 23:11:23 -0800</lastBuildDate> + <pubDate>Sun, 20 Jan 2019 20:11:59 -0800</pubDate> + <lastBuildDate>Sun, 20 Jan 2019 20:11:59 -0800</lastBuildDate> <generator>Jekyll v2.5.3</generator> <item> + <title>Apache Kylin v2.6.0 Release Announcement</title> + <description><p>The Apache Kylin community is pleased to announce the release of Apache Kylin v2.6.0.</p> + +<p>Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on big data, supporting extremely large datasets.</p> + +<p>This is a major release after 2.5.0, including many enhancements. All of the changes can be found in the <a href="https://kylin.apache.org/docs/release_notes.html">release notes</a>. Here we highlight only the major ones:</p> + +<h3 id="sdk-for-jdbc-sources">SDK for JDBC sources</h3> +<p>Apache Kylin already supports several data sources, such as Amazon Redshift and SQL Server, through JDBC. 
<br /> +To help developers handle SQL dialect differences and easily implement a new data source through JDBC, Kylin provides a new data source SDK with APIs for:<br /> +* Synchronizing metadata and data from the JDBC source<br /> +* Building cubes from the JDBC source<br /> +* Pushing queries down to the JDBC source engine when no cube matches</p> + +<p>Check KYLIN-3552 for more.</p> + +<h3 id="memcached-as-distributed-cache">Memcached as distributed cache</h3> +<p>In the past, query caches were not used efficiently in Kylin, for two reasons: an aggressive cache expiration strategy and server-local caching. <br /> +Because of the aggressive cache expiration strategy, useful cache entries were often cleaned up unnecessarily. <br /> +Because query caches were stored on each local server, they could not be shared between servers. <br /> +And because of the size limitation of the local cache, not all useful query results could be cached.</p> + +<p>To deal with these shortcomings, we change the query cache expiration strategy to signature checking and introduce Memcached as Kylin's distributed cache, so that Kylin servers are able to share cached results with each other. <br /> +It is also easy to add Memcached servers to scale out the distributed cache; with enough Memcached servers, we can cache as many results as possible. <br /> +We also introduce a segment-level query cache, which not only speeds up queries but also reduces the RPCs to HBase. <br /> +The related tasks are KYLIN-2895, KYLIN-2894, KYLIN-2896, KYLIN-2897, KYLIN-2898, KYLIN-2899.</p> + +<h3 id="forkjoinpool-for-fast-cubing">ForkJoinPool for fast cubing</h3> +<p>In the past, fast cubing used split threads, task threads, and the main thread to build the cube, with complex join and error-handling logic.</p> + +<p>The new implementation leverages the ForkJoinPool from the JDK: the split logic is handled in the<br /> +main thread, cuboid tasks and sub-tasks are handled in the fork-join pool, and cube results are collected<br /> +asynchronously and can be written to the output earlier. 
Check KYLIN-2932 for more.</p> + +<h3 id="improve-hllcounter-performance">Improve HLLCounter performance</h3> +<p>In the past, the ways of creating an HLLCounter and computing the harmonic mean were not efficient.</p> + +<p>The new implementation improves HLLCounter creation by copying the registers from another HLLCounter instead of merging. To compute the harmonic mean in the HLLCSnapshot, it makes the following enhancements: <br /> +* using a lookup table to cache all 1/2^r values instead of computing them on the fly<br /> +* replacing floating-point addition with integer addition in the big loop<br /> +* removing a branch, e.g. there is no need to check whether registers[i] is zero, although this is a minor improvement.</p> + +<p>Check KYLIN-3656 for more.</p> + +<h3 id="improve-cuboid-recommendation-algorithm">Improve Cuboid Recommendation Algorithm</h3> +<p>In the past, to add cuboids that were not prebuilt, the cube planner turned to mandatory cuboids, which were selected when their rollup row count was above some threshold. <br /> +There were two shortcomings:<br /> +* The rollup row count was not estimated accurately<br /> +* It was hard to determine the rollup row count threshold for recommending mandatory cuboids</p> + +<p>The new implementation estimates the row count of un-prebuilt cuboids by rollup ratio rather than by exact rollup row count. <br /> +With better estimated row counts for un-prebuilt cuboids, the cost-based cube planner algorithm decides which cuboids to build, and the previous threshold for mandatory cuboids is no longer needed. <br /> +With this improvement, mandatory cuboids can only be set manually and are no longer recommended. 
Check KYLIN-3540 for more.</p> + +<p><strong>Download</strong></p> + +<p>To download the Apache Kylin v2.6.0 source code or binary package, visit the <a href="http://kylin.apache.org/download">download</a> page.</p> + +<p><strong>Upgrade</strong></p> + +<p>Follow the <a href="/docs/howto/howto_upgrade.html">upgrade guide</a>.</p> + +<p><strong>Feedback</strong></p> + +<p>If you have any issue or question, please send an email to the Apache Kylin dev or user mailing list: [email protected], [email protected]; before sending, please make sure you have subscribed to the mailing list by dropping an email to [email protected] or [email protected].</p> + +<p><em>Great thanks to everyone who contributed!</em></p> +</description> + <pubDate>Fri, 18 Jan 2019 12:00:00 -0800</pubDate> + <link>http://kylin.apache.org/blog/2019/01/18/release-v2.6.0/</link> + <guid isPermaLink="true">http://kylin.apache.org/blog/2019/01/18/release-v2.6.0/</guid> + + + <category>blog</category> + + </item> + + <item> + <title>Apache Kylin v2.6.0 Official Release</title> + <description><p>The Apache Kylin community is pleased to announce the official release of Apache Kylin 2.6.0.</p> + +<p>Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets.</p> + +<p>This is a new feature release after 2.5.0. It introduces many valuable improvements; for the complete change list please see the <a href="https://kylin.apache.org/docs/release_notes.html">release notes</a>. Here we highlight some of the major improvements:</p> + +<h3 id="jdbcsdk">SDK for JDBC data sources</h3> +<p>Kylin already supports multiple data sources, including Amazon Redshift and SQL Server, through JDBC.<br /> +To help developers handle the differences between SQL dialects more conveniently and implement new JDBC data sources more easily, Kylin provides a corresponding SDK with unified API entry points for:<br /> +* synchronizing metadata and data<br /> +* building cubes<br /> +* pushing queries down to the data source when no cube can answer them</p> + +<p>See KYLIN-3552 for more.</p> + +<h3 id="memcached--kylin-">Memcached as Kylin's distributed cache</h3> +<p>In the past, Kylin's caching of query results was not very efficient, mainly for the following two reasons:<br /> +First, when Kylin's metadata changed, large numbers of cache entries were aggressively evicted, so the cache was refreshed frequently and its utilization dropped.<br /> +Second, because only local caches were used, Kylin servers could not share caches with each other, which lowered the cache hit rate.<br /> +Another drawback of the local cache is that its size is limited and it cannot scale out horizontally like a distributed cache, which limits the number of query results that can be cached.</p> + +<p>To address these flaws, we changed the cache invalidation mechanism: instead of aggressively clearing caches, we adopt the following scheme:<br /> +1. Before putting a query result into the cache, compute a signature based on the current metadata and store it in the cache together with the query result<br /> +2. After fetching a query result from the cache, compute a signature based on the current metadata and compare the two signatures. If they match, the cache entry is valid; otherwise it is invalid and is deleted</p> + +<p>We also introduce Memcached as Kylin's distributed cache. This lets Kylin servers share cached query results, and since Memcached servers are independent of each other, they are very easy to scale out horizontally, which helps cache more data.<br /> +The related development tasks are KYLIN-2895, KYLIN-2894, KYLIN-2896, KYLIN-2897, KYLIN-2898, KYLIN-2899.</p> + +<h3 id="forkjoinpool--fast-cubing-">ForkJoinPool simplifies the fast cubing threading model</h3> +<p>In the past, when doing fast cubing, Kylin used a series of threads it defined itself, such as split threads, task threads, and the main thread, to build the cube concurrently.<br /> +In this threading model, the relationships and coordination between threads were complex, and exception handling was very error-prone.</p> + +<p>Now we have introduced the ForkJoinPool: the split logic is handled in the main thread, cuboid building tasks and their sub-tasks all run in the fork-join pool, and cuboid results can be collected asynchronously and output earlier to the downstream merge operation. See KYLIN-2932 for more.</p> + +<h3 id="hllcounter-">Improved HLLCounter performance</h3> +<p>For HLLCounter, we made improvements in two aspects: the way an HLLCounter is created and the way the harmonic mean is computed.<br /> +1. For creating an HLLCounter, we no longer use merging; instead, we directly copy the registers from another HLLCounter<br /> +2. For computing the harmonic mean in the HLLCSnapshot, the following three improvements were made:<br /> +* cache all 1/2^r values<br /> +* use integer addition instead of floating-point addition<br /> +* remove the conditional branch, e.g. there is no need to check whether registers[i] is zero</p> + +<p>See KYLIN-3656 for more.</p> + +<h3 id="cube-planner-">Improved Cube Planner algorithm</h3> +<p>In the past, phase two of the cube planner could only add cuboids that were not pre-calculated by way of mandatory cuboids, and a cuboid became mandatory in one of two ways:<br /> +it was set manually, or its rollup row count at query time was large enough. Deciding whether a cuboid is mandatory by judging whether the rollup row count at query time is large enough has two major flaws:<br /> +* the algorithm for estimating the rollup row count was not very good<br /> +* it was hard to establish a universal threshold for the decision</p> + +<p>Now we no longer look at the problem from the angle of rollup row count; everything is viewed from the angle of cuboid row count, which unifies it with the cost-based cube planner algorithm.<br /> +To this end, we improved the estimation of the row count of un-prebuilt cuboids by using rollup ratios, and then let the cost-based cube planner algorithm decide which un-prebuilt cuboids should be built and which should be discarded.<br /> +With this improvement, there is no need to recommend mandatory cuboids by setting a universal threshold; mandatory cuboids can only be set manually and are no longer recommended. See KYLIN-3540 for more.</p> + +<p><strong>Download</strong></p> + +<p>To download the Apache Kylin v2.6.0 source code or binary package, please visit the <a href="http://kylin.apache.org/download">download page</a>.</p> + +<p><strong>Upgrade</strong></p> + +<p>Follow the <a href="/docs/howto/howto_upgrade.html">upgrade guide</a>.</p> + +<p><strong>Feedback</strong></p> + +<p>If you have any issue or question, please send an email to the Apache Kylin dev or user mailing list: [email protected], [email protected]; before sending, please make sure you have subscribed to the mailing list by dropping an email to [email protected] or [email protected].</p> + +<p><em>Many thanks to everyone who contributed to Apache Kylin!</em></p> +</description> + <pubDate>Fri, 18 Jan 2019 12:00:00 -0800</pubDate> + <link>http://kylin.apache.org/cn/blog/2019/01/18/release-v2.6.0/</link> + <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2019/01/18/release-v2.6.0/</guid> + + + <category>blog</category> + + </item> + + <item> <title>How Cisco's Big Data Team Improved the High Concurrent Throughput of Apache Kylin by 5x</title> 
<description><h2 id="background">Background</h2> @@ -545,70 +702,6 @@ Graphic 10 Process of Querying Cube</ </item> <item> - <title>Apache Kylin v2.5.0 Official Release</title> - <description><p>The Apache Kylin community is pleased to announce the official release of Apache Kylin 2.5.0.</p> - -<p>Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets.</p> - -<p>This is a new feature release after 2.4.0. It introduces many valuable improvements; for the complete change list please see the <a href="https://kylin.apache.org/docs/release_notes.html">release notes</a>. Here we highlight some of the major improvements:</p> - -<h3 id="all-in-spark--cubing-">All-in-Spark Cubing engine</h3> -<p>Kylin's Spark engine now runs all the distributed jobs of cube computation with Spark, including fetching the distinct values of each dimension, converting cuboid files into HBase HFiles, merging segments, merging dictionaries, and so on. The default Spark configurations have also been tuned so that users get an out-of-the-box experience. The related development tasks are KYLIN-3427, KYLIN-3441, KYLIN-3442.</p> - -<p>Spark job management is also improved: once a Spark job starts running, you can get the job link in the web console; if you discard the job, Kylin will terminate the Spark job immediately to release resources in time; if Kylin is restarted, it can resume from the previous job instead of submitting a new one.</p> - -<h3 id="mysql--kylin-">MySQL as Kylin metadata storage</h3> -<p>In the past, HBase was the only choice for Kylin metadata storage. In some cases HBase is not applicable, for example when multiple HBase clusters are used to provide cross-region high availability for Kylin; there the replicated HBase cluster is read-only, so it cannot serve as the metadata store. We now introduce the MySQL Metastore to meet this need. This feature is currently in the testing phase. See KYLIN-3488 for more.</p> - -<h3 id="hybrid-model-">Hybrid model graphical interface</h3> -<p>Hybrid is an advanced model used to assemble multiple cubes. It can be used to cover situations where a cube's schema needs to change. This feature had no graphical interface in the past, so only a small number of users knew about it. We have now enabled it in the web interface so that more users can try it.</p> - -<h3 id="cube-planner">Cube planner enabled by default</h3> -<p>The cube planner can greatly optimize the cube structure and reduce the number of cuboids built, thereby saving computing/storage resources and improving query performance. It was introduced in v2.3 but was not enabled by default. To let more users see and try it, we enable it by default in v2.5. The algorithm will automatically optimize the cuboid set based on data statistics when the first segment is built.</p> - -<h3 id="segment-">Improved Segment pruning</h3> -<p>Segment (partition) pruning can effectively reduce disk and network I/O, and therefore greatly improves query performance. In the past, Kylin pruned segments only by the values of the partition date column; if a query did not use the partition column as a filter condition, pruning would not take effect and all segments would be scanned.<br /> -From v2.5 on, Kylin records the min/max values of every dimension at the segment level. Before scanning a segment, the query conditions are compared with the min/max index; if they do not match, the segment is skipped. Check KYLIN-3370 for more.</p> - -<h3 id="yarn-">Merge dictionaries on YARN</h3> -<p>When segments are merged, their dictionaries need to be merged too. In the past, dictionary merging happened in Kylin's JVM, which consumed a lot of local memory and CPU resources. In extreme cases (with several concurrent jobs), it could cause the Kylin process to crash. As a result, some users had to allocate more memory to the Kylin job node, or run multiple job nodes to balance the workload.<br /> -From v2.5 on, Kylin submits this task to Hadoop MapReduce or Spark, which resolves this bottleneck. Check KYLIN-3471 for more.</p> - -<h3 id="cube-">Improved cube build performance with the global dictionary</h3> -<p>The Global Dictionary (GD) is a prerequisite for bitmap-based precise count-distinct. If the column to be counted has very high cardinality, the GD can be very large. During the cube build, Kylin needs to convert non-integer values into integers through the GD. Although the GD is sliced into multiple buckets that can be loaded into memory separately, because the values to be converted are unordered, Kylin has to repeatedly swap buckets in and out, which makes the build job very slow.<br /> -This enhancement introduces a new step that builds a shrunken dictionary from the global dictionary for each data block. Each task then only needs to load its shrunken dictionary, avoiding frequent swapping. Performance can be three times faster than before. See KYLIN-3491 for more.</p> - -<h3 id="topn-count-distinct--cube-">Improved cube size estimation with TOPN, COUNT DISTINCT</h3> -<p>The cube size is estimated before the build and is used by several subsequent steps, such as deciding the number of partitions for the MR/Spark job and calculating HBase region splits, so its accuracy has a large impact on build performance. When COUNT DISTINCT or TOPN measures exist, since their sizes are flexible, the estimate could deviate greatly from the real size. In the past, users needed to tune several parameters to bring the size estimate closer to reality, which was somewhat difficult for ordinary users.<br /> -Now Kylin automatically adjusts the size estimation based on collected statistics, which brings the estimate closer to the actual size. See KYLIN-3453 for more.</p> - -<h3 id="hadoop-30hbase-20">Support for Hadoop 3.0/HBase 2.0</h3> -<p>Hadoop 3 and HBase 2 are starting to be adopted by many users. Kylin now provides new binary packages compiled with the new Hadoop and HBase APIs. We have tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.</p> - -<p><strong>Download</strong></p> - -<p>To download the Apache Kylin v2.5.0 source code or binary package, please visit the <a href="http://kylin.apache.org/download">download page</a>.</p> - -<p><strong>Upgrade</strong></p> - -<p>Follow the <a href="/docs/howto/howto_upgrade.html">upgrade guide</a>.</p> - -<p><strong>Feedback</strong></p> - -<p>If you have any issue or question, please send an email to the Apache Kylin dev or user mailing list: [email protected], [email protected]; before sending, please make sure you have subscribed to the mailing list by dropping an email to [email protected] or [email protected].</p> - -<p><em>Many thanks to everyone who contributed to Apache Kylin!</em></p> -</description> - <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate> - <link>http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</link> - <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</guid> - - - <category>blog</category> - - </item> - - <item> <title>Apache Kylin v2.5.0 Release Announcement</title> <description><p>The Apache Kylin community is pleased to announce the release of Apache Kylin v2.5.0.</p> @@ -681,6 +774,70 @@ Graphic 10 Process of Querying Cube</ </item> <item> + <title>Apache Kylin v2.5.0 Official Release</title> + <description><p>The Apache Kylin community is pleased to announce the official release of Apache Kylin 2.5.0.</p> + +<p>Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets.</p> + +<p>This is a new feature release after 2.4.0. It introduces many valuable improvements; for the complete change list please see the <a href="https://kylin.apache.org/docs/release_notes.html">release notes</a>. Here we highlight some of the major improvements:</p> + +<h3 id="all-in-spark--cubing-">All-in-Spark Cubing engine</h3> +<p>Kylin's Spark engine now runs all the distributed jobs of cube computation with Spark, including fetching the distinct values of each dimension, converting cuboid files into HBase HFiles, merging segments, merging dictionaries, and so on. The default Spark configurations have also been tuned so that users get an out-of-the-box experience. The related development tasks are KYLIN-3427, KYLIN-3441, KYLIN-3442.</p> + +<p>Spark job management is also improved: once a Spark job starts running, you can get the job link in the web console; if you discard the job, Kylin will terminate the Spark job immediately to release resources in time; if Kylin is restarted, it can resume from the previous job instead of submitting a new one.</p> + +<h3 id="mysql--kylin-">MySQL as Kylin metadata storage</h3> +<p>In the past, HBase was the only choice for Kylin metadata storage. In some cases HBase is not applicable, for example when multiple HBase clusters are used to provide cross-region high availability for Kylin; there the replicated HBase cluster is read-only, so it cannot serve as the metadata store. We now introduce the MySQL Metastore to meet this need. This feature is currently in the testing phase. See KYLIN-3488 for more.</p> + +<h3 id="hybrid-model-">Hybrid model graphical interface</h3> +<p>Hybrid is an advanced model used to assemble multiple cubes. It can be used to cover situations where a cube's schema needs to change. This feature had no graphical interface in the past, so only a small number of users knew about it. We have now enabled it in the web interface so that more users can try it.</p> + +<h3 id="cube-planner">Cube planner enabled by default</h3> +<p>The cube planner can greatly optimize the cube structure and reduce the number of cuboids built, thereby saving computing/storage resources and improving query performance. It was introduced in v2.3 but was not enabled by default. To let more users see and try it, we enable it by default in v2.5. The algorithm will automatically optimize the cuboid set based on data statistics when the first segment is built.</p> + +<h3 id="segment-">Improved Segment pruning</h3> +<p>Segment (partition) pruning can effectively reduce disk and network I/O, and therefore greatly improves query performance. In the past, Kylin pruned segments only by the values of the partition date column; if a query did not use the partition column as a filter condition, pruning would not take effect and all segments would be scanned.<br /> +From v2.5 on, Kylin records the min/max values of every dimension at the segment level. Before scanning a segment, the query conditions are compared with the min/max index; if they do not match, the segment is skipped. Check KYLIN-3370 for more.</p> + +<h3 id="yarn-">Merge dictionaries on YARN</h3> +<p>When segments are merged, their dictionaries need to be merged too. In the past, dictionary merging happened in Kylin's JVM, which consumed a lot of local memory and CPU resources. In extreme cases (with several concurrent jobs), it could cause the Kylin process to crash. As a result, some users had to allocate more memory to the Kylin job node, or run multiple job nodes to balance the workload.<br /> +From v2.5 on, Kylin submits this task to Hadoop MapReduce or Spark, which resolves this bottleneck. Check KYLIN-3471 for more.</p> + +<h3 id="cube-">Improved cube build performance with the global dictionary</h3> +<p>The Global Dictionary (GD) is a prerequisite for bitmap-based precise count-distinct. If the column to be counted has very high cardinality, the GD can be very large. During the cube build, Kylin needs to convert non-integer values into integers through the GD. Although the GD is sliced into multiple buckets that can be loaded into memory separately, because the values to be converted are unordered, Kylin has to repeatedly swap buckets in and out, which makes the build job very slow.<br /> +This enhancement introduces a new step that builds a shrunken dictionary from the global dictionary for each data block. Each task then only needs to load its shrunken dictionary, avoiding frequent swapping. Performance can be three times faster than before. See KYLIN-3491 for more.</p> + +<h3 id="topn-count-distinct--cube-">Improved cube size estimation with TOPN, COUNT DISTINCT</h3> +<p>The cube size is estimated before the build and is used by several subsequent steps, such as deciding the number of partitions for the MR/Spark job and calculating HBase region splits, so its accuracy has a large impact on build performance. When COUNT DISTINCT or TOPN measures exist, since their sizes are flexible, the estimate could deviate greatly from the real size. In the past, users needed to tune several parameters to bring the size estimate closer to reality, which was somewhat difficult for ordinary users.<br /> +Now Kylin automatically adjusts the size estimation based on collected statistics, which brings the estimate closer to the actual size. See KYLIN-3453 for more.</p> + +<h3 id="hadoop-30hbase-20">Support for Hadoop 3.0/HBase 2.0</h3> +<p>Hadoop 3 and HBase 2 are starting to be adopted by many users. Kylin now provides new binary packages compiled with the new Hadoop and HBase APIs. We have tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.</p> + +<p><strong>Download</strong></p> + +<p>To download the Apache Kylin v2.5.0 source code or binary package, please visit the <a href="http://kylin.apache.org/download">download page</a>.</p> + +<p><strong>Upgrade</strong></p> + +<p>Follow the <a href="/docs/howto/howto_upgrade.html">upgrade guide</a>.</p> + +<p><strong>Feedback</strong></p> + +<p>If you have any issue or question, please send an email to the Apache Kylin dev or user mailing list: [email protected], [email protected]; before sending, please make sure you have subscribed to the mailing list by dropping an email to [email protected] or [email protected].</p> + +<p><em>Many thanks to everyone who contributed to Apache Kylin!</em></p> +</description> + <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate> + <link>http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</link> + <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</guid> + + + <category>blog</category> + + </item> + + <item> <title>Use Star Schema Benchmark for Apache Kylin</title> <description><h2 id="background">Background</h2> @@ -1011,432 +1168,6 @@ send mail to Apache Kylin dev mailing li <category>blog</category> - - </item> - - <item> - <title>Get Your Interactive Analytics Superpower, with Apache Kylin and Apache Superset</title> - <description><h2 id="challenge-of-big-data">Challenge of Big Data</h2> - -<p>In the big data era, all enterprises face the growing demand and challenge of processing large volumes of data, workloads that traditional legacy systems can no longer satisfy. With the emergence of Artificial Intelligence (AI) and Internet-of-Things (IoT) technology, it has become mission-critical for businesses to accelerate their pace of discovering valuable insights from their massive and ever-growing datasets. Thus, large companies are constantly searching for a solution, often turning to open source technologies. We will introduce two open source technologies that, when combined, can meet these pressing big data demands for large enterprises.</p> - -<h2 id="apache-kylin-a-leading-opensource-olap-on-hadoop">Apache Kylin: a Leading Open Source OLAP-on-Hadoop</h2> -<p>Modern organizations have a long history of applying Online Analytical Processing (OLAP) technology to analyze data and uncover business insights. These insights help businesses make informed decisions and improve their services and products. With the emergence of the Hadoop ecosystem, OLAP has also embraced new technologies in the big data era.</p> - -<p>Apache Kylin is one such technology that directly addresses the challenge of conducting analytical workloads on massive datasets. It is already widely adopted by enterprises around the world. With powerful pre-calculation technology, Apache Kylin enables sub-second query latency over petabyte-scale datasets. 
The innovative and intricate design of Apache Kylin allows it to seamlessly consume data from any Hadoop-based data source, as well as other relational database management systems (RDBMS). Analysts can query Apache Kylin using standard SQL through ODBC, JDBC, and RESTful APIs, which enables the platform to integrate with any third-party application.<br /> -<img src="/images/Kylin-and-Superset/png/1. kylin_diagram.png" alt="" /><br /> -Figure 1: Apache Kylin Architecture</p> - -<p>In a fast-paced and rapidly changing business environment, business users and analysts are expected to uncover insights at the speed of thought. They can meet this expectation with Apache Kylin, and are no longer subject to the predicament of waiting hours for a single query to return results. Such a powerful data processing engine empowers the data scientists, engineers, and business analysts of any enterprise to find the insights needed to reach critical business decisions. However, business decisions cannot be made without rich data visualization. To address this last-mile challenge of big data analytics, Apache Superset comes into the picture.</p> - -<h2 id="apache-superset-modern-enterprise-ready-business-intelligence-platform">Apache Superset: Modern, Enterprise-ready Business Intelligence Platform</h2> - -<p>Apache Superset is a data exploration and visualization platform designed to be visual, intuitive, and interactive. 
A user can access data in the following two ways:</p> - -<ol> - <li> - <p>Access data from the following commonly used data sources one table at a time: Kylin, Presto, Hive, Impala, SparkSQL, MySQL, Postgres, Oracle, Redshift, SQL Server, Druid.</p> - </li> - <li> - <p>Use a rich SQL Interactive Development Environment (IDE) called SQL Lab that is designed for power users, with the ability to write SQL queries to analyze multiple tables.</p> - </li> -</ol> - -<p>Users can immediately analyze and visualize their query results using Apache Superset's rich visualization and reporting features.</p> - -<p><img src="/images/Kylin-and-Superset/png/2. superset_logo.png" alt="" /><br /> -Figure 2</p> - -<p><img src="/images/Kylin-and-Superset/png/3. Superset_screen_shot.png" alt="" /><br /> -Figure 3: Apache Superset Visualization Interface</p> - -<h2 id="integrating-apache-kylin-and-apache-superset-to-boost-your-productivity">Integrating Apache Kylin and Apache Superset to Boost Your Productivity</h2> - -<p>Both Apache Kylin and Apache Superset are built to provide fast and interactive analytics for their users. The combination of these two open source projects can bring that goal to reality on petabyte-scale datasets, thanks to the pre-calculated Kylin Cube.</p> - -<p>The Kyligence Data Science team has recently open sourced kylinpy, a project that makes this combination possible. Kylinpy is a Python-based Apache Kylin client library. Any application that uses SQLAlchemy can now query Apache Kylin with this library installed, including Apache Superset. Below is a brief tutorial that shows how to integrate Apache Kylin and Apache Superset.</p> - -<h2 id="prerequisite">Prerequisite</h2> -<ol> - <li>Install Apache Kylin<br /> -Please refer to this installation tutorial.</li> - <li>Apache Kylin provides a script for you to create a sample Cube. 
After you have successfully installed Apache Kylin, you can run the script below under the Apache Kylin installation directory to generate the sample project and Cube. <br /> - ./${KYLIN_HOME}/bin/sample.sh</li> - <li> - <p>When the script finishes running, log onto the Apache Kylin web UI with the default user ADMIN/KYLIN; on the system page click "Reload Metadata", then you will see a sample project called "Learn Kylin".</p> - </li> - <li>Select the sample cube "kylin_sales_cube", click "Actions" -&gt; "Build", and pick a date later than 2014-01-01 (to cover all 10000 sample records);</li> -</ol> - -<p><img src="/images/Kylin-and-Superset/png/4. build_cube.png" alt="" /><br /> -Figure 4: Build Cube in Apache Kylin</p> - -<ol> - <li>Check the build progress in the "Monitor" tab until it reaches 100%;</li> - <li>Execute SQL in the "Insight" tab, for example:</li> -</ol> - -<div class="highlighter-rouge"><pre class="highlight"><code> select part_dt, - sum(price) as total_selled, - count(distinct seller_id) as sellers - from kylin_sales - group by part_dt - order by part_dt --- #This query will hit on the newly built Cube "Kylin_sales_cube". 
-</code></pre> -</div> - -<ol> - <li>Next, we will install Apache Superset and initialize it.<br /> - You may refer to the Apache Superset official website instructions to install and initialize it.</li> - <li>Install kylinpy</li> -</ol> - -<div class="highlighter-rouge"><pre class="highlight"><code> $ pip install kylinpy -</code></pre> -</div> - -<ol> - <li>Verify your installation; if everything goes well, the Apache Superset daemon should be up and running.</li> -</ol> - -<div class="highlighter-rouge"><pre class="highlight"><code>$ superset runserver -d -Starting server with command: -gunicorn -w 2 --timeout 60 -b 0.0.0.0:8088 --limit-request-line 0 --limit-request-field_size 0 superset:app - -[2018-01-03 15:54:03 +0800] [73673] [INFO] Starting gunicorn 19.7.1 -[2018-01-03 15:54:03 +0800] [73673] [INFO] Listening at: http://0.0.0.0:8088 (73673) -[2018-01-03 15:54:03 +0800] [73673] [INFO] Using worker: sync -[2018-01-03 15:54:03 +0800] [73676] [INFO] Booting worker with pid: 73676 -[2018-01-03 15:54:03 +0800] [73679] [INFO] Booting worker with pid: 73679 -</code></pre> -</div> - -<h2 id="connect-apache-kylin-from-apachesuperset">Connect Apache Kylin from Apache Superset</h2> - -<p>Now everything you need is installed and ready to go. Let's try to create an Apache Kylin data source in Apache Superset.<br /> -1. Open up http://localhost:8088 in your web browser and log in with the credentials you set during Apache Superset installation.<br /> - <img src="/images/Kylin-and-Superset/png/5. superset_1.png" alt="" /><br /> - Figure 5: Apache Superset Login Page</p> - -<ol> - <li>Go to Source -&gt; Datasource to configure a new data source. 
- <ul> - <li>The SQLAlchemy URI pattern is: kylin://&lt;username&gt;:&lt;password&gt;@&lt;hostname&gt;:&lt;port&gt;/&lt;project name&gt;</li> - <li>Check "Expose in SQL Lab" if you want to expose this data source in SQL Lab.</li> - <li>Click "Test Connection" to see if the URI is working properly.</li> - </ul> - </li> -</ol> - -<p><img src="/images/Kylin-and-Superset/png/6. superset_2.png" alt="" /><br /> - Figure 6: Create an Apache Kylin data source</p> - -<p><img src="/images/Kylin-and-Superset/png/7. superset_3.png" alt="" /><br /> - Figure 7: Test Connection to Apache Kylin</p> - -<p>If the connection to Apache Kylin is successful, you will see all the tables from the Learn_kylin project show up at the bottom of the connection page.</p> - -<p><img src="/images/Kylin-and-Superset/png/8. superset_4.png" alt="" /><br /> -Figure 8: Tables will show up if the connection is successful</p> - -<h3 id="query-kylin-table">Query Kylin Table</h3> -<ol> - <li>Go to Source -&gt; Tables to add a new table; type in a table name from the "Learn_kylin" project, for example "Kylin_sales".</li> -</ol> - -<p><img src="/images/Kylin-and-Superset/png/9. superset_5.png" alt="" /><br /> -Figure 9 Add Kylin Table in Apache Superset</p> - -<ol> - <li>Click on the table you created. Now you are ready to analyze your data from Apache Kylin.</li> -</ol> - -<p><img src="/images/Kylin-and-Superset/png/10. superset_6.png" alt="" /><br /> -Figure 10 Query a single table from Apache Kylin</p> - -<h3 id="query-multiple-tables-from-kylin-using-sql-lab">Query Multiple Tables from Kylin Using SQL Lab</h3> -<p>A Kylin Cube is usually based on a data model joining multiple tables. Thus, it is quite common to query multiple tables at the same time using Apache Kylin. In Apache Superset, you can use SQL Lab to join your data across tables by composing SQL queries. We will use a query that can hit the sample cube "kylin_sales_cube" as an example. 
<br /> -When you run your query in SQL Lab, the result will come from the data source, in this case Apache Kylin.</p> - -<p><img src="/images/Kylin-and-Superset/png/11. SQL_Lab.png" alt="" /><br /> -Figure 11 Query multiple tables from Apache Kylin using SQL Lab</p> - -<p>When the query returns results, you may immediately visualize them by clicking on the "Visualize" button.<br /> -<img src="/images/Kylin-and-Superset/png/12. SQL_Lab_2.png" alt="" /><br /> -Figure 12 Define your query and visualize it immediately</p> - -<p>You may copy the entire SQL below to experience how you can query a Kylin Cube in SQL Lab. <br /> -<code class="highlighter-rouge"> -select -YEAR_BEG_DT, -MONTH_BEG_DT, -WEEK_BEG_DT, -META_CATEG_NAME, -CATEG_LVL2_NAME, -CATEG_LVL3_NAME, -OPS_REGION, -NAME as BUYER_COUNTRY_NAME, -sum(PRICE) as GMV, -sum(ACCOUNT_BUYER_LEVEL) ACCOUNT_BUYER_LEVEL, -count(*) as CNT -from KYLIN_SALES -join KYLIN_CAL_DT on CAL_DT=PART_DT -join KYLIN_CATEGORY_GROUPINGS on SITE_ID=LSTG_SITE_ID and KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID=KYLIN_SALES.LEAF_CATEG_ID -join KYLIN_ACCOUNT on ACCOUNT_ID=BUYER_ID -join KYLIN_COUNTRY on ACCOUNT_COUNTRY=COUNTRY -group by YEAR_BEG_DT, MONTH_BEG_DT, WEEK_BEG_DT, META_CATEG_NAME, CATEG_LVL2_NAME, CATEG_LVL3_NAME, OPS_REGION, NAME -</code><br /> -## Experience All Features in Apache Superset with Apache Kylin</p> - -<p>Most of the common reporting features are available in Apache Superset. Now let's see how we can use those features to analyze data from Apache Kylin.</p> - -<h3 id="sorting">Sorting</h3> -<p>You may sort by a measure regardless of how it is visualized.</p> - -<p>You may specify a "Sort By" measure, or sort the measure on the visualization after the query returns.</p> - -<p><img src="/images/Kylin-and-Superset/png/13. sort.png" alt="" /><br /> -Figure 13 Sort by</p> - -<h3 id="filtering">Filtering</h3> -<p>There are multiple ways you may filter data from Apache Kylin.<br /> -1. 
Date Filter<br />
- You may filter date and time dimensions with the calendar filter.<br />
- <img src="/images/Kylin-and-Superset/png/14. time_filter.png" alt="" /><br />
- Figure 14 Filtering time</p>
-
-<ol>
- <li>
- <p>Dimension Filter<br />
- For other dimensions, you may filter them with SQL conditions like "in, not in, equal to, not equal to, greater than or equal to, smaller than or equal to, greater than, smaller than, like".<br />
- <img src="/images/Kylin-and-Superset/png/15. dimension_filter.png" alt="" /><br />
- Figure 15 Filtering a dimension</p>
- </li>
- <li>
- <p>Search Box<br />
- In some visualizations, it is also possible to further narrow down your result set after the query has returned from the data source, using the "Search Box".<br />
- <img src="/images/Kylin-and-Superset/png/16. search_box.png" alt="" /><br />
- Figure 16 Search Box</p>
- </li>
- <li>
- <p>Filtering the measure<br />
- Apache Superset allows you to write a "having clause" to filter measures.<br />
- <img src="/images/Kylin-and-Superset/png/17. having.png" alt="" /><br />
- Figure 17 Filtering a measure</p>
- </li>
- <li>
- <p>Filter Box<br />
- The filter box visualization allows you to create a drop-down style filter that can filter all slices on a dashboard dynamically.<br />
- As the screenshot below shows, if you filter the CATEG_LVL2_NAME dimension from the filter box, all the visualizations on this dashboard will be filtered based on your selection.<br />
- <img src="/images/Kylin-and-Superset/png/18. filter_box.png" alt="" /><br />
- Figure 18 The filter box visualization</p>
- </li>
-</ol>
-
-<h3 id="top-n">Top-N</h3>
-<p>To answer Top-N queries faster, Apache Kylin provides an approximate Top-N measure that pre-calculates the top records. In Apache Superset, you may use both the "Sort By" and "Row Limit" features to make sure your query can utilize the Top-N pre-calculation from the Kylin Cube. 
<br />
- <img src="/images/Kylin-and-Superset/png/19. top10.png" alt="" /><br />
- Figure 19 Use both "Sort By" and "Row Limit" to get the Top 10</p>
-
-<h3 id="page-length">Page Length</h3>
-<p>Apache Kylin users often need to deal with high-cardinality dimensions. When displaying a high-cardinality dimension, the visualization will show too many distinct values and take a long time to render. In that case, Apache Superset's page length feature, which limits the number of rows per page, reduces the up-front rendering effort.<br />
- <img src="/images/Kylin-and-Superset/png/20. page_length.png" alt="" /><br />
- Figure 20 Limit page length</p>
-
-<h3 id="visualizations">Visualizations</h3>
-<p>Apache Superset provides a rich and extensive set of visualizations, from basic charts like pie, bar, and line charts to advanced visualizations like the sunburst, heatmap, world map, and Sankey diagram.<br />
- <img src="/images/Kylin-and-Superset/png/21. viz.png" alt="" /><br />
- Figure 21</p>
-
-<p><img src="/images/Kylin-and-Superset/png/22. viz_2.png" alt="" /><br />
- Figure 22</p>
-
-<p><img src="/images/Kylin-and-Superset/png/23. map.png" alt="" /><br />
- Figure 23 World map visualization</p>
-
-<p><img src="/images/Kylin-and-Superset/png/24. bubble.png" alt="" /><br />
- Figure 24 Bubble chart</p>
-
-<h3 id="other-functionalities">Other functionalities</h3>
-<p>Apache Superset also supports exporting to CSV, sharing, and viewing the SQL query.</p>
-
-<h2 id="summary">Summary</h2>
-<p>With the right technical synergy of open source projects, you can achieve results that are more than the sum of the parts. The pre-calculation technology of Apache Kylin accelerates visualization performance. The rich functionality of Apache Superset enables all Kylin Cube features to be fully utilized. 
When you marry the two, you get the superpower of accelerated interactive analytics.</p>
-
-<h2 id="references">References</h2>
-
-<ol>
- <li><a href="http://kylin.apache.org">Apache Kylin</a></li>
- <li><a href="https://github.com/Kyligence/kylinpy">kylinpy on Github</a></li>
- <li><a href="https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5">Superset: Airbnb's data exploration platform</a></li>
- <li><a href="https://github.com/apache/incubator-superset">Apache Superset on Github</a></li>
-</ol>
-
-</description>
- <pubDate>Mon, 01 Jan 2018 04:28:00 -0800</pubDate>
- <link>http://kylin.apache.org/blog/2018/01/01/kylin-and-superset/</link>
- <guid isPermaLink="true">http://kylin.apache.org/blog/2018/01/01/kylin-and-superset/</guid>
-
-
- <category>blog</category>
-
- </item>
-
- <item>
- <title>Improving Spark Cubing in Kylin 2.0</title>
- <description><p>Apache Kylin is an OLAP engine that speeds up queries through Cube precomputation. A Cube is a multi-dimensional dataset that contains all measures precomputed for all dimension combinations. Before v2.0, Kylin used MapReduce to build Cubes. To get better performance, Kylin 2.0 introduced Spark Cubing. For the principle behind Spark Cubing, please refer to the article <a href="http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/">By-layer Spark Cubing</a>.</p>
-
-<p>In this blog, I will talk about the following topics:</p>
-
-<ul>
- <li>How to make Spark Cubing support an HBase cluster with Kerberos enabled</li>
- <li>Spark configurations for Cubing</li>
- <li>Performance of Spark Cubing</li>
- <li>Pros and cons of Spark Cubing</li>
- <li>Applicable scenarios of Spark Cubing</li>
- <li>Improvement for dictionary loading in Spark Cubing</li>
-</ul>
-
-<p>The current Spark Cubing (2.0) version does not support an HBase cluster with Kerberos enabled, because Spark Cubing needs to get metadata from HBase. 
To solve this problem, we have two options: one is to make Spark able to connect to HBase with Kerberos; the other is to avoid having Spark connect to HBase at all during Spark Cubing.</p>
-
-<h3 id="make-spark-connect-hbase-with-kerberos-enabled">Make Spark connect HBase with Kerberos enabled</h3>
-<p>If you just want to run Spark Cubing in Yarn client mode, you only need to add three lines of code before new SparkConf() in SparkCubingByLayer:</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code> Configuration configuration = HBaseConnection.getCurrentHBaseConfiguration();
- HConnection connection = HConnectionManager.createConnection(configuration);
- // Obtain an authentication token for the given user and add it to the user's credentials.
- TokenUtil.obtainAndCacheToken(connection, UserProvider.instantiate(configuration).create(UserGroupInformation.getCurrentUser()));
-</code></pre>
-</div>
-
-<p>As for how to make Spark connect to HBase using Kerberos in Yarn cluster mode, please refer to SPARK-6918, SPARK-12279, and HBASE-17040. That solution may work, but it is not elegant, so I tried the second option.</p>
-
-<h3 id="use-hdfs-metastore-for-spark-cubing">Use HDFS metastore for Spark Cubing</h3>
-
-<p>The core idea here is to upload the metadata needed by the job to HDFS and use HDFSResourceStore to manage it.</p>
-
-<p>Before introducing how to use HDFSResourceStore instead of HBaseResourceStore in Spark Cubing, let's see what the Kylin metadata format is and how Kylin manages its metadata.</p>
-
-<p>In Kylin, each concrete piece of metadata for a table, cube, model, or project is a JSON file. The whole metadata set is organized as a file directory. 
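To make the directory-of-JSON-files idea concrete, here is a rough Python model of a metadata store as a mapping from resource paths to JSON documents; the paths and fields are hypothetical simplifications, not Kylin's actual schema:

```python
# Hypothetical, simplified model: each project/cube/table is one JSON
# document keyed by its resource path, organized like a directory tree.
metadata_store = {
    "/project/learn_kylin.json": {"name": "learn_kylin", "tables": ["DEFAULT.KYLIN_SALES"]},
    "/project/kylin_test.json": {"name": "kylin_test", "tables": []},
    "/cube/kylin_sales_cube.json": {"name": "kylin_sales_cube", "status": "READY"},
}

def list_resources(store, folder):
    """List all resource paths under a folder, mimicking a directory listing."""
    return sorted(path for path in store if path.startswith(folder + "/"))

print(list_resources(metadata_store, "/project"))
# ['/project/kylin_test.json', '/project/learn_kylin.json']
```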
The picture below shows the root directory for Kylin metadata.<br />
-<img src="http://static.zybuluo.com/kangkaisen/t1tc6neiaebiyfoir4fdhs11/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7%202017-07-02%20%E4%B8%8B%E5%8D%883.51.43.png" alt="Screenshot of the Kylin metadata root directory" /><br />
-The following picture shows the content of the project dir; "learn_kylin" and "kylin_test" are both project names.<br />
-<img src="http://static.zybuluo.com/kangkaisen/4dtiioqnw08w6vtj0r9u5f27/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7%202017-07-02%20%E4%B8%8B%E5%8D%883.54.59.png" alt="Screenshot of the project directory" /></p>
-
-<p>Kylin manages its metadata through ResourceStore, an abstract class that defines the CRUD interface for metadata. ResourceStore has three implementation classes:</p>
-
-<ul>
- <li>FileResourceStore (stores on the local filesystem)</li>
- <li>HDFSResourceStore</li>
- <li>HBaseResourceStore</li>
-</ul>
-
-<p>Currently, only HBaseResourceStore can be used in a production environment. FileResourceStore is mainly used for testing. HDFSResourceStore doesn't support massive concurrent writes, but it is ideal for read-only scenarios like Cubing. Kylin uses the "kylin.metadata.url" config to decide which kind of ResourceStore will be used.</p>
-
-<p>Now, let's see how to use HDFSResourceStore instead of HBaseResourceStore in Spark Cubing:</p>
-
-<ol>
- <li>Determine the necessary metadata for the Spark Cubing job</li>
- <li>Dump the necessary metadata from HBase to a local directory</li>
- <li>Update kylin.metadata.url and then write the full Kylin config to a "kylin.properties" file in the local metadata dir</li>
- <li>Use ResourceTool to upload the local metadata to HDFS</li>
- <li>Construct the HDFSResourceStore from the HDFS "kylin.properties" file in the Spark executor</li>
-</ol>
-
-<p>Of course, we need to delete the HDFS metadata dir on completion. 
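The five steps can be sketched roughly as below; this is illustrative Python, and every helper name, path, and property value is a hypothetical placeholder rather than Kylin's actual API (the real work happens in ResourceTool and HDFSResourceStore):

```python
import json
import shutil
import tempfile
from pathlib import Path

def prepare_hdfs_metadata(required_paths, fetch_resource, upload_dir, hdfs_dir):
    """Dump the needed metadata locally, rewrite the metadata URL to point
    at HDFS, upload the directory, then clean up the local copy."""
    local_dir = Path(tempfile.mkdtemp(prefix="kylin_meta_"))
    # Step 2: dump each necessary resource (e.g. from HBase) to local JSON files.
    for path in required_paths:
        target = local_dir / path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(fetch_resource(path)))
    # Step 3: point kylin.metadata.url at the HDFS dir and persist the config.
    (local_dir / "kylin.properties").write_text(
        "kylin.metadata.url={0}@hdfs\n".format(hdfs_dir))
    # Step 4: upload the local metadata dir to HDFS (callback stands in for ResourceTool).
    upload_dir(local_dir, hdfs_dir)
    # Step 5 happens in the executor; locally we only clean up the temp copy.
    shutil.rmtree(local_dir)
    return hdfs_dir
```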
I'm working on a patch for this; please watch KYLIN-2653 for updates.</p>
-
-<h3 id="spark-configurations-for-cubing">Spark configurations for Cubing</h3>
-
-<p>Following is the Spark configuration I used in our environment. It enables Spark dynamic resource allocation, so that our users need to set fewer Spark configurations themselves.</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code># run in yarn-cluster mode
-kylin.engine.spark-conf.spark.master=yarn
-kylin.engine.spark-conf.spark.submit.deployMode=cluster
-
-# enable dynamic allocation for Spark so users need not set the number of executors explicitly
-kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
-kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=10
-kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1024
-kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
-kylin.engine.spark-conf.spark.shuffle.service.enabled=true
-kylin.engine.spark-conf.spark.shuffle.service.port=7337
-
-# memory config
-kylin.engine.spark-conf.spark.driver.memory=4G
-# enlarge executor.memory when the cube dictionary is huge
-kylin.engine.spark-conf.spark.executor.memory=4G
-# one core per executor, because Kylin needs to load the cube dictionary in the executor
-kylin.engine.spark-conf.spark.executor.cores=1
-
-# enlarge the network timeout
-kylin.engine.spark-conf.spark.network.timeout=600
-
-kylin.engine.spark-conf.spark.yarn.queue=root.hadoop.test
-
-kylin.engine.spark.rdd-partition-cut-mb=100
-</code></pre>
-</div>
-
-<h3 id="performance-test-of-spark-cubing">Performance test of Spark Cubing</h3>
-
-<p>For source data scales from millions to hundreds of millions of rows, my test results are consistent with the blog <a href="http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/">By-layer Spark Cubing</a>. The improvement is remarkable. 
Moreover, I also tested with billions of rows of source data and, in particular, with huge dictionaries.</p>
-
-<p>The test Cube1 has 2.7 billion rows of source data, 9 dimensions, and one precise count-distinct measure with 70 million cardinality (which means the dictionary also has 70 million cardinality).</p>
-
-<p>The test Cube2 has 2.4 billion rows of source data, 13 dimensions, and 38 measures (including 9 precise count-distinct measures).</p>
-
-<p>The test result is shown in the picture below; the unit of time is minutes.<br />
-<img src="http://static.zybuluo.com/kangkaisen/1urzfkal8od52fodi1l6u0y5/image.png" alt="image.png-38.1kB" /></p>
-
-<p>In one word, <strong>Spark Cubing is much faster than MR Cubing in most scenarios</strong>.</p>
-
-<h3 id="pros-and-cons-of-spark-cubing">Pros and Cons of Spark Cubing</h3>
-<p>In my opinion, the advantages of Spark Cubing include:</p>
-
-<ol>
- <li>Because of the RDD cache, Spark Cubing can take full advantage of memory and avoid disk I/O.</li>
- <li>When there is enough memory, Spark Cubing can use more of it to get better build performance.</li>
-</ol>
-
-<p>On the contrary, the drawbacks of Spark Cubing include:</p>
-
-<ol>
- <li>Spark Cubing can't handle huge dictionaries well (hundreds of millions of cardinality);</li>
- <li>Spark Cubing isn't stable enough yet at very large data scales.</li>
-</ol>
-
-<h3 id="applicable-scenarios-of-spark-cubing">Applicable scenarios of Spark Cubing</h3>
-<p>In my opinion, except for the huge-dictionary scenario, we can use Spark Cubing to replace MR Cubing, especially under the following conditions:</p>
-
-<ol>
- <li>Many dimensions</li>
- <li>Normal dictionaries (e.g., cardinality &lt; 100 million)</li>
- <li>Normal scale data (e.g., less than 10 billion rows to build at once)</li>
-</ol>
-
-<h3 id="improvement-for-dictionary-loading-in-spark-cubing">Improvement for dictionary loading in Spark Cubing</h3>
-
-<p>As we all know, a big difference between MR and Spark is that an MR task runs in its own 
process, while a Spark task runs in a thread. So in MR Cubing the cube dictionary is loaded only once per task, but in Spark Cubing the dictionary may be loaded many times in one executor, which causes frequent GC.</p>
-
-<p>So, I made two improvements:</p>
-
-<ol>
- <li>Load the dictionary only once per executor.</li>
- <li>Add a maximumSize to the LoadingCache in AppendTrieDictionary so that the dictionary can be evicted as early as possible.</li>
-</ol>
-
-<p>These two improvements have been contributed to the Kylin repository.</p>
-
-<h3 id="summary">Summary</h3>
-<p>Spark Cubing is a great feature for Kylin 2.0; thanks to the Kylin community. We will apply Spark Cubing to real scenarios in our company. I believe Spark Cubing will become more robust and efficient in future releases.</p>
-
-</description>
- <pubDate>Fri, 21 Jul 2017 15:22:22 -0700</pubDate>
- <link>http://kylin.apache.org/blog/2017/07/21/Improving-Spark-Cubing/</link>
- <guid isPermaLink="true">http://kylin.apache.org/blog/2017/07/21/Improving-Spark-Cubing/</guid>
-
-
- <category>blog</category> </item>
