From the discussion, apparently a new storage will be added sooner or later.
Will it be a new big version of Kylin, like Apache Kylin 3.0? Also, how about the migration from the old storage? I assume old cube data has to be transformed and loaded into the new storage.

Yang

On Sat, Dec 29, 2018 at 5:52 PM ShaoFeng Shi <shaofeng...@apache.org> wrote:

Thanks very much for Yiming's and Jiatao's comments; they're very valuable. There are many improvements that can be made to this new storage. We welcome all kinds of contributions and would like to improve it together with the community in the year 2019!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng....@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

JiaTao Tao <taojia...@gmail.com> wrote on Wed, Dec 19, 2018 at 8:44 PM:

Hi all,

Truly agreed with Yiming, and here I will expand a little more on "distributed computing".

As Yiming mentioned, Kylin parses a query into an execution plan using Calcite (Kylin rewrites the execution plan because the data in cubes is already aggregated, so we cannot use the original plan directly). It is a tree structure: each node represents a specific calculation, and data flows from bottom to top, applying all these calculations.

[image: execution plan tree]
(Pic from https://blog.csdn.net/yu616568/article/details/50838504, a really good blog.)

At present, Kylin does almost all of these calculations on its own node; in other words, we cannot fully use the power of the cluster, and it is a SPOF. Hence this design: we can visit this tree and transform each node into operations on Spark's DataFrames (i.e. "DF").

More specifically, we visit the nodes recursively until we meet the "TableScan" node (like pushing onto a stack). E.g.:
In the above diagram, the first node we meet is a "Sort" node; we just visit its child(ren), and we do not stop visiting each node's child(ren) until we meet a "TableScan" node.

In the "TableScan" node, we generate the initial DF; then the DF is popped up to the "Filter" node, and the "Filter" node applies its own operation, like "df.filter(xxx)". Finally, we apply each node's operation to this DF, and the final call chain will look like: "df.filter(xxx).select(xxx).agg(xxx).sort(xxx)".

After we get the final DataFrame and trigger the calculation, all the rest is handled by Spark. And we can gain tremendous benefits at the computation level; more details can be seen in my previous post:
http://apache-kylin.74782.x6.nabble.com/Re-DISCUSS-Columnar-storage-engine-for-Apache-Kylin-tc12113.html

--

Regards!

Aron Tao

许益铭 <x1860...@gmail.com> wrote on Wed, Dec 19, 2018 at 11:40 AM:

Hi all!

Regarding the issues CHAO LONG raised, I have the following views:

1. Our current architecture is divided into two layers: a storage layer and a computing layer. In the storage layer we have already made some optimizations, doing pre-aggregation there to reduce the amount of data returned. But the runtime aggregations and joins happen on the Kylin server side, where serialization is unavoidable, and this architecture easily leads to a single-point bottleneck. If the runtime agg or join involves a large amount of data, query performance drops sharply and the Kylin server suffers heavy GC.

2. About the dictionary: the dictionary was originally introduced to align rowkeys in HBase and also to reduce some storage. But it brings another problem: it is hard to handle variable-length string dimensions. For a high-cardinality variable-length dimension, we can only build a very large dictionary or assign a rather large fixed length, which doubles the storage; and because the dictionary is large, query performance is badly affected (GC). If we use columnar storage, we do not need to worry about this at all.

3. To use Parquet's page index, we must convert TupleFilter into Parquet filters, which is no small amount of work. Moreover, our data is all encoded; the page index only filters by the min/max values on a page, so for binary data no filtering can be done.

I think using Spark as our computation engine can solve all of the above problems:

1. Distributed computing
After Calcite parses and optimizes the SQL, it generates a tree of OLAP rels; Spark's Catalyst likewise parses SQL into a tree and automatically optimizes it into a DataFrame for computation. If the Calcite plan can be converted into a Spark plan, then we achieve distributed computing: Calcite is only responsible for parsing SQL and returning the result set, reducing the pressure on the Kylin server side.
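The plan-to-DataFrame conversion discussed above can be sketched as a toy model. This is illustrative Python only, not Kylin code: the class names (PlanNode, Frame) are made up, and in the real design the nodes would be Calcite RelNodes and Frame would be an actual Spark DataFrame.

```python
# Toy model of the bottom-up plan visit: recurse down to the TableScan
# (the "stack pushing"), create the initial "DataFrame", then apply each
# parent node's operation on the way back up the tree.

class Frame:
    """Stand-in for a Spark DataFrame; it just records the call chain."""
    def __init__(self, chain):
        self.chain = chain

    def apply(self, op):
        # Corresponds to df.filter(...), df.select(...), etc.
        return Frame(self.chain + "." + op)

class PlanNode:
    def __init__(self, kind, op, child=None):
        self.kind, self.op, self.child = kind, op, child

def to_frame(node):
    if node.kind == "TableScan":
        return Frame("df")  # initial DataFrame generated from the scan
    return to_frame(node.child).apply(node.op)

# Sort -> Agg -> Project -> Filter -> TableScan, as in the diagram.
plan = PlanNode("Sort", "sort(xxx)",
       PlanNode("Agg", "agg(xxx)",
       PlanNode("Project", "select(xxx)",
       PlanNode("Filter", "filter(xxx)",
       PlanNode("TableScan", None)))))

print(to_frame(plan).chain)  # -> df.filter(xxx).select(xxx).agg(xxx).sort(xxx)
```

Once the final chain is built, triggering it hands all execution to Spark's own scheduler, which is where the distribution comes from.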
2. Remove the dictionary
The dictionary is very good at reducing storage pressure for low- and medium-cardinality columns, but it has the drawback that the data files cannot be used independently of the dictionary. I suggest that at the beginning we do not consider dictionary-type encodings at all and keep the system as simple as possible; using Parquet's page-level dictionary by default is enough.

3. Store the real column types in Parquet instead of binary
As above, Parquet's filtering ability on binary is extremely weak, while primitive types can directly use Spark's vectorized read, speeding up both data reading and computation.

4. Use Spark's Parquet integration
Spark is already adapted to Parquet: Spark's pushed filters are converted into filters that Parquet can use. We only need to upgrade the Parquet version and make minor modifications to get Parquet's page index capability.

5. Index server
As JiaTao Tao described, the index system divides into a file index and a page index; dictionary-based filtering is nothing but a kind of file index, so we can insert an index server here.

JiaTao Tao <taojia...@gmail.com> wrote on Wed, Dec 19, 2018 at 4:45 PM:

Hi Gang,

In my opinion, segment/partition pruning is actually in the scope of the "index system": we can have an index system at the storage level, including a file index (for segment/partition pruning), a page index (for page pruning), etc. We can put all of this in such a system and make the separation of duties cleaner.

Ma Gang <mg4w...@163.com> wrote on Wed, Dec 19, 2018 at 6:31 AM:

Awesome! Looking forward to the improvement. For the dictionary: keeping the dictionary in the query engine is usually not good, since it puts a lot of pressure on the Kylin server, but sometimes it has benefits. For example, some segments can be pruned very early when a filter value is not in the dictionary, and some queries can be answered directly from the dictionary, as described in: https://issues.apache.org/jira/browse/KYLIN-3490

At 2018-12-17 15:36:01, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:

The dimension dictionary is a legacy design for HBase storage, I think; because HBase has no data types and everything is a byte array, Kylin has to encode STRING and other types with some encoding method like the dictionary.

Now with a storage like Parquet, the storage decides how to encode the data at the page or block level. Then we can drop the dictionary after the cube is built. This will relieve the memory pressure on Kylin query nodes and also benefit the UHC case.
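The page-level encoding idea above can be illustrated with a small sketch. This is plain Python and purely illustrative of the concept, not Parquet's actual on-disk format or API: each page dictionary-encodes its values independently and carries its own small dictionary, so pages are self-describing and no global dictionary has to be loaded into the query node.

```python
# Sketch: per-page dictionary encoding. Each page stores (page_dict,
# codes) and can be decoded on its own -- unlike a global dictionary,
# which every reader must keep in memory.

def encode_page(values):
    """Dictionary-encode one page; returns (page_dict, codes)."""
    mapping = {}
    codes = [mapping.setdefault(v, len(mapping)) for v in values]
    page_dict = sorted(mapping, key=mapping.get)  # code -> value
    return page_dict, codes

def decode_page(page_dict, codes):
    return [page_dict[c] for c in codes]

page1 = ["CN", "US", "CN", "CN"]
page2 = ["UK", "UK", "CN"]  # encoded independently of page1

d1, c1 = encode_page(page1)
d2, c2 = encode_page(page2)

# Each page decodes with only its own dictionary -- nothing global.
assert decode_page(d1, c1) == page1
assert decode_page(d2, c2) == page2
```

Because each page is self-contained, a cube segment written this way stays readable even after Kylin's global dictionary is dropped, which is the point Shaofeng makes about the UHC case.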
Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng....@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

Chao Long <wayn...@qq.com> wrote on Mon, Dec 17, 2018 at 1:23 PM:

In this PoC we verified that Kylin on Parquet is viable, but the query performance still has room for improvement. We can improve it in the following ways:

1. Minimize result-set serialization time
Since Kylin needs Object[] data for processing, we convert the Dataset to an RDD and then convert the "Row" type to Object[], so Spark has to serialize the Object[] values before returning them to the driver. This time needs to be avoided.

2. Query without the dictionary
In this PoC, to use less storage, we keep the dictionary-encoded values in the Parquet files for dict-encoded dimensions, so Kylin must load the dictionary to decode those values at query time. If we keep the original values for dict-encoded dimensions, the dictionary is unnecessary. And we don't have to worry about storage use, because Parquet will encode the values itself. We should remove the dictionary from the query path.

3. Remove the query single-point issue
In this PoC we use Spark to read and process cube data, which is distributed, but Kylin also needs to process the result data Spark returns in a single JVM. We can try to make that distributed too.

4. Upgrade Parquet to 1.11 for the page index
In this PoC Parquet doesn't have a page index, so we get poor filter performance.
We need to upgrade Parquet to version 1.11, which has a page index, to improve filter performance.

------------------
Best Regards,
Chao Long

------------------ Original Message ------------------
From: "ShaoFeng Shi" <shaofeng...@apache.org>
Sent: Friday, December 14, 2018 4:39 PM
To: "dev" <dev@kylin.apache.org>; "user" <u...@kylin.apache.org>
Subject: Evaluate Kylin on Parquet

Hello Kylin users,

The first version of the Kylin on Parquet [1] feature has been staged in the Kylin code repository for public review and evaluation. You can check out the "kylin-on-parquet" branch [2] to read the code, and you can also make a binary build to run an example. When creating a cube, you can select "Parquet" as the storage on the "Advanced setting" page. Both the MapReduce and Spark engines support this new storage. A tech blog on the design and implementation is being drafted.

Thanks so much for the hard work of the engineers Chao Long and Yichen Zhou!

This is not the final version; there is room for improvement in many aspects: Parquet, Spark, and Kylin. It can be used for PoC at this moment. Your comments are welcome. Let's improve it together.
> >> > >> > >> > >> [1] https://issues.apache.org/jira/browse/KYLIN-3621 > >> > >> [2] https://github.com/apache/kylin/tree/kylin-on-parquet > >> > >> > >> > >> Best regards, > >> > >> > >> > >> Shaofeng Shi 史少锋 > >> > >> Apache Kylin PMC > >> > >> Work email: shaofeng....@kyligence.io > >> > >> Kyligence Inc: https://kyligence.io/ > >> > >> > >> > >> Apache Kylin FAQ: > >> https://kylin.apache.org/docs/gettingstarted/faq.html > >> > >> Join Kylin user mail group: user-subscr...@kylin.apache.org > >> > >> Join Kylin dev mail group: dev-subscr...@kylin.apache.org > >> > >> > >> > >> > >> > >> > >> > > > >> > > > >> > > > >> > > >> > > >> > -- > >> > > >> > > >> > Regards! > >> > > >> > Aron Tao > >> > > >> > > > > >