In my opinion, we should pursue the storage solution most suitable for Kylin: performance should be the top priority, and cost comes next. With such a storage solution, the cost of using Kylin and building cubes can be reduced.
Well-known columnar storage formats like Parquet and ORC are widely adopted and have comprehensive support for big data analytics, but they don't meet the requirements of ad hoc/interactive queries well. One of the key challenges is data relevance. Pre-calculation and indexing are the common approaches to improving data relevance. Kylin takes the first approach, which costs more than indexing: we have to build many cuboids, and to keep that cost down, cube design becomes critical. But this introduces another level of complexity. If we can combine pre-calculation and indexing, we might not need to build so many cuboids. This gives us the potential to extend Kylin's capabilities: supporting 100+ dimensions, supporting large lookup tables, etc.

Today both Parquet and ORC have some kind of data-skipping index, but the efficiency of the skipping highly depends on the sort order of the columns. And before data skipping can take effect, the query engine needs to read part of the data files first. Take Spark as an example: the Spark DAG scheduler has to request and schedule many more tasks than necessary, and the majority of those tasks just scan the footer/metadata, prune data blocks by cuboid and other filter conditions, and then quit.

I'm not familiar with ORC. For Parquet's dictionary filtering, the DictionaryPage is empty for high-cardinality columns, so only range filtering (min/max) takes place. Parquet doesn't support bloom filters; I remember ORC has a built-in bloom filter for every 10K rows. Parquet's PageIndex and IndexPage are not implemented yet. If implemented, PageIndex should provide good performance, since it can prune individual pages instead of the whole row group. IndexPage might be useless: as with the DictionaryPage, one column chunk has just one IndexPage, and indexes like bitmaps or B-trees can't fit into that size. If we want to extend it, we will have to do something similar to OAP, which separates the index files from the data files.

Another gap is the cache (like the HBase Block Cache).
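Before turning to the cache, the data-skipping point above can be made concrete. Below is a minimal Python sketch of min/max statistics pruning at two granularities: whole row groups (the range filtering described above) versus individual pages (what a PageIndex would enable). The `Page`/`RowGroup` classes and the `scan` function are hypothetical in-memory stand-ins, not Parquet's file format or any library API. They only model how min/max statistics decide which pages get read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical in-memory page: the values plus their min/max statistics."""
    rows: List[int]

    @property
    def min_val(self) -> int:
        return min(self.rows)

    @property
    def max_val(self) -> int:
        return max(self.rows)


@dataclass
class RowGroup:
    """Hypothetical row group: a list of pages with aggregated min/max."""
    pages: List[Page]

    @property
    def min_val(self) -> int:
        return min(p.min_val for p in self.pages)

    @property
    def max_val(self) -> int:
        return max(p.max_val for p in self.pages)


def may_contain(lo: int, hi: int, value: int) -> bool:
    # Min/max statistics can only prove absence: a value outside [lo, hi]
    # cannot be present, but a value inside the range may still be missing.
    return lo <= value <= hi


def scan(groups: List[RowGroup], value: int, use_page_index: bool) -> Tuple[List[int], int]:
    """Equality filter; returns (matching rows, number of pages actually read)."""
    hits: List[int] = []
    pages_read = 0
    for group in groups:
        # Row-group level pruning by min/max.
        if not may_contain(group.min_val, group.max_val, value):
            continue
        for page in group.pages:
            # Page-level pruning, only possible with a page-level index.
            if use_page_index and not may_contain(page.min_val, page.max_val, value):
                continue
            pages_read += 1
            hits.extend(r for r in page.rows if r == value)
    return hits, pages_read


# The same 12 values, laid out sorted vs. shuffled across pages.
sorted_groups = [
    RowGroup([Page([1, 2, 3]), Page([4, 5, 6])]),
    RowGroup([Page([7, 8, 9]), Page([10, 11, 12])]),
]
unsorted_groups = [
    RowGroup([Page([1, 12, 3]), Page([4, 11, 6])]),
    RowGroup([Page([7, 2, 9]), Page([10, 5, 8])]),
]

if __name__ == "__main__":
    for name, groups in [("sorted", sorted_groups), ("unsorted", unsorted_groups)]:
        for use_index in (False, True):
            hits, pages = scan(groups, 5, use_index)
            print(f"{name}: page_index={use_index}, hits={hits}, pages_read={pages}")
```

Filtering on `5`, the sorted layout reads 2 pages with row-group pruning only and 1 page with the page index. With the shuffled layout, every page's min/max range covers 5, so all 4 pages are read either way. This illustrates why skipping efficiency depends so much on column sort order.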
An HDFS DataNode does have a form of memory caching: the operating system page cache. But the OS page cache is out of our control, and there is no global information about the page cache state of each node, so the query engine is unable to schedule its tasks for cache locality. We would have to build a cache on top of Parquet and ORC, and since we can't use mmap, there will be duplicate memory consumption: one copy in the OS page cache and another in our own cache. If we leverage Spark as the runtime, our in-memory cache will compete for memory with Spark's storage memory and execution memory. We would also need to modify the Spark DAG scheduler to make it cache-aware and honor cache locality.

Thanks
Ken

-----Original Message-----
From: Luke Han <luke...@gmail.com>
Sent: October 7, 2018 19:44
To: dev <email@example.com>
Subject: Re: [DISCUSS] Columnar storage engine for Apache Kylin

It makes sense to bring a better storage option to Kylin. The option should be open, so people could have different ways to create an adapter for the underlying storage. Considering that the many Kylin adoptions today all run on Hadoop/HDFS, I prefer Parquet, ORC, or another HDFS-compatible option at this time. It will be easy for people to upgrade to the next generation while keeping consistency.

Looking forward to this feature being rolled out soon.

Thanks.

Best Regards!
---------------------
Luke Han

On Wed, Oct 3, 2018 at 2:37 PM Li Yang <liy...@apache.org> wrote:
> Love this discussion. I'd like to highlight 3 major roles HBase is currently playing, so we don't miss any of them when looking for a replacement.
>
> 1) Storage: A high-speed big data storage
> 2) Cache: A distributed storage cache layer (was BlockCache)
> 3) MPP: A distributed computation framework (was Coprocessor)
>
> The "Storage" seems to be at the center of the discussion. Be it Parquet, ORC, or a new file format, to me the standard interface is most important.
> As long as we have consensus on the access interface, like MapReduce / Spark Dataset, the rest of the debate can easily be resolved by a fair benchmark. It also allows people with different preferences to keep their own implementations under the standard interface without impacting the rest of Kylin.
>
> The "Cache" and the "MPP" were more or less overlooked. I suggest we pay more attention to them. Apart from Spark and Alluxio, are there any other alternatives? Actually, Druid is a well-rounded choice: like HBase, it covers all 3 roles pretty well.
>
> In general, I prefer to choose from the state of the art instead of reinventing. Indeed, Kylin is not a storage project, and a new storage format is not Kylin's mission. Any storage innovations we come across here would be more beneficial if contributed to the Parquet or ORC community.
>
> Regards
> Yang
>
> On Tue, Oct 2, 2018 at 11:20 AM ShaoFeng Shi <shaofeng...@apache.org> wrote:
> > Hi Billy,
> >
> > Yes, cloud storage should be considered. The traditional file layouts on HDFS may not work well on cloud storage. Kylin needs to allow extension here. I will add this to the requirements.
> >
> > Billy Liu <billy...@apache.org> wrote on Saturday, September 29, 2018 at 3:22 PM:
> > > Hi Shaofeng,
> > >
> > > I'd like to add one more characteristic: cloud-native storage support. Quite a few users are using S3 on AWS or Azure Data Lake Storage on Azure. If the new storage engine could be more cloud-friendly, more users could benefit from it.
> > >
> > > With warm regards
> > >
> > > Billy Liu
> > > ShaoFeng Shi <shaofeng...@apache.org> wrote on Friday, September 28, 2018 at 2:15 PM:
> > > >
> > > > Hi Kylin developers.
> > > >
> > > > HBase has been Kylin's storage engine since the first day; Kylin on HBase has been verified as a success, supporting low-latency and high-concurrency queries at a very large data scale.
> > > > Thanks to HBase, most Kylin users get query responses of less than 1 second on average.
> > > >
> > > > But we also see some limitations when putting cubes into HBase; I shared some of them at HBaseCon Asia 2018 this August. The typical limitations include:
> > > >
> > > > - Rowkey is the primary index; there is no secondary index so far.
> > > >
> > > > Filtering by the row key's prefix and by its suffix can give very different performance, so the user needs to design the row key well; otherwise, queries will be slow. This is sometimes difficult because the user might not be able to predict the filtering patterns ahead of cube design.
> > > >
> > > > - HBase is a key-value store instead of columnar storage.
> > > >
> > > > Kylin combines multiple measures (columns) into fewer column families for smaller data size (since the row key size is considerable). This often causes HBase to read more data than requested.
> > > >
> > > > - HBase can't run on YARN.
> > > >
> > > > This makes deployment and auto-scaling a little complicated, especially in the cloud.
> > > >
> > > > In a word, HBase is complicated as Kylin's storage. Maintenance and debugging are also hard for ordinary developers. Now we're planning to seek a simple, lightweight, read-only storage engine for Kylin. The new solution should have the following characteristics:
> > > >
> > > > - Columnar layout with compression for efficient I/O;
> > > > - Index on each column for quick filtering and seeking;
> > > > - MapReduce / Spark API for parallel processing;
> > > > - HDFS compliant for scalability and availability;
> > > > - Mature, stable and extensible;
> > > >
> > > > With the plugin architecture introduced in Kylin 1.5, adding multiple storages to Kylin is possible.
> > > > Some companies, like Kyligence Inc. and Meituan.com, have developed customized storage engines for Kylin in their products or platforms. In their experience, columnar storage is a good supplement to the HBase engine. Kaisen Kang from Meituan.com shared their KOD (Kylin on Druid) solution at this August's Kylin meetup in Beijing.
> > > >
> > > > We plan to do a PoC with Apache Parquet + Apache Spark in the next phase. Parquet is a standard columnar file format and is widely supported by many projects like Hive, Impala, Drill, etc. Parquet is adding a page-level column index to support fine-grained filtering. Apache Spark can provide parallel computing over Parquet and can be deployed on YARN/Mesos and Kubernetes. With this combination, data persistence and computation are separated, which makes scaling in/out much easier than before. Benefiting from Spark's flexibility, we can push down more computation from Kylin to the Hadoop cluster. Besides Parquet, Apache ORC is also a candidate.
> > > >
> > > > Now I raise this discussion to get your ideas about Kylin's next-generation storage engine. If you have good ideas or any related data, you are welcome to discuss them in the community.
> > > >
> > > > Thank you!
> > > >
> > > > - Apache Kylin on HBase
> > > >   https://www.slideshare.net/ShiShaoFeng1/apache-kylin-on-hbase-extreme-olap-engine-for-big-data
> > > > - Apache Kylin Plugin Architecture
> > > >   https://kylin.apache.org/development/plugin_arch.html
> > > > - Kylin storage engine practice based on Druid
> > > >   https://blog.bcmeng.com/post/kylin-on-druid.html
> > > >
> > > > Best regards,
> > > >
> > > > Shaofeng Shi 史少锋
> > > >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >