Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

ShaoFeng Shi Fri, 24 Jul 2020 23:01:21 -0700

Hi Cinto,

Currently, Kylin uses native Parquet with no additional indexing. If
Parquet enhances its indexing in the future, Kylin can benefit from it.


== "are we using any metastore (like Hive) along with this ?"
I'm not sure I understand the question correctly. The cube Parquet files
are persisted directly on HDFS or object storage, with no dependency on
the Hive metastore.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Cinto Sunny <cinto.sunn...@gmail.com> wrote on Fri, Jul 24, 2020 at 10:47 PM:

> Is there any documentation on the additional indexing (if any) we are
> doing on Parquet? Also, are we using any metastore (like Hive) along with
> this?
>
> - Cinto
>
>
> On Fri, Jul 24, 2020 at 4:23 AM ShaoFeng Shi <shaofeng...@apache.org>
> wrote:
>
>> Hello, Kylin users,
>>
>> Regarding the Kylin Parquet storage, we would like to share a progress
>> update here. At present, we have completed the main development work[1],
>> the design document[2], and the benchmark. With the new architecture,
>> Kylin will be more efficient and more cloud-friendly: it runs fully on
>> Spark with less dependency on the Hadoop stack, which makes DevOps easier.
>>
>> Here we discuss the future plan, which has two aspects.
>>
>> 1. The plan for Kylin 4.0
>>
>> In Kylin 3.x, we have released some important features, such as
>> real-time analysis, the Flink building engine, the global dictionary with
>> Hive, etc. In the next phase, we hope to concentrate on the Parquet
>> storage engine and to release it in Kylin v4.0 within this year. During
>> this period, 3.x will continue to be maintained for bug fixes and security
>> vulnerabilities, but will not receive big changes or major features.
>>
>> 2. Backward compatibility for HBase storage.
>>
>> While developing the Parquet storage engine, we found it very difficult
>> to make the Parquet and HBase engines coexist. The codebase becomes very
>> complicated and ugly, which inevitably brings big challenges to
>> maintenance and release. Besides, HBase has different APIs (v1.1, v2.0,
>> and CDH's differ from the community's), which has doubled or tripled the
>> testing and release effort over the past years.
>>
>> So, we plan to remove the HBase storage engine in Kylin 4.0. The Kylin
>> metadata will also migrate to MySQL. For existing users: if you want to
>> keep using the HBase engine, stay on Kylin 3.x; if you want to upgrade to
>> the Parquet storage, a migration tool can be provided later (to be covered
>> in another discussion thread).
>>
>> Please tell us your concerns and suggestions! Thank you for your
>> participation.
>>
>> ## Reference
>> [1] https://github.com/apache/kylin/tree/kylin-on-parquet-v2
>> [2] https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Email: shaofeng...@apache.org
