Yang,

Implementing the old MR engine on the pluggable architecture does not prove 
that the architecture works. You need two points to draw a line; a single 
point proves nothing. 

Improving the MR engine performance can be done on the 1.x code base without 
making it pluggable. 


External talks and POCs are not the release criteria for a feature. 

Regards
Seshu

Sent from my iPhone

> On Feb 1, 2016, at 6:01 PM, Li Yang <liy...@apache.org> wrote:
> 
> Seshu's understanding of 2.0 and its pluggable architecture is very
> wrong. Let me correct it. :-)
> 
> The pluggable architecture is rock solid. Its first commit dates back to
> Jul 2015. On top of it, we built the MR cube engine V2 and storage engine V2,
> which give much improved build and query performance. At the same time, the
> old V1 engines are still available on the 2.0 branch. The pluggable
> architecture allows alternative engines to coexist, and users are free to
> choose whichever engine suits their needs.
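> 
> To make that concrete, here is a minimal sketch of the idea -- the interface
> and class names below are made up for illustration and are not Kylin's actual
> API -- showing how a pluggable design lets V1 and V2 engines coexist, with
> each cube selecting its engine by an identifier:
> 
>     // Hypothetical sketch only -- illustrative names, not Kylin's real classes.
>     import java.util.HashMap;
>     import java.util.Map;
> 
>     interface ICubeEngine {
>         // Build one cube segment using this engine (MR V1, MR V2, Spark, ...).
>         void buildSegment(String cubeName, String segmentName);
>     }
> 
>     final class CubeEngineRegistry {
>         private static final Map<Integer, ICubeEngine> ENGINES = new HashMap<>();
> 
>         static void register(int engineType, ICubeEngine engine) {
>             ENGINES.put(engineType, engine);
>         }
> 
>         // The engine type is stored per cube, so old and new engines coexist
>         // and each cube owner picks whichever one suits the workload.
>         static ICubeEngine byType(int engineType) {
>             ICubeEngine engine = ENGINES.get(engineType);
>             if (engine == null) {
>                 throw new IllegalArgumentException("No engine for type " + engineType);
>             }
>             return engine;
>         }
>     }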
> 
> In the last few months, thorough testing has been done on the 2.0-rc branch.
> As mentioned, we have rebuilt hundreds of jobs on the V2 engines and
> compared the results by running tens of thousands of queries against both V1
> and V2 cubes. Correctness is confirmed and the performance improvement is
> measured. The 2.0-rc branch is definitely the best-tested branch so
> far. I am very confident of its quality.
> 
> I believe Seshu also agrees with the improved performance and its quality,
> as he proposed releasing it as v1.3. However, he didn't realize that the
> improved results sit right on top of the pluggable architecture.
> 
> So the claim that the pluggable architecture consists of
>> "POC-quality features that should not be part of a release. We have not
>> built a single one of these plugins that is production quality."
> is very wrong.
> 
> Streaming cubing is a less mature feature; it is at semi-production
> quality. As shared in a few public talks, eBay has an SEO dashboard case
> that leverages the streaming cubing feature and achieves 5-minute data
> latency.
> 
> And I made the point very clear -- "Streaming cubing experimental support,
> ... minute intervals" -- so I think no one will be confused.
> 
> If there are more concerns about 2.0 quality, I suggest opening JIRAs and
> creating test cases, so we have evidence and can collaborate on improvements.
> 
> Still, many thanks for the comments. Things become clearer through healthy
> discussions. :-)
> 
> Cheers
> Yang
> 
> On Tuesday, February 2, 2016, Adunuthula, Seshu <sadunuth...@ebay.com> wrote:
> 
>> A strong -1 on this.
>> 
>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> comparing hundreds of jobs.
>> - TopN pre-calculation (more UDFs coming)
>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> 
>> 
>> 
>> These are incremental enhancements and do not warrant bumping up to a 2.0
>> release. We should release them as 1.3.
>> 
>> 
>> - Streaming cubing experimental support, source from Kafka, build cubes
>> in-mem at minute intervals
>> - A pluggable architecture, allowing alternative cube engines / storage
>> engines / data sources.
>> 
>> 
>> 
>> These are POC-quality features that should not be part of a release. We
>> have not built a single one of these plugins that is production quality.
>> 
>> Luke/Yang, I have told you multiple times not to push out a release when it
>> is not ready. We nearly brought down the entire HBase cluster at eBay with
>> the bad design for the streaming feature. If we scale this up to hundreds of
>> streaming cubes, this design will render an HBase cluster unusable.
>> 
>> I have spent substantial time looking into the release and it does not
>> meet eBay's standards for a quality release.
>> 
>> We will be doing the community a huge disservice by pushing this out by
>> end of February.
>> 
>> Regards
>> Seshu Adunuthula
>> 
>> 
>>> On 1/31/16, 11:46 PM, "Li Yang" <liy...@apache.org> wrote:
>>> 
>>> Just to add more color.
>>> 
>>> The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few months. The
>>> 2.0 rc1 contains:
>>> 
>>> - A pluggable architecture, allowing alternative cube engines / storage
>>> engines / data sources.
>>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>>> comparing hundreds of jobs.
>>> - A better storage engine, making queries roughly 2 times faster than 1.x
>>> (especially for slow queries) when comparing tens of thousands of SQLs.
>>> - Streaming cubing experimental support, source from Kafka, build cubes
>>> in-mem at minute intervals
>>> - TopN pre-calculation (more UDFs coming)
>>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>>> - SAML authentication support
>>> 
>>> As the release manager, I will kick off the release process in two weeks
>>> (once back from vacation). ETA by end of Feb.
>>> 
>>> Would love to hear more feedback from our community.  :-)
>>> 
>>> 
>>> Yang
>>> 
>>> 
>>> 
>>> On Monday, February 1, 2016, Adunuthula, Seshu <sadunuth...@ebay.com>
>>> wrote:
>>> 
>>>> Hello Folks,
>>>> 
>>>> We are actively working towards the Apache Kylin 2.0 release and would
>>>> like a discussion with the community on what they would like to see in
>>>> the 2.0 release of the product. We have three big-rock items we are
>>>> working towards in 2.0 and a lot of additional minor feature enhancements.
>>>> 
>>>> Streaming Data Source support.
>>>> This feature is semi-baked in: the source of Kylin cubes is Kafka topics,
>>>> and cube segments are built on micro-batches of messages arriving on those
>>>> topics. Currently a lot of work is going on to productize this feature.
>>>> The primary areas of work are stream processing engines/frameworks to
>>>> process the micro-batches, and UI to support out-of-the-box integration
>>>> of Kafka topics with Kylin cubes.
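>>>> 
>>>> As a rough, hypothetical illustration only (made-up topic name, broker
>>>> address, and helper method -- not Kylin's actual streaming code), the
>>>> micro-batch idea looks roughly like this: consume a Kafka topic, buffer
>>>> messages for a fixed window, then build one cube segment per window.
>>>> 
>>>>     // Hypothetical sketch: poll a Kafka topic and cut one cube segment
>>>>     // per fixed time window (the "micro-batch").
>>>>     import java.time.Duration;
>>>>     import java.util.ArrayList;
>>>>     import java.util.Collections;
>>>>     import java.util.List;
>>>>     import java.util.Properties;
>>>>     import org.apache.kafka.clients.consumer.ConsumerRecord;
>>>>     import org.apache.kafka.clients.consumer.KafkaConsumer;
>>>> 
>>>>     public class MicroBatchCubing {
>>>>         public static void main(String[] args) {
>>>>             Properties props = new Properties();
>>>>             props.put("bootstrap.servers", "broker:9092");  // assumed broker address
>>>>             props.put("group.id", "kylin-streaming-demo");
>>>>             props.put("key.deserializer",
>>>>                 "org.apache.kafka.common.serialization.StringDeserializer");
>>>>             props.put("value.deserializer",
>>>>                 "org.apache.kafka.common.serialization.StringDeserializer");
>>>>             long windowMillis = 5 * 60 * 1000L;             // ~5 minute micro-batches
>>>> 
>>>>             try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>>>>                 consumer.subscribe(Collections.singletonList("events"));  // assumed topic
>>>>                 long windowStart = System.currentTimeMillis();
>>>>                 List<String> batch = new ArrayList<>();
>>>>                 while (true) {
>>>>                     for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
>>>>                         batch.add(r.value());               // buffer this micro-batch
>>>>                     }
>>>>                     if (System.currentTimeMillis() - windowStart >= windowMillis) {
>>>>                         buildSegmentInMem(batch);           // placeholder: in-memory cube build
>>>>                         consumer.commitSync();
>>>>                         batch.clear();
>>>>                         windowStart = System.currentTimeMillis();
>>>>                     }
>>>>                 }
>>>>             }
>>>>         }
>>>> 
>>>>         // Placeholder for the real in-memory cubing step.
>>>>         private static void buildSegmentInMem(List<String> rows) { }
>>>>     }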
>>>> 
>>>> Spark-based cube building engine.
>>>> The initial performance numbers for a Spark-based cubing engine did not
>>>> show substantial improvement over the MR-based engine, but we would like
>>>> this feature to be baked in for the 2.0 release. A lot of work is underway
>>>> to stabilize this feature.
>>>> 
>>>> Amazon EMR Integration.
>>>> We had initial conversations with Amazon EMR about supporting Apache Kylin
>>>> on Amazon EMR, which were received well. With Kylin 2.0, Apache Kylin will
>>>> be an enabled feature on Amazon EMR. Limited work has gone into this area,
>>>> but it will be an important milestone for 2.0.
>>>> 
>>>> We are also working towards creating a page for community-driven
>>>> improvement proposals, similar to Apache Kafka's KIPs:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>>> Stay tuned.
>>>> 
>>>> Regards
>>>> Seshu Adunuthula
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
