Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Adunuthula, Seshu Mon, 01 Feb 2016 21:03:57 -0800

Yes, we will be filing a whole bunch of JIRAs. This release is not Done,
so no point in arguing about whether it is perfect. Luke, I do not want
you to push this release through.


 

On 2/1/16, 7:54 PM, "Luke Han" <luke...@gmail.com> wrote:

>Hi Seshu,
>      "Done is better than Perfect" is one practice in our development:
>release early, ask users to try and test, then fix bugs, bring in other
>features if any, and then release a new one...
>It has worked very well in the past, and I believe it will continue to
>benefit further development.
>
>      As you can see, the 2.x branch has been the active development code
>base for several months, and as Yang mentioned, we are confident enough to
>release the first version now. Also, many users in the community are
>already building packages from 2.0 and have reported many tickets to help
>improve Kylin; they are very much looking forward to the first release.
>With the Apache release process, the entire community will help test and
>try each release candidate to make sure there are no critical issues.
>Please also help log JIRAs if you find any.
>
>  Back to Spark Cubing: as previously discussed with the Spark community,
>there is still one pending JIRA for performance, so Spark Cubing has
>already been excluded from the first release. But with the pluggable
>architecture, it would be very easy to bring it back in a coming version
>once the community is happy with it.
>
>And the Amazon EMR part is more about how to deploy than about a
>"feature", so it does not make sense to set it as a criterion.
>
>        Thanks for bringing up this discussion to help the community :-)
>
>Luke
>
>
>Best Regards!
>---------------------
>
>Luke Han
>
>On Tue, Feb 2, 2016 at 8:48 AM, Adunuthula, Seshu <sadunuth...@ebay.com> wrote:
>
>> Yang,
>>
>> Implementing the old MR engine on the pluggable architecture does not
>> prove that the architecture works. You need two points to draw a line. A
>> single point does not prove that the architecture works.
>>
>> Improving the MR engine performance can be done on the 1.0 code base
>> without making it pluggable.
>>
>>
>> External talks and POCs are not the release criteria for a feature.
>>
>> Regards
>> Seshu
>>
>> Sent from my iPhone
>>
>> > On Feb 1, 2016, at 6:01 PM, Li Yang <liy...@apache.org> wrote:
>> >
>> > Seshu's understanding of 2.0 and its pluggable architecture is very
>> > wrong. Let me correct it. :-)
>> >
>> > The pluggable architecture is rock solid. Its first commit goes back to
>> > Jul 2015. On top of it, we built MR cube engine V2 and storage engine V2,
>> > which give much improved build and query performance. At the same time,
>> > the old V1 engines are still available on the 2.0 branch. The pluggable
>> > architecture allows alternative engines to coexist, and users are free
>> > to choose whichever engine suits their needs.
>> >
>> > In the last few months, thorough testing has been done on the 2.0-rc
>> > branch. As mentioned, we have rebuilt hundreds of jobs on the V2 engines
>> > and compared the results by running tens of thousands of queries against
>> > both V1 and V2 cubes. Correctness is confirmed and the performance
>> > improvement is measured. The 2.0-rc branch is definitely the most
>> > well-tested branch so far. I am very confident of its quality.
>> >
>> > I believe Seshu also agrees about the improved performance and quality,
>> > as he proposed releasing it as v1.3. However, he didn't know that the
>> > improved results are built right on top of the pluggable architecture.
>> >
>> > So the claim that the pluggable architecture is
>> >> "POC quality features that should not be part of a release. We have not
>> >> built a single of these plugins that are production quality."
>> > is very wrong.
>> >
>> > Streaming cubing is a less mature feature. It is of semi-production
>> > quality. As shared in a few public talks, eBay has an SEO dashboard case
>> > that leverages the streaming cubing feature and achieves 5-minute data
>> > latency.
>> >
>> > And I made the point very clear ("Streaming cubing experimental
>> > support, ... minutes interval"), so I think no one will be confused.
>> >
>> > If there are more concerns about 2.0 quality, I suggest opening JIRAs
>> > and creating test cases, so that we have evidence and can collaborate to
>> > improve.
>> >
>> > Still, many thanks for the comments. Things become clearer through
>> > healthy discussion. :-)
>> >
>> > Cheers
>> > Yang
>> >
>> > On Tuesday, February 2, 2016, Adunuthula, Seshu <sadunuth...@ebay.com> wrote:
>> >
>> >> A strong -1 on this.
>> >>
>> >> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> >> comparing hundreds of jobs.
>> >> - TopN pre-calculation (more UDFs coming)
>> >> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> >>
>> >>
>> >>
>> >> These are incremental enhancements and do not warrant bumping up to a
>> >> 2.0 release. We should release them as 1.3.
>> >>
>> >>
>> >> - Streaming cubing experimental support, source from Kafka, build cube
>> >> in-mem at minutes interval
>> >> - A pluggable architecture, to allow alternative cube engine / storage
>> >> engine / data source.
>> >>
>> >>
>> >>
>> >> These are POC quality features that should not be part of a release. We
>> >> have not built a single of these plugins that are production quality.
>> >>
>> >> Luke/Yang, I have told you multiple times not to push out a release
>> >> when it is not ready. We nearly brought down the entire HBase cluster
>> >> at eBay with the bad design for Streaming. If we scale this up to
>> >> hundreds of Streaming Cubes, this design will render an HBase cluster
>> >> unusable.
>> >>
>> >> I have spent substantial time looking into the release, and it does not
>> >> meet eBay's standards for a quality release.
>> >>
>> >> We will be doing the community a huge disservice by pushing this out by
>> >> the end of February.
>> >>
>> >> Regards
>> >> Seshu Adunuthula
>> >>
>> >>
>> >>> On 1/31/16, 11:46 PM, "Li Yang" <liy...@apache.org> wrote:
>> >>>
>> >>> Just to add more color.
>> >>>
>> >>> The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few months.
>> >>> The 2.0 rc1 contains:
>> >>>
>> >>> - A pluggable architecture, to allow alternative cube engine / storage
>> >>> engine / data source.
>> >>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> >>> comparing hundreds of jobs.
>> >>> - A better storage engine, which makes queries roughly 2 times faster
>> >>> (especially slow queries) than 1.x by comparing tens of thousands of
>> >>> SQL queries.
>> >>> - Streaming cubing experimental support, source from Kafka, build cube
>> >>> in-mem at minutes interval
>> >>> - TopN pre-calculation (more UDFs coming)
>> >>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> >>> - SAML authentication support
>> >>>
>> >>> As the release manager, I will kick off the release process in two
>> >>> weeks (once back from vacation). ETA: end of Feb.
>> >>>
>> >>> Would love to hear more feedback from our community.  :-)
>> >>>
>> >>>
>> >>> Yang
>> >>>
>> >>>
>> >>>
>> >>> On Monday, February 1, 2016, Adunuthula, Seshu <sadunuth...@ebay.com> wrote:
>> >>>
>> >>>> Hello Folks,
>> >>>>
>> >>>> We are actively working towards the Apache Kylin 2.0 release and would
>> >>>> like a discussion with the community on what they would like to see in
>> >>>> the 2.0 release of the product. We have three big-rock items we are
>> >>>> working towards in 2.0 and a lot of additional minor feature
>> >>>> enhancements.
>> >>>>
>> >>>> Streaming Data Source support.
>> >>>> This feature is semi-baked: the source of Kylin Cubes is Kafka topics.
>> >>>> Cube segments are built on micro-batches of messages arriving on Kafka
>> >>>> topics. Currently a lot of work is going on to productize this
>> >>>> feature. The primary areas of work are stream processing
>> >>>> engines/frameworks to process the micro-batches, and a UI to support
>> >>>> out-of-the-box integration of Kafka topics with Kylin Cubes.
>> >>>>
>> >>>> Spark-based Cube Building Engine.
>> >>>> The initial performance numbers for a Spark-based cubing engine did
>> >>>> not show substantial improvement over the MR-based engine, but we would
>> >>>> like this feature to be baked in for the 2.0 release. A lot of work is
>> >>>> underway to stabilize this feature.
>> >>>>
>> >>>> Amazon EMR Integration
>> >>>> We had initial conversations with Amazon EMR about supporting Apache
>> >>>> Kylin on Amazon EMR, which were received well. With Kylin 2.0, Apache
>> >>>> Kylin will be an enabled feature on Amazon EMR. Limited work has gone
>> >>>> into this area, but this will be an important milestone for 2.0.
>> >>>>
>> >>>> We are also working towards creating a page for community-driven
>> >>>> improvements, similar to Apache Kafka's KIP
>> >>>> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
>> >>>> Stay tuned.
>> >>>>
>> >>>> Regards
>> >>>> Seshu Adunuthula
