Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Luke Han Mon, 01 Feb 2016 19:55:11 -0800

Hi Seshu,
      "Done is better than perfect" is one practice in our development:
release early, ask users to try and test, then fix bugs, bring in other
features if any, and then release a new version.
It has worked very well in the past, and I believe it will continue to
benefit further development.


      As you can see, the 2.x branch has been the active development code
base for several months, and as Yang mentioned, we are confident about
releasing the first version now. Many users in the community are already
building packages from 2.0 and have reported many tickets to help improve
Kylin; they are very much looking forward to the first release. With the
Apache release process, the entire community will help to test and try
each release candidate to make sure there are no critical issues. Please
also help log a JIRA if you find any.

  Back to Spark Cubing: as previously discussed with the Spark community,
there's still one pending JIRA for performance, so Spark Cubing has already
been excluded from the first release. But with the pluggable architecture,
it should be very easy to introduce it back in a coming version once the
community is happy with it.

And for the Amazon EMR part, it's more about how to deploy than a
"feature"; it does not make sense to set this as a release criterion.

        Thanks for bringing up this discussion to help the community :-)

Luke


Best Regards!
---------------------

Luke Han

On Tue, Feb 2, 2016 at 8:48 AM, Adunuthula, Seshu <[email protected]>
wrote:

> Yang,
>
> Implementing the old MR engine on the pluggable architecture does not
> prove that the architecture works. You need two points to draw a line. A
> single point does not prove that the architecture works.
>
> Improving the MR engine performance can be done on the 1.0 code without
> making it pluggable.
>
>
> External talks and POCs are not the release criteria for a feature.
>
> Regards
> Seshu
>
> Sent from my iPhone
>
> > On Feb 1, 2016, at 6:01 PM, Li Yang <[email protected]> wrote:
> >
> > Seshu's understanding of the 2.0 and its plugin-able architecture is very
> > wrong. Let me correct. :-)
> >
> > The plugin-able architecture is rock solid. Its first commit goes back to
> > Jul 2015. On top of it, we built MR cube engine V2 and storage engine V2,
> > which give much improved build and query performance. At the same time,
> > the old V1 engines are still available on the 2.0 branch. The plugin-able
> > architecture allows alternative engines to coexist, and users are free to
> > choose whichever engine suits their needs.
> >
> > In the last few months, thorough testing has been done on the 2.0-rc
> > branch. As mentioned, we have rebuilt hundreds of jobs on the V2 engines
> > and compared the results by running tens of thousands of queries against
> > both V1 and V2 cubes. The correctness is confirmed and the performance
> > improvement is measured. The 2.0-rc branch is definitely the most
> > well-tested branch so far. I am very confident of its quality.
> >
> > I believe Seshu also agrees with the improved performance and its
> > quality, as he proposed to release it as v1.3. However, he didn't know
> > that the improved results are built right on top of the plugin-able
> > architecture.
> >
> > So saying that the plugin-able architecture is
> >> "POC quality features that should not be part of a release. We have not
> >> built a single of these plugins that are production quality."
> > is very wrong.
> >
> > Streaming cubing is a less mature feature. It is of semi-production
> > quality. As shared in a few public talks, eBay has an SEO dashboard case
> > that leverages the streaming cubing feature and achieves 5-minute data
> > latency.
> >
> > And I made the point very clear -- "Streaming cubing experimental
> > support, ... minutes interval" -- so I think no one will be confused.
> >
> > If there are more concerns about 2.0 quality, I suggest JIRAs be opened
> > and test cases be created, so we have evidence and can collaborate to
> > improve.
> >
> > Still, many thanks for the comments. Things become clearer through healthy
> > discussions. :-)
> >
> > Cheers
> > Yang
> >
> > On Tuesday, February 2, 2016, Adunuthula, Seshu <[email protected]> wrote:
> >
> >> A strong -1 on this.
> >>
> >> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
> >> comparing hundreds of jobs.
> >> - TopN pre-calculation (more UDFs coming)
> >> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
> >>
> >>
> >>
> >> These are incremental enhancements and do not warrant bumping up to a
> >> 2.0 release. We should release them in 1.3.
> >>
> >>
> >> - Streaming cubing experimental support, source from kafka, build cube
> >> in-mem at minutes interval
> >> - A plugin-able architecture, to allow alternative cube engine / storage
> >> engine / data source.
> >>
> >>
> >>
> >> These are POC quality features that should not be part of a release. We
> >> have not built a single of these plugins that are production quality.
> >>
> >> Luke/Yang, I have told you multiple times not to push out a release when
> >> it is not ready. We nearly took down the entire HBase cluster at eBay with
> >> the bad design for Streaming. If we scale this up to 100s of Streaming
> >> Cubes, this design will render an HBase cluster unusable.
> >>
> >> I have spent substantial time looking into the release and it does not
> >> meet eBay's standards for a quality release.
> >>
> >> We will be doing the community a huge disservice by pushing this out by
> >> end of February.
> >>
> >> Regards
> >> Seshu Adunuthula
> >>
> >>
> >>> On 1/31/16, 11:46 PM, "Li Yang" <[email protected]> wrote:
> >>>
> >>> Just to add more color.
> >>>
> >>> The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few months.
> >>> The 2.0 rc1 contains:
> >>>
> >>> - A plugin-able architecture, to allow alternative cube engine / storage
> >>> engine / data source.
> >>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
> >>> comparing hundreds of jobs.
> >>> - A better storage engine, makes queries roughly 2 times faster
> >>> (especially for slow queries) than 1.x by comparing tens of thousands of
> >>> SQL queries.
> >>> - Streaming cubing experimental support, source from kafka, build cube
> >>> in-mem at minutes interval
> >>> - TopN pre-calculation (more UDFs coming)
> >>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
> >>> - SAML authentication support
> >>>
> >>> As the release manager, I will kick off the release process in two weeks
> >>> (once back from vacation). ETA by end of Feb.
> >>>
> >>> Would love to hear more feedback from our community.  :-)
> >>>
> >>>
> >>> Yang
> >>>
> >>>
> >>>
> >>> On Monday, February 1, 2016, Adunuthula, Seshu <[email protected]>
> >>> wrote:
> >>>
> >>>> Hello Folks,
> >>>>
> >>>> We are actively working towards the Apache Kylin 2.0 release and would
> >>>> like a discussion with the community on what they would like to see in
> >>>> the 2.0 release of the product. We have three big rock items we are
> >>>> working towards in 2.0 and a lot of additional minor feature
> >>>> enhancements.
> >>>>
> >>>> Streaming Data Source support.
> >>>> This feature, where the source of Kylin Cubes is Kafka topics, is semi
> >>>> baked in. Cube Segments are built on micro batches of messages arriving
> >>>> on Kafka topics. Currently a lot of work is going on to productize this
> >>>> feature. Primary areas of work are Stream Processing Engines/Frameworks
> >>>> to process the micro batches and a UI to support out-of-the-box
> >>>> integration of Kafka topics with Kylin Cubes.
> >>>>
> >>>> Spark based Cube building Engine.
> >>>> The initial performance numbers for a Spark based cubing engine did not
> >>>> show substantial improvement over the MR based engine, but we would like
> >>>> this feature to be baked in for the 2.0 release. A lot of work is
> >>>> underway to stabilize this feature.
> >>>>
> >>>> Amazon EMR Integration
> >>>> We had initial conversations with Amazon EMR to support Apache Kylin on
> >>>> Amazon EMR, which were well received. With Kylin 2.0, Apache Kylin will
> >>>> be an enabled feature on Amazon EMR. Limited work has gone into this
> >>>> area, but this will be an important milestone for 2.0.
> >>>>
> >>>> We are also working towards creating a page for community driven
> >>>> improvements, similar to Apache Kafka's KIP:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >>>> Stay tuned.
> >>>>
> >>>> Regards
> >>>> Seshu Adunuthula
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
>
