Yes, we will be filing a whole bunch of JIRAs. This release is not Done, so there is no point in arguing about whether it is perfect. Luke, I do not want you to push this release through.
On 2/1/16, 7:54 PM, "Luke Han" <luke...@gmail.com> wrote:

>Hi Seshu,
>
>"Done is better than Perfect" is one practice in our development: release
>early, ask users to try and test, then fix bugs, bring in other features if
>any, and then release a new one... It has worked very well in the past and I
>believe it will continue to benefit further development.
>
>As you can see, the 2.x branch has been the active development code base for
>several months. As Yang mentioned, we are confident enough to release the
>first version now. Also, many users in the community are already building
>packages from 2.0 and have reported many tickets to help improve Kylin; they
>are looking forward to the first release very much. With the Apache release
>process, the entire community will help test and try each release candidate
>to make sure there are no critical issues. Please also help log JIRAs if you
>find any.
>
>Back to Spark Cubing: as previously discussed with the Spark community, there
>is still one pending JIRA for performance, so Spark Cubing has already been
>excluded from the first release. But with the pluggable architecture, it
>could be very easy to bring it back in a coming version once the community
>is happy with it.
>
>As for the Amazon EMR part, it's more about how to deploy than a "feature",
>so it does not make sense to set it as a release criterion.
>
>Thanks for bringing up this discussion to help the community :-)
>
>Luke
>
>
>Best Regards!
>---------------------
>
>Luke Han
>
>On Tue, Feb 2, 2016 at 8:48 AM, Adunuthula, Seshu <sadunuth...@ebay.com>
>wrote:
>
>> Yang,
>>
>> Implementing the old MR engine on the pluggable architecture does not
>> prove that the architecture works. You need two points to draw a line; a
>> single point does not prove that the architecture works.
>>
>> Improving the MR engine performance can be done on the 1.0 code base
>> without making it pluggable.
>>
>>
>> External talks and POCs are not the release criteria for a feature.
>>
>> Regards
>> Seshu
>>
>> Sent from my iPhone
>>
>> > On Feb 1, 2016, at 6:01 PM, Li Yang <liy...@apache.org> wrote:
>> >
>> > Seshu's understanding of 2.0 and its pluggable architecture is very
>> > wrong. Let me correct it. :-)
>> >
>> > The pluggable architecture is rock solid. Its first commit goes back to
>> > Jul 2015. On top of it, we built MR cube engine V2 and storage engine V2,
>> > which give much improved build and query performance. At the same time,
>> > the old V1 engines are still available on the 2.0 branch. The pluggable
>> > architecture allows alternative engines to coexist, and users are free to
>> > choose whichever engine suits their needs.
>> >
>> > In the last few months, thorough testing has been done on the 2.0-rc
>> > branch. As mentioned, we have rebuilt hundreds of jobs on the V2 engines
>> > and compared the results by running tens of thousands of queries against
>> > both V1 and V2 cubes. The correctness is confirmed and the performance
>> > improvement is measured. The 2.0-rc branch is definitely the most
>> > thoroughly tested branch so far. I am very confident of its quality.
>> >
>> > I believe Seshu also agrees with the improved performance and quality, as
>> > he proposed releasing it as v1.3. However, he didn't know that the
>> > improved results are built right on top of the pluggable architecture.
>> >
>> > So the claim that the pluggable architecture is "POC quality features
>> > that should not be part of a release. We have not built a single one of
>> > these plugins to production quality." is very wrong.
>> >
>> > Streaming cubing is a less mature feature. It's of semi-production
>> > quality. As shared in a few public talks, eBay has an SEO dashboard case
>> > that leverages the streaming cubing feature and achieves 5 minutes of data
>> > latency.
>> >
>> > And I made the point very clear -- "Streaming cubing experimental
>> > support, ... minutes interval" -- I think no one will be confused.
>> >
>> > If there are more concerns about 2.0 quality, I suggest opening JIRAs and
>> > creating test cases, so we have evidence and can collaborate to improve.
>> >
>> > Still, many thanks for the comments. Things become clearer through
>> > healthy discussions. :-)
>> >
>> > Cheers
>> > Yang
>> >
>> > On Tuesday, February 2, 2016, Adunuthula, Seshu <sadunuth...@ebay.com>
>> > wrote:
>> >
>> >> A strong -1 on this.
>> >>
>> >> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> >> comparing hundreds of jobs.
>> >> - TopN pre-calculation (more UDFs coming)
>> >> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> >>
>> >> These are incremental enhancements and do not warrant bumping up to a
>> >> 2.0 release. We should release them as 1.3.
>> >>
>> >> - Streaming cubing experimental support, source from Kafka, build cube
>> >> in-mem at minutes interval
>> >> - A pluggable architecture, to allow alternative cube engine / storage
>> >> engine / data source.
>> >>
>> >> These are POC-quality features that should not be part of a release. We
>> >> have not built a single one of these plugins to production quality.
>> >>
>> >> Luke/Yang, I have told you multiple times not to push out a release when
>> >> it is not ready. We nearly brought down the entire HBase cluster at eBay
>> >> with the bad design for Streaming. If we scale this up to 100s of
>> >> Streaming Cubes, this design will render an HBase cluster unusable.
>> >>
>> >> I have spent substantial time looking into the release and it does not
>> >> meet eBay's standards for a quality release.
>> >>
>> >> We will be doing the community a huge disservice by pushing this out by
>> >> end of February.
>> >>
>> >> Regards
>> >> Seshu Adunuthula
>> >>
>> >>
>> >>> On 1/31/16, 11:46 PM, "Li Yang" <liy...@apache.org> wrote:
>> >>>
>> >>> Just to add more color.
>> >>>
>> >>> The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few months.
>> >>> The 2.0 rc1 contains:
>> >>>
>> >>> - A pluggable architecture, to allow alternative cube engine / storage
>> >>> engine / data source.
>> >>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> >>> comparing hundreds of jobs.
>> >>> - A better storage engine, which makes queries roughly 2 times faster
>> >>> (especially slow queries) than 1.x by comparing tens of thousands of
>> >>> SQL queries.
>> >>> - Streaming cubing experimental support, source from Kafka, build cube
>> >>> in-mem at minutes interval
>> >>> - TopN pre-calculation (more UDFs coming)
>> >>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> >>> - SAML authentication support
>> >>>
>> >>> As the release manager, I will kick off the release process in two
>> >>> weeks (once back from vacation). ETA is end of Feb.
>> >>>
>> >>> Would love to hear more feedback from our community. :-)
>> >>>
>> >>>
>> >>> Yang
>> >>>
>> >>>
>> >>>
>> >>> On Monday, February 1, 2016, Adunuthula, Seshu <sadunuth...@ebay.com>
>> >>> wrote:
>> >>>
>> >>>> Hello Folks,
>> >>>>
>> >>>> We are actively working towards the Apache Kylin 2.0 release and would
>> >>>> like a discussion with the community on what they would like to see in
>> >>>> the 2.0 release of the product. We have three big-rock items we are
>> >>>> working towards in 2.0, and a lot of additional minor feature
>> >>>> enhancements.
>> >>>>
>> >>>> Streaming Data Source support.
>> >>>> This feature is semi-baked: the source of Kylin Cubes is Kafka
>> >>>> topics. Cube segments are built on micro batches of messages arriving
>> >>>> on Kafka topics. Currently a lot of work is going on to productize
>> >>>> this feature.
>> >>>> Primary areas of work are Stream Processing Engines/Frameworks to
>> >>>> process the micro batches and a UI to support out-of-the-box
>> >>>> integration of Kafka topics with Kylin Cubes.
>> >>>>
>> >>>> Spark based Cube building Engine.
>> >>>> The initial performance numbers for a Spark-based cubing engine did
>> >>>> not show substantial improvement over the MR-based engine, but we would
>> >>>> like this feature to be baked in for the 2.0 release. A lot of work is
>> >>>> underway to stabilize this feature.
>> >>>>
>> >>>> Amazon EMR Integration
>> >>>> We had initial conversations with Amazon EMR about supporting Apache
>> >>>> Kylin on Amazon EMR, which were received well. With Kylin 2.0, Apache
>> >>>> Kylin will be an enabled feature on Amazon EMR. Limited work has gone
>> >>>> into this area, but this will be an important milestone for 2.0.
>> >>>>
>> >>>> We are also working towards creating a page for community-driven
>> >>>> improvements, similar to Apache Kafka's KIP
>> >>>> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
>> >>>> Stay tuned.
>> >>>>
>> >>>> Regards
>> >>>> Seshu Adunuthula