Great discussion everyone, sorry to have missed so much of it. I will certainly keep an eye on the YARN support angle and would love to help.
I am hoping that, now that my team at work is growing, I will have time to dive back into my open source projects. I agree that YARN (and Mesos) support will be a huge plus.

On Mon, Apr 14, 2014 at 11:42 PM, Hyunsik Choi <[email protected]> wrote:

> As David mentioned, the version 1.0 usually has special meanings like GA.
> When we are confident with the stability and features of Tajo, we can use
> 1.0. Thank you all guys again!
>
>
> On Tue, Apr 15, 2014 at 2:55 PM, Hyunsik Choi <[email protected]> wrote:
>
> > Thank you for votes! Let's go ahead!
> >
> > Cheers,
> > Hyunsik
> >
> >
> > On Tue, Apr 15, 2014 at 9:03 AM, ktpark <[email protected]> wrote:
> >
> >> +1
> >>
> >> I agree with Hyunsik.
> >> Sorry for late reply.
> >>
> >> On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:
> >>
> >> > Until today realized that my reply haven't been sent.
> >> >
> >> > +1
> >> >
> >> > Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
> >> >
> >> > Min
> >> >
> >> >
> >> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
> >> >
> >> >> +1
> >> >>
> >> >> I agree with Hyunsik as well. I think since 1.0 increments the major
> >> >> version number, it should be used for a particularly significant release. :)
> >> >>
> >> >> Thanks,
> >> >> David
> >> >>
> >> >>
> >> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
> >> >>
> >> >>> +1 Hyunsik.
> >> >>>
> >> >>> Thanks!
> >> >>> Warm Regards,
> >> >>> Alvin.
> >> >>>
> >> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> >> >>>
> >> >>>> Hi folks,
> >> >>>>
> >> >>>> I'd like to discuss the next version number. In Jira, we have provisionally
> >> >>>> used 1.0, and we didn't decide the next major version. I propose 0.9 as the
> >> >>>> next major version. What do you think about this?
> >> >>>>
> >> >>>> Regards,
> >> >>>> Hyunsik
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
> >> >>>>
> >> >>>>> Min, thanks for reminding us!
> >> >>>>> It's a mandatory issue.
> >> >>>>> We need to implement that feature ASAP.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Jihoon
> >> >>>>>
> >> >>>>>
> >> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
> >> >>>>>
> >> >>>>>> Min,
> >> >>>>>>
> >> >>>>>> Yes, you are right. I'm thinking it everyday, but I missed it. Thank you
> >> >>>>>> for reminding me. It would be achieved by modifying Query class to execute
> >> >>>>>> independent execution blocks in parallel. I'll add it to the wiki.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Hyunsik
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
> >> >>>>>>
> >> >>>>>>> Yeah.. Another issue, seems a query like A join B. Tajo will scan A at
> >> >>>>>>> first stage, after that in the 2nd stage scan B. Doesn't run it in
> >> >>>>>>> parallel, right?
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Min
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
> >> >>>>>>>
> >> >>>>>>>> I've just updated the roadmap page. Please take a look at the section
> >> >>>>>>>> 'After 0.8.0'
> >> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >> >>>>>>>>
> >> >>>>>>>> If there are missed or additional ideas, feel free to add them on that
> >> >>>>>>>> page or suggest them here. After we discuss them more, we would decide
> >> >>>>>>>> their priorities.
> >> >>>>>>>>
> >> >>>>>>>> Best regards,
> >> >>>>>>>> Hyunsik
> >> >>>>>>>>
> >> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> >> >>>>>>>>> Hi Hyoungjun,
> >> >>>>>>>>>
> >> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> >> >>>>>>>>> users with some prepared benchmark environment, users can test Tajo
> >> >>>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
> >> >>>>>>>>> suggestion.
> >> >>>>>>>>>
> >> >>>>>>>>> Regards,
> >> >>>>>>>>> Hyunsik
> >> >>>>>>>>>
> >> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> >> >>>>>>>>>> Hi Hyunsik,
> >> >>>>>>>>>>
> >> >>>>>>>>>> I did benchmark test with TPC-H, TPC-DS data. Benchmark script like hive
> >> >>>>>>>>>> and impala is more helpful to test.
> >> >>>>>>>>>>
> >> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
> >> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
> >> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
> >> >>>>>>>>>>
> >> >>>>>>>>>> Thanks!
> >> >>>>>>>>>> Hyoungjun
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Hi Jihoon,
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed it on the
> >> >>>>>>>>>>> wiki.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
> >> >>>>>>>>>>> logical planning and distributed query planning. But, I'm not sure it
> >> >>>>>>>>>>> can be included in short-term roadmap. They are necessary, but they
> >> >>>>>>>>>>> are not required right now. In my view, it would be reasonable to
> >> >>>>>>>>>>> schedule them on long-term roadmap.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Warm regards,
> >> >>>>>>>>>>> Hyunsik
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> >> >>>>>>>>>>>> Hi Hyunsik,
> >> >>>>>>>>>>>> I'm very glad that we can release the next version, soon.
> >> >>>>>>>>>>>> Also, appreciate for the guideline of the next roadmap.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Addition to the aforementioned features, I have the two suggestions.
> >> >>>>>>>>>>>> First is the support of CUBE operator (TAJO-259). Actually, I started it
> >> >>>>>>>>>>>> quite a long time ago, but it is delayed due to the lower priority than
> >> >>>>>>>>>>>> other stability issues. But, since this operator is widely used in analytic
> >> >>>>>>>>>>>> applications, we need to add this feature as soon as possible. So, in my
> >> >>>>>>>>>>>> opinion, it would be good to add this feature to the next roadmap.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Second is the advanced query optimization. TAJO-266 is an issue for making
> >> >>>>>>>>>>>> the query plan more flexible. After that, we can employ the plenty
> >> >>>>>>>>>>>> optimization opportunities like described in TAJO-161.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> How do you guys think about these issues?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Best Regards,
> >> >>>>>>>>>>>> Jihoon
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>> Hi folks,
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a pleasure
> >> >>>>>>>>>>>>> to discuss the Tajo 0.8.0 release.
> >> >>>>>>>>>>>>> Recently, I've tested various features
> >> >>>>>>>>>>>>> in various contexts, and tried to figure out if there are any critical
> >> >>>>>>>>>>>>> problems. I think that there are only a few issues and we can release 0.8.0
> >> >>>>>>>>>>>>> next week. If there are further issues to be solved before the 0.8.0
> >> >>>>>>>>>>>>> release, feel free to suggest ideas.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any suggestion
> >> >>>>>>>>>>>>> from users, contributors, and committers. Please fire away!
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way Tajo
> >> >>>>>>>>>>>>> runs in thousands of large cluster nodes and for a number of concurrent
> >> >>>>>>>>>>>>> users. The key issues associated with this include the following:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> * High availability
> >> >>>>>>>>>>>>> * Multi-tenancy scheduling
> >> >>>>>>>>>>>>> * More stability
> >> >>>>>>>>>>>>> * Improved shuffle
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new
> >> >>>>>>>>>>>>> scheduler (TAJO-540) based on sparrow. I'll support him. As far as I know,
> >> >>>>>>>>>>>>> Alvin is working on TajoMaster HA (TAJO-704). Also, some guys including
> >> >>>>>>>>>>>>> myself are investigating and solving the issues which occur in large
> >> >>>>>>>>>>>>> clusters. These issues should be solved in order to make Tajo a complete
> >> >>>>>>>>>>>>> enterprise-ready production.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> In addition, there are some SQL feature support issues.
> >> >>>>>>>>>>>>> Many analytic problems require window functions. Also, in-subquery and
> >> >>>>>>>>>>>>> scalar subquery should be supported. So, I'd like to schedule them with
> >> >>>>>>>>>>>>> high priority. In my view, there will be very few SQL support issues if
> >> >>>>>>>>>>>>> Tajo provides these features.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema and its related
> >> >>>>>>>>>>>>> work (TAJO-710). I guess this will take quite a while because it requires a
> >> >>>>>>>>>>>>> lot of hard work. So, it would be great to schedule the nested schema
> >> >>>>>>>>>>>>> loosely. That's just my thoughts, anyhow.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that we need
> >> >>>>>>>>>>>>> to release more frequently after the 0.8.0 release. So far, there has been
> >> >>>>>>>>>>>>> a long period between each release because Tajo is undergoing heavy
> >> >>>>>>>>>>>>> development. By 'releasing early, releasing often', we will make a
> >> >>>>>>>>>>>>> tighter feedback loop between users and developers.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I think that there are many additional interesting issues to be
> >> >>>>>>>>>>>>> included in our roadmap. Feel free to suggest your idea. We will arrange
> >> >>>>>>>>>>>>> our short-term roadmap and long-term roadmap based on your suggestions.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Thank you all so much for your contribution!
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Warm Regards,
> >> >>>>>>>>>>>>> Hyunsik
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> --
> >> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
> >> >>>>>>>>>> http://tajo.apache.org/
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> My research interests are distributed systems, parallel computing and
> >> >>>>>>> bytecode based virtual machine.
> >> >>>>>>>
> >> >>>>>>> My profile:
> >> >>>>>>> http://www.linkedin.com/in/coderplay
> >> >>>>>>> My blog:
> >> >>>>>>> http://coderplay.javaeye.com
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>
> >> >>
> >> >
> >> > --
> >> > My research interests are distributed systems, parallel computing and
> >> > bytecode based virtual machine.
> >> >
> >> > My profile:
> >> > http://www.linkedin.com/in/coderplay
> >> > My blog:
> >> > http://coderplay.javaeye.com
> >>
> >
>
