Hi folks,

I created a list of features for the 0.9 release. So far I have only put my own desired features on the list. If there are issues you are interested in, feel free to add them. The roadmap is just a recommendation, and we can change it according to the situation.

0.9.0 is a major release. I expect that we can release 0.9 in about two months. I also welcome any suggestions.

Warm regards,
Hyunsik

On Thu, Apr 24, 2014 at 3:20 PM, Hyunsik Choi <[email protected]> wrote:
> Hi Eli,
>
> Thank you for your comment. I'm also really hoping that you will have time to contribute to open source projects. Especially since you are very skilled in YARN, your contribution would be a great help to us :).
>
> Thanks,
> Hyunsik
>
> On Sun, Apr 20, 2014 at 3:07 AM, Eli Reisman <[email protected]> wrote:
>> Great discussion everyone, sorry to have missed so much of it. I will certainly keep an eye on the YARN support angle and would love to help.
>>
>> I am hoping now that my team is growing at work I will have time to dive back into my open source projects. I agree that YARN (and Mesos) support will be a huge plus.
>>
>> On Mon, Apr 14, 2014 at 11:42 PM, Hyunsik Choi <[email protected]> wrote:
>>
>>> As David mentioned, version 1.0 usually has a special meaning, like GA. When we are confident in the stability and features of Tajo, we can use 1.0. Thank you all again!
>>>
>>> On Tue, Apr 15, 2014 at 2:55 PM, Hyunsik Choi <[email protected]> wrote:
>>>
>>> > Thank you for your votes! Let's go ahead!
>>> >
>>> > Cheers,
>>> > Hyunsik
>>> >
>>> > On Tue, Apr 15, 2014 at 9:03 AM, ktpark <[email protected]> wrote:
>>> >
>>> >> +1
>>> >>
>>> >> I agree with Hyunsik.
>>> >> Sorry for the late reply.
>>> >>
>>> >> On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:
>>> >>
>>> >> > I only realized today that my reply hadn't been sent.
>>> >> >
>>> >> > +1
>>> >> >
>>> >> > Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
>>> >> >
>>> >> > Min
>>> >> >
>>> >> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
>>> >> >
>>> >> >> +1
>>> >> >>
>>> >> >> I agree with Hyunsik as well. I think since 1.0 increments the major version number, it should be used for a particularly significant release. :)
>>> >> >>
>>> >> >> Thanks,
>>> >> >> David
>>> >> >>
>>> >> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
>>> >> >>
>>> >> >>> +1 Hyunsik.
>>> >> >>>
>>> >> >>> Thanks!
>>> >> >>> Warm Regards,
>>> >> >>> Alvin.
>>> >> >>>
>>> >> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>>> >> >>>
>>> >> >>>> Hi folks,
>>> >> >>>>
>>> >> >>>> I'd like to discuss the next version number. In Jira, we have provisionally used 1.0, but we haven't decided on the next major version. I propose 0.9 as the next major version. What do you think about this?
>>> >> >>>>
>>> >> >>>> Regards,
>>> >> >>>> Hyunsik
>>> >> >>>>
>>> >> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
>>> >> >>>>
>>> >> >>>>> Min, thanks for reminding us!
>>> >> >>>>> It's a mandatory issue.
>>> >> >>>>> We need to implement that feature ASAP.
>>> >> >>>>>
>>> >> >>>>> Thanks,
>>> >> >>>>> Jihoon
>>> >> >>>>>
>>> >> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>>> >> >>>>>
>>> >> >>>>>> Min,
>>> >> >>>>>>
>>> >> >>>>>> Yes, you are right. I think about it every day, but I missed it. Thank you for reminding me. It would be achieved by modifying the Query class to execute independent execution blocks in parallel. I'll add it to the wiki.
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>> Hyunsik
>>> >> >>>>>>
>>> >> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>>> >> >>>>>>
>>> >> >>>>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans A in the first stage and then scans B in the second stage. It doesn't run them in parallel, right?
>>> >> >>>>>>>
>>> >> >>>>>>> Min
>>> >> >>>>>>>
>>> >> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>>> I've just updated the roadmap page. Please take a look at the section 'After 0.8.0'.
>>> >> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>> >> >>>>>>>>
>>> >> >>>>>>>> If there are missing or additional ideas, feel free to add them to that page or suggest them here. After we discuss them further, we will decide their priorities.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Best regards,
>>> >> >>>>>>>> Hyunsik
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
>>> >> >>>>>>>>> Hi Hyoungjun,
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide users with a prepared benchmark environment, they can test Tajo easily. I'll file your idea on the wiki. Thank you for your suggestion.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Regards,
>>> >> >>>>>>>>> Hyunsik
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
>>> >> >>>>>>>>>> Hi Hyunsik,
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like the ones for Hive and Impala would make testing easier.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>>> >> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>>> >> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Thanks!
>>> >> >>>>>>>>>> Hyoungjun
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>> Hi Jihoon,
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed them on the wiki.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to logical planning and distributed query planning. But I'm not sure they can be included in the short-term roadmap. They are necessary, but they are not required right now. In my view, it would be reasonable to schedule them on the long-term roadmap.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Warm regards,
>>> >> >>>>>>>>>>> Hyunsik
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
>>> >> >>>>>>>>>>>> Hi Hyunsik,
>>> >> >>>>>>>>>>>> I'm very glad that we can release the next version soon.
>>> >> >>>>>>>>>>>> Also, I appreciate the guideline for the next roadmap.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
>>> >> >>>>>>>>>>>> The first is support for the CUBE operator (TAJO-259). Actually, I started it quite a long time ago, but it was delayed because it had lower priority than other stability issues. But since this operator is widely used in analytic applications, we need to add this feature as soon as possible. So, in my opinion, it would be good to add this feature to the next roadmap.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue for making the query plan more flexible. After that, we can exploit the many optimization opportunities described in TAJO-161.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> What do you guys think about these issues?
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> Best Regards,
>>> >> >>>>>>>>>>>> Jihoon
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Hi folks,
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested various features in various contexts, and tried to figure out if there are any critical problems. I think that there are only a few issues and we can release 0.8.0 next week. If there are further issues to be solved before the 0.8.0 release, feel free to suggest ideas.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any suggestion from users, contributors, and committers. Please fire away!
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way Tajo runs on large clusters of thousands of nodes and for a number of concurrent users. The key issues associated with this include the following:
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> * High availability
>>> >> >>>>>>>>>>>>> * Multi-tenancy scheduling
>>> >> >>>>>>>>>>>>> * More stability
>>> >> >>>>>>>>>>>>> * Improved shuffle
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I know, Alvin is working on TajoMaster HA (TAJO-704). Also, some guys including myself are investigating and solving the issues which occur in large clusters. These issues should be solved in order to make Tajo a complete, enterprise-ready product.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many analytic problems require window functions. Also, IN subqueries and scalar subqueries should be supported. So, I'd like to schedule them with high priority. In my view, there will be very few SQL support issues if Tajo provides these features.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema and its related work (TAJO-710). I guess this will take quite a while because it requires a lot of hard work. So, it would be great to schedule the nested schema loosely. Those are just my thoughts, anyhow.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that we need to release more frequently after the 0.8.0 release. So far, there has been a long period between each release because Tajo is undergoing heavy development. By 'releasing early, releasing often', we will create a tighter feedback loop between users and developers.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> I think that there are many additional interesting issues to be included in our roadmap. Feel free to suggest your ideas. We will arrange our short-term roadmap and long-term roadmap based on your suggestions.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Thank you all so much for your contribution!
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Warm Regards,
>>> >> >>>>>>>>>>>>> Hyunsik
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> --
>>> >> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>> >> >>>>>>>>>> http://tajo.apache.org/
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> My research interests are distributed systems, parallel computing and bytecode based virtual machine.
>>> >> >>>>>>>
>>> >> >>>>>>> My profile:
>>> >> >>>>>>> http://www.linkedin.com/in/coderplay
>>> >> >>>>>>> My blog:
>>> >> >>>>>>> http://coderplay.javaeye.com
>>> >> >
>>> >> > --
>>> >> > My research interests are distributed systems, parallel computing and bytecode based virtual machine.
>>> >> >
>>> >> > My profile:
>>> >> > http://www.linkedin.com/in/coderplay
>>> >> > My blog:
>>> >> > http://coderplay.javaeye.com
