Hi Eli,

Thank you for your comment. I also really hope that you will have time to contribute to open source projects. Since you are very skilled in YARN, your contribution would be a great help to us :)
Thanks,
Hyunsik

On Sun, Apr 20, 2014 at 3:07 AM, Eli Reisman <[email protected]> wrote:

> Great discussion everyone, sorry to have missed so much of it. I will
> certainly keep an eye on the YARN support angle and would love to help.
>
> I am hoping now that my team is growing at work I will have time to dive
> back into my open source projects. I agree that YARN (and Mesos) support
> will be a huge plus.
>
> On Mon, Apr 14, 2014 at 11:42 PM, Hyunsik Choi <[email protected]> wrote:
>
>> As David mentioned, the version 1.0 usually has special meanings like GA.
>> When we are confident with the stability and features of Tajo, we can use
>> 1.0. Thank you all guys again!
>>
>> On Tue, Apr 15, 2014 at 2:55 PM, Hyunsik Choi <[email protected]> wrote:
>>
>> > Thank you for votes! Let's go ahead!
>> >
>> > Cheers,
>> > Hyunsik
>> >
>> > On Tue, Apr 15, 2014 at 9:03 AM, ktpark <[email protected]> wrote:
>> >
>> >> +1
>> >>
>> >> I agree with Hyunsik.
>> >> Sorry for late reply.
>> >>
>> >> On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:
>> >>
>> >> > Until today realized that my reply haven't been sent.
>> >> >
>> >> > +1
>> >> >
>> >> > Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
>> >> >
>> >> > Min
>> >> >
>> >> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
>> >> >
>> >> >> +1
>> >> >>
>> >> >> I agree with Hyunsik as well. I think since 1.0 increments the major
>> >> >> version number, it should be used for a particularly significant
>> >> >> release. :)
>> >> >>
>> >> >> Thanks,
>> >> >> David
>> >> >>
>> >> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
>> >> >>
>> >> >>> +1 Hyunsik.
>> >> >>>
>> >> >>> Thanks!
>> >> >>> Warm Regards,
>> >> >>> Alvin.
>> >> >>>
>> >> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>> >> >>>
>> >> >>>> Hi folks,
>> >> >>>>
>> >> >>>> I'd like to discuss the next version number.
>> >> >>>> In Jira, we have provisionally used 1.0, and we didn't decide the next
>> >> >>>> major version. I propose 0.9 as the next major version. What do you
>> >> >>>> think about this?
>> >> >>>>
>> >> >>>> Regards,
>> >> >>>> Hyunsik
>> >> >>>>
>> >> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
>> >> >>>>
>> >> >>>>> Min, thanks for reminding us!
>> >> >>>>> It's a mandatory issue.
>> >> >>>>> We need to implement that feature ASAP.
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Jihoon
>> >> >>>>>
>> >> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>> >> >>>>>
>> >> >>>>>> Min,
>> >> >>>>>>
>> >> >>>>>> Yes, you are right. I'm thinking it everyday, but I missed it. Thank
>> >> >>>>>> you for reminding me. It would be achieved by modifying Query class to
>> >> >>>>>> execute independent execution blocks in parallel. I'll add it to the
>> >> >>>>>> wiki.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Hyunsik
>> >> >>>>>>
>> >> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>> >> >>>>>>
>> >> >>>>>>> Yeah.. Another issue, seems a query like A join B. Tajo will scan A at
>> >> >>>>>>> first stage, after that in the 2nd stage scan B. Doesn't run it in
>> >> >>>>>>> parallel, right?
>> >> >>>>>>>
>> >> >>>>>>> Min
>> >> >>>>>>>
>> >> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>> >> >>>>>>>
>> >> >>>>>>>> I've just updated the roadmap page. Please take a look at the section
>> >> >>>>>>>> 'After 0.8.0'
>> >> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>> >> >>>>>>>>
>> >> >>>>>>>> If there are missed or additional ideas, feel free to add them on that
>> >> >>>>>>>> page or suggest them here.
>> >> >>>>>>>> After we discuss them more, we would decide their priorities.
>> >> >>>>>>>>
>> >> >>>>>>>> Best regards,
>> >> >>>>>>>> Hyunsik
>> >> >>>>>>>>
>> >> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
>> >> >>>>>>>>> Hi Hyoungjun,
>> >> >>>>>>>>>
>> >> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
>> >> >>>>>>>>> users with some prepared benchmark environment, users can test Tajo
>> >> >>>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
>> >> >>>>>>>>> suggestion.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Regards,
>> >> >>>>>>>>> Hyunsik
>> >> >>>>>>>>>
>> >> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
>> >> >>>>>>>>>> Hi Hyunsik,
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> I did benchmark test with TPC-H, TPC-DS data. Benchmark script like
>> >> >>>>>>>>>> hive and impala is more helpful to test.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>> >> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>> >> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Thanks!
>> >> >>>>>>>>>> Hyoungjun
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Hi Jihoon,
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed it on
>> >> >>>>>>>>>>> the wiki.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
>> >> >>>>>>>>>>> logical planning and distributed query planning. But, I'm not sure it
>> >> >>>>>>>>>>> can be included in short-term roadmap. They are necessary, but they
>> >> >>>>>>>>>>> are not required right now.
>> >> >>>>>>>>>>> In my view, it would be reasonable to schedule them on long-term
>> >> >>>>>>>>>>> roadmap.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Warm regards,
>> >> >>>>>>>>>>> Hyunsik
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
>> >> >>>>>>>>>>>> Hi Hyunsik,
>> >> >>>>>>>>>>>> I'm very glad that we can release the next version, soon.
>> >> >>>>>>>>>>>> Also, appreciate for the guideline of the next roadmap.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Addition to the aforementioned features, I have the two suggestions.
>> >> >>>>>>>>>>>> First is the support of CUBE operator (TAJO-259). Actually, I started
>> >> >>>>>>>>>>>> it quite a long time ago, but it is delayed due to the lower priority
>> >> >>>>>>>>>>>> than other stability issues. But, since this operator is widely used
>> >> >>>>>>>>>>>> in analytic applications, we need to add this feature as soon as
>> >> >>>>>>>>>>>> possible. So, in my opinion, it would be good to add this feature to
>> >> >>>>>>>>>>>> the next roadmap.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Second is the advanced query optimization. TAJO-266 is an issue for
>> >> >>>>>>>>>>>> making the query plan more flexible. After that, we can employ the
>> >> >>>>>>>>>>>> plenty optimization opportunities like described in TAJO-161.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> How do you guys think about these issues?
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Best Regards,
>> >> >>>>>>>>>>>> Jihoon
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Hi folks,
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> I'm very happy to see that our community is growing!
>> >> >>>>>>>>>>>>> Also, it's a pleasure to discuss the Tajo 0.8.0 release. Recently,
>> >> >>>>>>>>>>>>> I've tested various features in various contexts, and tried to figure
>> >> >>>>>>>>>>>>> out if there are any critical problems. I think that there are only a
>> >> >>>>>>>>>>>>> few issues and we can release 0.8.0 next week. If there are further
>> >> >>>>>>>>>>>>> issues to be solved before the 0.8.0 release, feel free to suggest
>> >> >>>>>>>>>>>>> ideas.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
>> >> >>>>>>>>>>>>> suggestion from users, contributors, and committers. Please fire away!
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way
>> >> >>>>>>>>>>>>> Tajo runs in thousands of large cluster nodes and for a number of
>> >> >>>>>>>>>>>>> concurrent users. The key issues associated with this include the
>> >> >>>>>>>>>>>>> following:
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> * High availability
>> >> >>>>>>>>>>>>> * Multi-tenancy scheduling
>> >> >>>>>>>>>>>>> * More stability
>> >> >>>>>>>>>>>>> * Improved shuffle
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new
>> >> >>>>>>>>>>>>> scheduler (TAJO-540) based on sparrow. I'll support him. As far as I
>> >> >>>>>>>>>>>>> know, Alvin is working on TajoMaster HA (TAJO-704). Also, some guys
>> >> >>>>>>>>>>>>> including myself are investigating and solving the issues which occur
>> >> >>>>>>>>>>>>> in large clusters.
>> >> >>>>>>>>>>>>> These issues should be solved in order to make Tajo a complete
>> >> >>>>>>>>>>>>> enterprise-ready production.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many analytic
>> >> >>>>>>>>>>>>> problems require window functions. Also, in-subquery and scalar
>> >> >>>>>>>>>>>>> subquery should be supported. So, I'd like to schedule them with high
>> >> >>>>>>>>>>>>> priority. In my view, there will be very few SQL support issues if
>> >> >>>>>>>>>>>>> Tajo provides these features.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema and its
>> >> >>>>>>>>>>>>> related work (TAJO-710). I guess this will take quite a while because
>> >> >>>>>>>>>>>>> it requires a lot of hard work. So, it would be great to schedule the
>> >> >>>>>>>>>>>>> nested schema loosely. That's just my thoughts, anyhow.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that we
>> >> >>>>>>>>>>>>> need to release more frequently after the 0.8.0 release. So far, there
>> >> >>>>>>>>>>>>> has been a long period between each release because Tajo is undergoing
>> >> >>>>>>>>>>>>> heavy development. By 'releasing early, releasing often', we will make
>> >> >>>>>>>>>>>>> a tighter feedback loop between users and developers.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> I think that there are many additional interesting issues to be
>> >> >>>>>>>>>>>>> included in our roadmap. Feel free to suggest your idea.
>> >> >>>>>>>>>>>>> We will arrange our short-term roadmap and long-term roadmap based on
>> >> >>>>>>>>>>>>> your suggestions.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Thank you all so much for your contribution!
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Warm Regards,
>> >> >>>>>>>>>>>>> Hyunsik
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> --
>> >> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>> >> >>>>>>>>>> http://tajo.apache.org/
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> My research interests are distributed systems, parallel computing and
>> >> >>>>>>> bytecode based virtual machine.
>> >> >>>>>>>
>> >> >>>>>>> My profile:
>> >> >>>>>>> http://www.linkedin.com/in/coderplay
>> >> >>>>>>> My blog:
>> >> >>>>>>> http://coderplay.javaeye.com
>> >> >
>> >> > --
>> >> > My research interests are distributed systems, parallel computing and
>> >> > bytecode based virtual machine.
>> >> >
>> >> > My profile:
>> >> > http://www.linkedin.com/in/coderplay
>> >> > My blog:
>> >> > http://coderplay.javaeye.com
