Thank you for your votes! Let's go ahead!

Cheers,
Hyunsik
On Tue, Apr 15, 2014 at 9:03 AM, ktpark <[email protected]> wrote:
> +1
>
> I agree with Hyunsik.
> Sorry for the late reply.
>
> On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:
>
> > I only realized today that my reply had not been sent.
> >
> > +1
> >
> > Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
> >
> > Min
> >
> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
> >
> >> +1
> >>
> >> I agree with Hyunsik as well. I think since 1.0 increments the major
> >> version number, it should be used for a particularly significant
> >> release. :)
> >>
> >> Thanks,
> >> David
> >>
> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
> >>
> >>> +1 Hyunsik.
> >>>
> >>> Thanks!
> >>> Warm Regards,
> >>> Alvin.
> >>>
> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> I'd like to discuss the next version number. In Jira, we have
> >>>> provisionally used 1.0, but we haven't decided on the next major
> >>>> version. I propose 0.9 as the next major version. What do you
> >>>> think about this?
> >>>>
> >>>> Regards,
> >>>> Hyunsik
> >>>>
> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
> >>>>
> >>>>> Min, thanks for reminding us!
> >>>>> It's a mandatory issue.
> >>>>> We need to implement that feature ASAP.
> >>>>>
> >>>>> Thanks,
> >>>>> Jihoon
> >>>>>
> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>
> >>>>>> Min,
> >>>>>>
> >>>>>> Yes, you are right. I think about it every day, but I missed it.
> >>>>>> Thank you for reminding me. It would be achieved by modifying the
> >>>>>> Query class to execute independent execution blocks in parallel.
> >>>>>> I'll add it to the wiki.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Hyunsik
> >>>>>>
> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
> >>>>>>
> >>>>>>> Yeah..
> >>>>>>> Another issue: for a query like A join B, it seems Tajo will scan
> >>>>>>> A in the first stage and then scan B in the second stage. It
> >>>>>>> doesn't run them in parallel, right?
> >>>>>>>
> >>>>>>> Min
> >>>>>>>
> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> I've just updated the roadmap page. Please take a look at the
> >>>>>>>> section 'After 0.8.0':
> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >>>>>>>>
> >>>>>>>> If there are missed or additional ideas, feel free to add them on
> >>>>>>>> that page or suggest them here. After we discuss them more, we
> >>>>>>>> will decide their priorities.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Hyunsik
> >>>>>>>>
> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>>>>> Hi Hyoungjun,
> >>>>>>>>>
> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we
> >>>>>>>>> provide users with a prepared benchmark environment, they can
> >>>>>>>>> test Tajo easily. I'll file your idea on the wiki. Thank you for
> >>>>>>>>> your suggestion.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Hyunsik
> >>>>>>>>>
> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> >>>>>>>>>> Hi Hyunsik,
> >>>>>>>>>>
> >>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark
> >>>>>>>>>> scripts like the ones for Hive and Impala would be helpful for
> >>>>>>>>>> testing.
> >>>>>>>>>>
> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>> Hyoungjun
> >>>>>>>>>>
> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Jihoon,
> >>>>>>>>>>>
> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems.
> >>>>>>>>>>> I filed it on the wiki.
> >>>>>>>>>>>
> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities
> >>>>>>>>>>> to logical planning and distributed query planning. But I'm not
> >>>>>>>>>>> sure they can be included in the short-term roadmap. They are
> >>>>>>>>>>> necessary, but they are not required right now. In my view, it
> >>>>>>>>>>> would be reasonable to schedule them on the long-term roadmap.
> >>>>>>>>>>>
> >>>>>>>>>>> Warm regards,
> >>>>>>>>>>> Hyunsik
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> >>>>>>>>>>>> Hi Hyunsik,
> >>>>>>>>>>>> I'm very glad that we can release the next version soon.
> >>>>>>>>>>>> Also, I appreciate the guidelines for the next roadmap.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In addition to the aforementioned features, I have two
> >>>>>>>>>>>> suggestions. The first is support for the CUBE operator
> >>>>>>>>>>>> (TAJO-259). Actually, I started it quite a long time ago, but
> >>>>>>>>>>>> it was delayed because it had lower priority than other
> >>>>>>>>>>>> stability issues. But since this operator is widely used in
> >>>>>>>>>>>> analytic applications, we need to add this feature as soon as
> >>>>>>>>>>>> possible. So, in my opinion, it would be good to add this
> >>>>>>>>>>>> feature to the next roadmap.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The second is advanced query optimization. TAJO-266 is an
> >>>>>>>>>>>> issue for making the query plan more flexible. After that, we
> >>>>>>>>>>>> can exploit plenty of optimization opportunities like those
> >>>>>>>>>>>> described in TAJO-161.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you guys think about these issues?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>> Jihoon
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi folks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also,
> >>>>>>>>>>>>> it's a pleasure to discuss the Tajo 0.8.0 release. Recently,
> >>>>>>>>>>>>> I've tested various features in various contexts and tried to
> >>>>>>>>>>>>> figure out if there are any critical problems. I think that
> >>>>>>>>>>>>> there are only a few issues and we can release 0.8.0 next
> >>>>>>>>>>>>> week. If there are further issues to be solved before the
> >>>>>>>>>>>>> 0.8.0 release, feel free to suggest ideas.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to
> >>>>>>>>>>>>> any suggestion from users, contributors, and committers.
> >>>>>>>>>>>>> Please fire away!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving
> >>>>>>>>>>>>> the way Tajo runs on large clusters with thousands of nodes
> >>>>>>>>>>>>> and for a number of concurrent users. The key issues
> >>>>>>>>>>>>> associated with this include the following:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * High availability
> >>>>>>>>>>>>> * Multi-tenancy scheduling
> >>>>>>>>>>>>> * More stability
> >>>>>>>>>>>>> * Improved shuffle
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The current work status is as follows. Min is working on
> >>>>>>>>>>>>> Tajo's new scheduler (TAJO-540) based on Sparrow. I'll
> >>>>>>>>>>>>> support him. As far as I know, Alvin is working on TajoMaster
> >>>>>>>>>>>>> HA (TAJO-704). Also, some guys including myself are
> >>>>>>>>>>>>> investigating and solving the issues which occur in large
> >>>>>>>>>>>>> clusters.
> >>>>>>>>>>>>> These issues should be solved in order to make Tajo a
> >>>>>>>>>>>>> complete, enterprise-ready product.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many
> >>>>>>>>>>>>> analytic problems require window functions. Also, IN
> >>>>>>>>>>>>> subqueries and scalar subqueries should be supported. So, I'd
> >>>>>>>>>>>>> like to schedule them with high priority. In my view, there
> >>>>>>>>>>>>> will be very few SQL support issues if Tajo provides these
> >>>>>>>>>>>>> features.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema and
> >>>>>>>>>>>>> its related work (TAJO-710). I guess this will take quite a
> >>>>>>>>>>>>> while because it requires a lot of hard work. So, it would be
> >>>>>>>>>>>>> great to schedule the nested schema loosely. That's just my
> >>>>>>>>>>>>> thoughts, anyhow.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest
> >>>>>>>>>>>>> that we release more frequently after the 0.8.0 release. So
> >>>>>>>>>>>>> far, there has been a long period between releases because
> >>>>>>>>>>>>> Tajo is undergoing heavy development. By 'releasing early,
> >>>>>>>>>>>>> releasing often', we will create a tighter feedback loop
> >>>>>>>>>>>>> between users and developers.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think that there are many additional interesting issues to
> >>>>>>>>>>>>> be included in our roadmap. Feel free to suggest your ideas.
> >>>>>>>>>>>>> We will arrange our short-term roadmap and long-term roadmap
> >>>>>>>>>>>>> based on your suggestions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you all so much for your contribution!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Warm Regards,
> >>>>>>>>>>>>> Hyunsik
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
> >>>>>>>>>> http://tajo.apache.org/
> >>>>>>>
> >>>>>>> --
> >>>>>>> My research interests are distributed systems, parallel computing
> >>>>>>> and bytecode based virtual machine.
> >>>>>>>
> >>>>>>> My profile:
> >>>>>>> http://www.linkedin.com/in/coderplay
> >>>>>>> My blog:
> >>>>>>> http://coderplay.javaeye.com
> >
> > --
> > My research interests are distributed systems, parallel computing and
> > bytecode based virtual machine.
> >
> > My profile:
> > http://www.linkedin.com/in/coderplay
> > My blog:
> > http://coderplay.javaeye.com
