I only realized today that my reply hadn't been sent. +1
Totally agree with Hyunsik. 0.9 is more appropriate for the next release.

Min

On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:

> +1
>
> I agree with Hyunsik as well. I think since 1.0 increments the major
> version number, it should be used for a particularly significant release. :)
>
> Thanks,
> David
>
>
> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
>
> > +1 Hyunsik.
> >
> > Thanks!
> > Warm Regards,
> > Alvin.
> >
> > On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> >
> >> Hi folks,
> >>
> >> I'd like to discuss the next version number. In Jira, we have
> >> provisionally used 1.0, and we haven't decided the next major version.
> >> I propose 0.9 as the next major version. What do you think about this?
> >>
> >> Regards,
> >> Hyunsik
> >>
> >>
> >> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
> >>
> >>> Min, thanks for reminding us!
> >>> It's a mandatory issue.
> >>> We need to implement that feature ASAP.
> >>>
> >>> Thanks,
> >>> Jihoon
> >>>
> >>>
> >>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>
> >>>> Min,
> >>>>
> >>>> Yes, you are right. I think about it every day, but I missed it.
> >>>> Thank you for reminding me. It would be achieved by modifying the
> >>>> Query class to execute independent execution blocks in parallel.
> >>>> I'll add it to the wiki.
> >>>>
> >>>> Thanks,
> >>>> Hyunsik
> >>>>
> >>>>
> >>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
> >>>>
> >>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans
> >>>>> A in the first stage and then scans B in the second stage. It
> >>>>> doesn't run them in parallel, right?
> >>>>>
> >>>>>
> >>>>> Min
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>
> >>>>>> I've just updated the roadmap page. Please take a look at the
> >>>>>> section 'After 0.8.0'
> >>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >>>>>>
> >>>>>> If there are missed or additional ideas, feel free to add them on
> >>>>>> that page or suggest them here. After we discuss them more, we
> >>>>>> will decide their priorities.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Hyunsik
> >>>>>>
> >>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>>> Hi Hyoungjun,
> >>>>>>>
> >>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> >>>>>>> users with some prepared benchmark environment, users can test Tajo
> >>>>>>> easily. I'll file your idea on the wiki. Thank you for your
> >>>>>>> suggestion.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Hyunsik
> >>>>>>>
> >>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> >>>>>>>> Hi Hyunsik,
> >>>>>>>>
> >>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark
> >>>>>>>> scripts like those for Hive and Impala would make testing easier.
> >>>>>>>>
> >>>>>>>> https://github.com/rxin/TPC-H-Hive
> >>>>>>>> https://github.com/cartershanklin/hive-testbench
> >>>>>>>> https://github.com/cloudera/impala-tpcds-kit
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> Hyoungjun
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>
> >>>>>>>>> Hi Jihoon,
> >>>>>>>>>
> >>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed
> >>>>>>>>> them on the wiki.
> >>>>>>>>>
> >>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities
> >>>>>>>>> to logical planning and distributed query planning. But I'm not
> >>>>>>>>> sure they can be included in the short-term roadmap. They are
> >>>>>>>>> necessary, but they are not required right now. In my view, it
> >>>>>>>>> would be reasonable to schedule them on the long-term roadmap.
> >>>>>>>>>
> >>>>>>>>> Warm regards,
> >>>>>>>>> Hyunsik
> >>>>>>>>>
> >>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> >>>>>>>>>> Hi Hyunsik,
> >>>>>>>>>> I'm very glad that we can release the next version soon.
> >>>>>>>>>> Also, I appreciate the guidelines for the next roadmap.
> >>>>>>>>>>
> >>>>>>>>>> In addition to the aforementioned features, I have two
> >>>>>>>>>> suggestions.
> >>>>>>>>>> The first is support for the CUBE operator (TAJO-259). Actually,
> >>>>>>>>>> I started it quite a long time ago, but it has been delayed
> >>>>>>>>>> because it had lower priority than stability issues. But since
> >>>>>>>>>> this operator is widely used in analytic applications, we need
> >>>>>>>>>> to add this feature as soon as possible. So, in my opinion, it
> >>>>>>>>>> would be good to add this feature to the next roadmap.
> >>>>>>>>>>
> >>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue
> >>>>>>>>>> for making the query plan more flexible. After that, we can
> >>>>>>>>>> exploit the many optimization opportunities described in
> >>>>>>>>>> TAJO-161.
> >>>>>>>>>>
> >>>>>>>>>> What do you all think about these issues?
> >>>>>>>>>>
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Jihoon
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's
> >>>>>>>>>>> a pleasure to discuss the Tajo 0.8.0 release. Recently, I've
> >>>>>>>>>>> tested various features in various contexts, and tried to
> >>>>>>>>>>> figure out if there are any critical problems. I think that
> >>>>>>>>>>> there are only a few issues and we can release 0.8.0 next week.
> >>>>>>>>>>> If there are further issues to be solved before the 0.8.0
> >>>>>>>>>>> release, feel free to suggest ideas.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
> >>>>>>>>>>> suggestion from users, contributors, and committers. Please
> >>>>>>>>>>> fire away!
> >>>>>>>>>>>
> >>>>>>>>>>> I'm thinking that our next stage should focus on improving the
> >>>>>>>>>>> way Tajo runs on large clusters of thousands of nodes and for a
> >>>>>>>>>>> number of concurrent users. The key issues associated with this
> >>>>>>>>>>> include the following:
> >>>>>>>>>>>
> >>>>>>>>>>> * High availability
> >>>>>>>>>>> * Multi-tenancy scheduling
> >>>>>>>>>>> * More stability
> >>>>>>>>>>> * Improved shuffle
> >>>>>>>>>>>
> >>>>>>>>>>> The current work status is as follows. Min is working on Tajo's
> >>>>>>>>>>> new scheduler (TAJO-540) based on Sparrow. I'll support him. As
> >>>>>>>>>>> far as I know, Alvin is working on TajoMaster HA (TAJO-704).
> >>>>>>>>>>> Also, some guys including myself are investigating and solving
> >>>>>>>>>>> the issues which occur in large clusters. These issues should
> >>>>>>>>>>> be solved in order to make Tajo a complete enterprise-ready
> >>>>>>>>>>> product.
> >>>>>>>>>>>
> >>>>>>>>>>> In addition, there are some SQL feature support issues. Many
> >>>>>>>>>>> analytic problems require window functions. Also, in-subquery
> >>>>>>>>>>> and scalar subquery should be supported. So, I'd like to
> >>>>>>>>>>> schedule them with high priority. In my view, there will be
> >>>>>>>>>>> very few SQL support issues if Tajo provides these features.
> >>>>>>>>>>>
> >>>>>>>>>>> Besides those areas, David is working on a nested schema and
> >>>>>>>>>>> its related work (TAJO-710). I guess this will take quite a
> >>>>>>>>>>> while because it requires a lot of hard work. So, it would be
> >>>>>>>>>>> great to schedule the nested schema loosely. Those are just my
> >>>>>>>>>>> thoughts, anyhow.
> >>>>>>>>>>>
> >>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest
> >>>>>>>>>>> that we need to release more frequently after the 0.8.0
> >>>>>>>>>>> release. So far, there has been a long period between each
> >>>>>>>>>>> release because Tajo is undergoing heavy development. By
> >>>>>>>>>>> 'releasing early, releasing often', we will create a tighter
> >>>>>>>>>>> feedback loop between users and developers.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that there are many additional interesting issues to be
> >>>>>>>>>>> included in our roadmap. Feel free to suggest your idea. We
> >>>>>>>>>>> will arrange our short-term roadmap and long-term roadmap based
> >>>>>>>>>>> on your suggestions.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you all so much for your contribution!
> >>>>>>>>>>>
> >>>>>>>>>>> Warm Regards,
> >>>>>>>>>>> Hyunsik
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Tajo - Big Data Warehouse System on Hadoop
> >>>>>>>> http://tajo.apache.org/
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> My research interests are distributed systems, parallel computing and
> >>>>> bytecode based virtual machine.
> >>>>>
> >>>>> My profile:
> >>>>> http://www.linkedin.com/in/coderplay
> >>>>> My blog:
> >>>>> http://coderplay.javaeye.com
> >>>>>
> >>>>
> >>>
> >
> >

--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
