+1. I agree with Hyunsik. Sorry for the late reply.
On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:
> I only realized today that my reply hadn't been sent.
>
> +1
>
> Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
>
> Min
>
> On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
>> +1
>>
>> I agree with Hyunsik as well. I think that since 1.0 increments the major version number, it should be used for a particularly significant release. :)
>>
>> Thanks,
>> David
>>
>> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
>>> +1 Hyunsik.
>>>
>>> Thanks!
>>> Warm Regards,
>>> Alvin.
>>>
>>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>>>> Hi folks,
>>>>
>>>> I'd like to discuss the next version number. In Jira, we have provisionally used 1.0, but we haven't decided on the next major version. I propose 0.9 as the next major version. What do you think about this?
>>>>
>>>> Regards,
>>>> Hyunsik
>>>>
>>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
>>>>> Min, thanks for reminding us!
>>>>> It's a mandatory issue.
>>>>> We need to implement that feature ASAP.
>>>>>
>>>>> Thanks,
>>>>> Jihoon
>>>>>
>>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>> Min,
>>>>>>
>>>>>> Yes, you are right. I think about it every day, but I missed it. Thank you for reminding me. It could be achieved by modifying the Query class to execute independent execution blocks in parallel. I'll add it to the wiki.
>>>>>>
>>>>>> Thanks,
>>>>>> Hyunsik
>>>>>>
>>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>>>>>>> Yeah. Another issue: for a query like A join B, it seems that Tajo scans A in the first stage and only then scans B in the second stage. It doesn't run them in parallel, right?
>>>>>>>
>>>>>>> Min
>>>>>>>
>>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>>>>>>>> I've just updated the roadmap page. Please take a look at the section 'After 0.8.0':
>>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>>>>>>>
>>>>>>>> If any ideas are missing, or you have additional ones, feel free to add them to that page or suggest them here. After we discuss them further, we will decide their priorities.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Hyunsik
>>>>>>>>
>>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
>>>>>>>>> Hi Hyoungjun,
>>>>>>>>>
>>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide users with a prepared benchmark environment, they can test Tajo easily. I'll add your idea to the wiki. Thank you for your suggestion.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Hyunsik
>>>>>>>>>
>>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
>>>>>>>>>> Hi Hyunsik,
>>>>>>>>>>
>>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like the ones available for Hive and Impala would make testing much easier.
>>>>>>>>>>
>>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Hyoungjun
>>>>>>>>>>
>>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>>>>> Hi Jihoon,
>>>>>>>>>>>
>>>>>>>>>>> CUBE and ROLLUP are key features for analytic problems. I added them to the wiki.
>>>>>>>>>>>
>>>>>>>>>>> TAJO-266 and TAJO-161 will open up more optimization opportunities in logical planning and distributed query planning. However, I'm not sure they can be included in the short-term roadmap. They are necessary, but not urgent right now. In my view, it would be reasonable to schedule them on the long-term roadmap.
>>>>>>>>>>>
>>>>>>>>>>> Warm regards,
>>>>>>>>>>> Hyunsik
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
>>>>>>>>>>>> Hi Hyunsik,
>>>>>>>>>>>> I'm very glad that we can release the next version soon.
>>>>>>>>>>>> Also, I appreciate the guidelines for the next roadmap.
>>>>>>>>>>>>
>>>>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
>>>>>>>>>>>> The first is support for the CUBE operator (TAJO-259). Actually, I started on it quite a long time ago, but it was delayed because it had lower priority than other stability issues. However, since this operator is widely used in analytic applications, we need to add it as soon as possible. So, in my opinion, it would be good to add this feature to the next roadmap.
>>>>>>>>>>>>
>>>>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue for making the query plan more flexible. After that, we can exploit the many optimization opportunities described in TAJO-161.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you guys think about these issues?
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Jihoon
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested various features in various contexts and tried to find out whether there are any critical problems. I think there are only a few remaining issues, and we can release 0.8.0 next week. If there are further issues to be solved before the 0.8.0 release, feel free to raise them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any suggestions from users, contributors, and committers. Please fire away!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think our next stage should focus on improving how Tajo runs on large clusters of thousands of nodes and with many concurrent users. The key issues associated with this include the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * High availability
>>>>>>>>>>>>> * Multi-tenancy scheduling
>>>>>>>>>>>>> * More stability
>>>>>>>>>>>>> * Improved shuffle
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's new scheduler (TAJO-540), based on Sparrow, and I'll support him. As far as I know, Alvin is working on TajoMaster HA (TAJO-704). Also, some of us, including myself, are investigating and solving issues that occur in large clusters. These issues should be solved in order to make Tajo a complete, enterprise-ready product.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many analytic problems require window functions. Also, IN subqueries and scalar subqueries should be supported. So, I'd like to schedule them with high priority. In my view, very few SQL support issues will remain once Tajo provides these features.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Besides those areas, David is working on nested schema support and related work (TAJO-710). I guess this will take quite a while because it requires a lot of hard work. So, it would be great to schedule the nested schema loosely. That's just my thought, anyhow.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Aside from the roadmap discussion, I'd like to suggest that we release more frequently after the 0.8.0 release. So far, there have been long periods between releases because Tajo has been undergoing heavy development. By 'releasing early, releasing often', we will create a tighter feedback loop between users and developers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think there are many additional interesting issues to be included in our roadmap. Feel free to suggest your ideas. We will arrange our short-term and long-term roadmaps based on your suggestions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you all so much for your contributions!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Warm Regards,
>>>>>>>>>>>>> Hyunsik
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>>>>>>>>> http://tajo.apache.org/
>>>>>>>
>>>>>>> --
>>>>>>> My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
>>>>>>>
>>>>>>> My profile:
>>>>>>> http://www.linkedin.com/in/coderplay
>>>>>>> My blog:
>>>>>>> http://coderplay.javaeye.com
>
> --
> My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
