+1 I agree with Hyunsik as well. I think since 1.0 increments the major version number, it should be used for a particularly significant release. :)
Thanks,
David

On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:

> +1 Hyunsik.
>
> Thanks!
> Warm Regards,
> Alvin.
>
> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>
>> Hi folks,
>>
>> I'd like to discuss the next version number. In Jira, we have provisionally
>> used 1.0, but we haven't decided the next major version. I propose 0.9 as the
>> next major version. What do you think about this?
>>
>> Regards,
>> Hyunsik
>>
>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
>>
>>> Min, thanks for reminding us!
>>> It's a mandatory issue.
>>> We need to implement that feature ASAP.
>>>
>>> Thanks,
>>> Jihoon
>>>
>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>
>>>> Min,
>>>>
>>>> Yes, you are right. I think about it every day, but I missed it. Thank you
>>>> for reminding me. It would be achieved by modifying the Query class to execute
>>>> independent execution blocks in parallel. I'll add it to the wiki.
>>>>
>>>> Thanks,
>>>> Hyunsik
>>>>
>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>>>>
>>>>> Yeah.. Another issue: it seems that for a query like A join B, Tajo will scan A
>>>>> in the first stage and then scan B in the 2nd stage. It doesn't run them in
>>>>> parallel, right?
>>>>>
>>>>> Min
>>>>>
>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>>>>>
>>>>>> I've just updated the roadmap page. Please take a look at the section
>>>>>> 'After 0.8.0'
>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>>>>>
>>>>>> If there are missed or additional ideas, feel free to add them on that
>>>>>> page or suggest them here. After we discuss them more, we will decide
>>>>>> their priorities.
>>>>>>
>>>>>> Best regards,
>>>>>> Hyunsik
>>>>>>
>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
>>>>>>> Hi Hyoungjun,
>>>>>>>
>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary.
>>>>>>> If we provide users with a prepared benchmark environment, they can test Tajo
>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
>>>>>>> suggestion.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Hyunsik
>>>>>>>
>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
>>>>>>>> Hi Hyunsik,
>>>>>>>>
>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like
>>>>>>>> the ones for Hive and Impala would make testing easier.
>>>>>>>>
>>>>>>>> https://github.com/rxin/TPC-H-Hive
>>>>>>>> https://github.com/cartershanklin/hive-testbench
>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Hyoungjun
>>>>>>>>
>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>>
>>>>>>>>> Hi Jihoon,
>>>>>>>>>
>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed them on the
>>>>>>>>> wiki.
>>>>>>>>>
>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
>>>>>>>>> logical planning and distributed query planning. But I'm not sure they
>>>>>>>>> can be included in the short-term roadmap. They are necessary, but they
>>>>>>>>> are not required right now. In my view, it would be reasonable to
>>>>>>>>> schedule them on the long-term roadmap.
>>>>>>>>>
>>>>>>>>> Warm regards,
>>>>>>>>> Hyunsik
>>>>>>>>>
>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
>>>>>>>>>> Hi Hyunsik,
>>>>>>>>>> I'm very glad that we can release the next version soon.
>>>>>>>>>> Also, I appreciate the guidelines for the next roadmap.
>>>>>>>>>>
>>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
>>>>>>>>>> First is the support of the CUBE operator (TAJO-259). Actually, I started it
>>>>>>>>>> quite a long time ago, but it was delayed because it had lower priority than
>>>>>>>>>> other stability issues.
>>>>>>>>>> But, since this operator is widely used in analytic
>>>>>>>>>> applications, we need to add this feature as soon as possible. So, in my
>>>>>>>>>> opinion, it would be good to add this feature to the next roadmap.
>>>>>>>>>>
>>>>>>>>>> Second is the advanced query optimization. TAJO-266 is an issue for making
>>>>>>>>>> the query plan more flexible. After that, we can exploit the many
>>>>>>>>>> optimization opportunities described in TAJO-161.
>>>>>>>>>>
>>>>>>>>>> What do you guys think about these issues?
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Jihoon
>>>>>>>>>>
>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a pleasure
>>>>>>>>>>> to discuss the Tajo 0.8.0 release. Recently, I've tested various features
>>>>>>>>>>> in various contexts, and tried to figure out if there are any critical
>>>>>>>>>>> problems. I think that there are only a few issues and we can release 0.8.0
>>>>>>>>>>> next week. If there are further issues to be solved before the 0.8.0
>>>>>>>>>>> release, feel free to suggest ideas.
>>>>>>>>>>>
>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any suggestion
>>>>>>>>>>> from users, contributors, and committers. Please fire away!
>>>>>>>>>>>
>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way Tajo
>>>>>>>>>>> runs on large clusters of thousands of nodes and for a number of concurrent
>>>>>>>>>>> users. The key issues associated with this include the following:
>>>>>>>>>>>
>>>>>>>>>>> * High availability
>>>>>>>>>>> * Multi-tenancy scheduling
>>>>>>>>>>> * More stability
>>>>>>>>>>> * Improved shuffle
>>>>>>>>>>>
>>>>>>>>>>> The current work status is as follows.
>>>>>>>>>>> Min is working on Tajo's new
>>>>>>>>>>> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I know,
>>>>>>>>>>> Alvin is working on TajoMaster HA (TAJO-704). Also, some guys including
>>>>>>>>>>> myself are investigating and solving the issues which occur in large
>>>>>>>>>>> clusters. These issues should be solved in order to make Tajo a complete,
>>>>>>>>>>> enterprise-ready product.
>>>>>>>>>>>
>>>>>>>>>>> In addition, there are some SQL feature support issues. Many analytic
>>>>>>>>>>> problems require window functions. Also, IN subqueries and scalar subqueries
>>>>>>>>>>> should be supported. So, I'd like to schedule them with high priority. In
>>>>>>>>>>> my view, there will be very few SQL support issues if Tajo provides these
>>>>>>>>>>> features.
>>>>>>>>>>>
>>>>>>>>>>> Besides those areas, David is working on a nested schema and its related
>>>>>>>>>>> work (TAJO-710). I guess this will take quite a while because it requires a
>>>>>>>>>>> lot of hard work. So, it would be great to schedule the nested schema
>>>>>>>>>>> loosely. Those are just my thoughts, anyhow.
>>>>>>>>>>>
>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that we need
>>>>>>>>>>> to release more frequently after the 0.8.0 release. So far, there has been
>>>>>>>>>>> a long period between each release because Tajo is undergoing heavy
>>>>>>>>>>> development. By 'releasing early, releasing often', we will create a
>>>>>>>>>>> tighter feedback loop between users and developers.
>>>>>>>>>>>
>>>>>>>>>>> I think that there are many additional interesting issues to be
>>>>>>>>>>> included in our roadmap. Feel free to suggest your ideas.
>>>>>>>>>>> We will arrange
>>>>>>>>>>> our short-term roadmap and long-term roadmap based on your suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Thank you all so much for your contribution!
>>>>>>>>>>>
>>>>>>>>>>> Warm Regards,
>>>>>>>>>>> Hyunsik
>>>>>>>>
>>>>>>>> --
>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>>>>>>> http://tajo.apache.org/
>>>>>
>>>>> --
>>>>> My research interests are distributed systems, parallel computing and
>>>>> bytecode based virtual machine.
>>>>>
>>>>> My profile:
>>>>> http://www.linkedin.com/in/coderplay
>>>>> My blog:
>>>>> http://coderplay.javaeye.com
