+1 Hyunsik.

Thanks!

Warm Regards,
Alvin.

On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> Hi folks,
>
> I'd like to discuss the next version number. In Jira, we have provisionally
> used 1.0, but we haven't decided on the next major version. I propose 0.9 as
> the next major version. What do you think about this?
>
> Regards,
> Hyunsik
>
>
> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
>
>> Min, thanks for reminding us!
>> It's a mandatory issue.
>> We need to implement that feature ASAP.
>>
>> Thanks,
>> Jihoon
>>
>>
>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>>
>>> Min,
>>>
>>> Yes, you are right. I think about it every day, but I missed it. Thank you
>>> for reminding me. It could be achieved by modifying the Query class to
>>> execute independent execution blocks in parallel. I'll add it to the wiki.
>>>
>>> Thanks,
>>> Hyunsik
>>>
>>>
>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>>>
>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans A in
>>>> the first stage and then scans B in the second stage. It doesn't run
>>>> them in parallel, right?
>>>>
>>>>
>>>> Min
>>>>
>>>>
>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>>>>
>>>>> I've just updated the roadmap page. Please take a look at the section
>>>>> 'After 0.8.0'
>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>>>>
>>>>> If there are missing or additional ideas, feel free to add them on that
>>>>> page or suggest them here. After we discuss them further, we will decide
>>>>> their priorities.
>>>>>
>>>>> Best regards,
>>>>> Hyunsik
>>>>>
>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
>>>>>> Hi Hyoungjun,
>>>>>>
>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
>>>>>> users with a prepared benchmark environment, they can test Tajo
>>>>>> easily. I'll file your idea on the wiki. Thank you for your
>>>>>> suggestion.
>>>>>>
>>>>>> Regards,
>>>>>> Hyunsik
>>>>>>
>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
>>>>>>> Hi Hyunsik,
>>>>>>>
>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
>>>>>>> like the ones for Hive and Impala would make testing easier.
>>>>>>>
>>>>>>> https://github.com/rxin/TPC-H-Hive
>>>>>>> https://github.com/cartershanklin/hive-testbench
>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Hyoungjun
>>>>>>>
>>>>>>>
>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>
>>>>>>>> Hi Jihoon,
>>>>>>>>
>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed
>>>>>>>> them on the wiki.
>>>>>>>>
>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
>>>>>>>> logical planning and distributed query planning. But I'm not sure
>>>>>>>> they can be included in the short-term roadmap. They are necessary,
>>>>>>>> but they are not required right now. In my view, it would be
>>>>>>>> reasonable to schedule them on the long-term roadmap.
>>>>>>>>
>>>>>>>> Warm regards,
>>>>>>>> Hyunsik
>>>>>>>>
>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
>>>>>>>>> Hi Hyunsik,
>>>>>>>>> I'm very glad that we can release the next version soon.
>>>>>>>>> Also, I appreciate the guideline for the next roadmap.
>>>>>>>>>
>>>>>>>>> In addition to the aforementioned features, I have two suggestions.
>>>>>>>>> The first is support for the CUBE operator (TAJO-259). Actually, I
>>>>>>>>> started it quite a long time ago, but it has been delayed because
>>>>>>>>> it had lower priority than other stability issues. But since this
>>>>>>>>> operator is widely used in analytic applications, we need to add
>>>>>>>>> this feature as soon as possible. So, in my opinion, it would be
>>>>>>>>> good to add this feature to the next roadmap.
>>>>>>>>>
>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue for
>>>>>>>>> making the query plan more flexible. After that, we can exploit the
>>>>>>>>> many optimization opportunities described in TAJO-161.
>>>>>>>>>
>>>>>>>>> What do you guys think about these issues?
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Jihoon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a
>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
>>>>>>>>>> various features in various contexts, and tried to figure out if
>>>>>>>>>> there are any critical problems. I think that there are only a few
>>>>>>>>>> issues and we can release 0.8.0 next week. If there are further
>>>>>>>>>> issues to be solved before the 0.8.0 release, feel free to suggest
>>>>>>>>>> ideas.
>>>>>>>>>>
>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
>>>>>>>>>> suggestion from users, contributors, and committers. Please fire
>>>>>>>>>> away!
>>>>>>>>>>
>>>>>>>>>> I'm thinking that our next stage should focus on improving the way
>>>>>>>>>> Tajo runs on large clusters of thousands of nodes and for many
>>>>>>>>>> concurrent users. The key issues associated with this include the
>>>>>>>>>> following:
>>>>>>>>>>
>>>>>>>>>> * High availability
>>>>>>>>>> * Multi-tenancy scheduling
>>>>>>>>>> * More stability
>>>>>>>>>> * Improved shuffle
>>>>>>>>>>
>>>>>>>>>> The current work status is as follows. Min is working on Tajo's
>>>>>>>>>> new scheduler (TAJO-540) based on Sparrow. I'll support him. As
>>>>>>>>>> far as I know, Alvin is working on TajoMaster HA (TAJO-704).
>>>>>>>>>> Also, some guys including myself are investigating and solving the
>>>>>>>>>> issues which occur in large clusters. These issues should be
>>>>>>>>>> solved in order to make Tajo a complete, enterprise-ready
>>>>>>>>>> production system.
>>>>>>>>>>
>>>>>>>>>> In addition, there are some SQL feature support issues. Many
>>>>>>>>>> analytic problems require window functions. Also, IN subqueries
>>>>>>>>>> and scalar subqueries should be supported. So, I'd like to
>>>>>>>>>> schedule them with high priority. In my view, there will be very
>>>>>>>>>> few SQL support issues if Tajo provides these features.
>>>>>>>>>>
>>>>>>>>>> Besides those areas, David is working on a nested schema and its
>>>>>>>>>> related work (TAJO-710). I guess this will take quite a while
>>>>>>>>>> because it requires a lot of hard work. So, it would be great to
>>>>>>>>>> schedule the nested schema loosely. That's just my thoughts,
>>>>>>>>>> anyhow.
>>>>>>>>>>
>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that
>>>>>>>>>> we release more frequently after the 0.8.0 release. So far, there
>>>>>>>>>> has been a long period between releases because Tajo is
>>>>>>>>>> undergoing heavy development. By 'releasing early, releasing
>>>>>>>>>> often', we will create a tighter feedback loop between users and
>>>>>>>>>> developers.
>>>>>>>>>>
>>>>>>>>>> I think that there are many additional interesting issues to be
>>>>>>>>>> included in our roadmap. Feel free to suggest your ideas. We will
>>>>>>>>>> arrange our short-term roadmap and long-term roadmap based on your
>>>>>>>>>> suggestions.
>>>>>>>>>>
>>>>>>>>>> Thank you all so much for your contribution!
>>>>>>>>>>
>>>>>>>>>> Warm Regards,
>>>>>>>>>> Hyunsik
>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>>>>>> http://tajo.apache.org/
>>>>
>>>> --
>>>> My research interests are distributed systems, parallel computing and
>>>> bytecode based virtual machine.
>>>>
>>>> My profile:
>>>> http://www.linkedin.com/in/coderplay
>>>> My blog:
>>>> http://coderplay.javaeye.com
