Hi folks,

I'd like to discuss the next version number. In Jira, we have provisionally used 1.0, but we haven't decided on the next major version yet. I propose 0.9 as the next major version. What do you think about this?

Regards,
Hyunsik

On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
> Min, thanks for reminding us!
> It's a mandatory issue.
> We need to implement that feature ASAP.
>
> Thanks,
> Jihoon
>
>
> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>
> > Min,
> >
> > Yes, you are right. I think about it every day, but I missed it. Thank you for reminding me. It could be achieved by modifying the Query class to execute independent execution blocks in parallel. I'll add it to the wiki.
> >
> > Thanks,
> > Hyunsik
> >
> >
> > On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
> >
> > > Yeah.. Another issue: it seems that for a query like A join B, Tajo scans A in the first stage and only after that scans B in the second stage. It doesn't run them in parallel, right?
> > >
> > >
> > > Min
> > >
> > >
> > > On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
> > >
> > > > I've just updated the roadmap page. Please take a look at the section 'After 0.8.0':
> > > > https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> > > >
> > > > If there are missing or additional ideas, feel free to add them to that page or suggest them here. After we discuss them further, we will decide their priorities.
> > > >
> > > > Best regards,
> > > > Hyunsik
> > > >
> > > > On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> > > > > Hi Hyoungjun,
> > > > >
> > > > > Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide users with a prepared benchmark environment, they can test Tajo easily. I'll file your idea on the wiki. Thank you for your suggestion.
> > > > >
> > > > > Regards,
> > > > > Hyunsik
> > > > >
> > > > > On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> > > > >> Hi Hyunsik,
> > > > >>
> > > > >> I ran benchmark tests with TPC-H and TPC-DS data.
> > > > >> Benchmark scripts like those for Hive and Impala would make testing easier.
> > > > >>
> > > > >> https://github.com/rxin/TPC-H-Hive
> > > > >> https://github.com/cartershanklin/hive-testbench
> > > > >> https://github.com/cloudera/impala-tpcds-kit
> > > > >>
> > > > >> Thanks!
> > > > >> Hyoungjun
> > > > >>
> > > > >>
> > > > >> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> > > > >>
> > > > >>> Hi Jihoon,
> > > > >>>
> > > > >>> CUBE and ROLLUP are key features for analytic problems. I filed them on the wiki.
> > > > >>>
> > > > >>> TAJO-266 and TAJO-161 will give more optimization opportunities to logical planning and distributed query planning. But I'm not sure they can be included in the short-term roadmap. They are necessary, but they are not required right now. In my view, it would be reasonable to schedule them on the long-term roadmap.
> > > > >>>
> > > > >>> Warm regards,
> > > > >>> Hyunsik
> > > > >>>
> > > > >>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> > > > >>> > Hi Hyunsik,
> > > > >>> > I'm very glad that we can release the next version soon.
> > > > >>> > I also appreciate the guidelines for the next roadmap.
> > > > >>> >
> > > > >>> > In addition to the aforementioned features, I have two suggestions.
> > > > >>> > First is support for the CUBE operator (TAJO-259). Actually, I started it quite a long time ago, but it has been delayed because it had a lower priority than other stability issues. But since this operator is widely used in analytic applications, we need to add this feature as soon as possible. So, in my opinion, it would be good to add this feature to the next roadmap.
> > > > >>> >
> > > > >>> > Second is advanced query optimization.
> > > > >>> > TAJO-266 is an issue for making the query plan more flexible. After that, we can exploit the many optimization opportunities described in TAJO-161.
> > > > >>> >
> > > > >>> > What do you guys think about these issues?
> > > > >>> >
> > > > >>> > Best Regards,
> > > > >>> > Jihoon
> > > > >>> >
> > > > >>> >
> > > > >>> > 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> > > > >>> >
> > > > >>> >> Hi folks,
> > > > >>> >>
> > > > >>> >> I'm very happy to see that our community is growing! Also, it's a pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested various features in various contexts and tried to figure out whether there are any critical problems. I think there are only a few remaining issues, and we can release 0.8.0 next week. If there are further issues to be solved before the 0.8.0 release, feel free to suggest them.
> > > > >>> >>
> > > > >>> >> I'd also like to discuss our next roadmap. We are open to any suggestions from users, contributors, and committers. Please fire away!
> > > > >>> >>
> > > > >>> >> I think our next stage should focus on improving the way Tajo runs on large clusters of thousands of nodes and for many concurrent users. The key issues associated with this include the following:
> > > > >>> >>
> > > > >>> >> * High availability
> > > > >>> >> * Multi-tenancy scheduling
> > > > >>> >> * More stability
> > > > >>> >> * Improved shuffle
> > > > >>> >>
> > > > >>> >> The current work status is as follows. Min is working on Tajo's new scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I know, Alvin is working on TajoMaster HA (TAJO-704).
> > > > >>> >> Also, some of us, myself included, are investigating and solving the issues that occur in large clusters. These issues should be solved in order to make Tajo a complete, enterprise-ready product.
> > > > >>> >>
> > > > >>> >> In addition, there are some SQL feature support issues. Many analytic problems require window functions. Also, IN subqueries and scalar subqueries should be supported. So, I'd like to schedule them with high priority. In my view, very few SQL support issues will remain once Tajo provides these features.
> > > > >>> >>
> > > > >>> >> Besides those areas, David is working on nested schemas and related work (TAJO-710). I guess this will take quite a while because it requires a lot of hard work. So, it would be great to schedule the nested schema work loosely. That's just my thought, anyhow.
> > > > >>> >>
> > > > >>> >> Aside from the roadmap discussion, I'd like to suggest that we release more frequently after the 0.8.0 release. So far, there has been a long period between releases because Tajo is undergoing heavy development. By 'releasing early, releasing often', we will create a tighter feedback loop between users and developers.
> > > > >>> >>
> > > > >>> >> I think there are many additional interesting issues to be included in our roadmap. Feel free to suggest your ideas. We will arrange our short-term and long-term roadmaps based on your suggestions.
> > > > >>> >>
> > > > >>> >> Thank you all so much for your contributions!
> > > > >>> >>
> > > > >>> >> Warm Regards,
> > > > >>> >> Hyunsik
> > > > >>> >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Tajo - Big Data Warehouse System on Hadoop
> > > > >> http://tajo.apache.org/
> > >
> > >
> > > --
> > > My research interests are distributed systems, parallel computing and bytecode based virtual machine.
> > >
> > > My profile:
> > > http://www.linkedin.com/in/coderplay
> > > My blog:
> > > http://coderplay.javaeye.com
