Hi Hyunsik,

I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like the ones for Hive and Impala would be helpful for testing.
https://github.com/rxin/TPC-H-Hive
https://github.com/cartershanklin/hive-testbench
https://github.com/cloudera/impala-tpcds-kit

Thanks!
Hyoungjun

2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:

> Hi Jihoon,
>
> CUBE and ROLL-UP are key features for analytic problems. I filed it on the
> wiki.
>
> TAJO-266 and TAJO-161 will give more optimization opportunities to
> logical planning and distributed query planning. But, I'm not sure it
> can be included in short-term roadmap. They are necessary, but they
> are not required right now. In my view, it would be reasonable to
> schedule them on long-term roadmap.
>
> Warm regards,
> Hyunsik
>
> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> > Hi Hyunsik,
> > I'm very glad that we can release the next version, soon.
> > Also, appreciate for the guideline of the next roadmap.
> >
> > In addition to the aforementioned features, I have two suggestions.
> > First is the support of CUBE operator (TAJO-259). Actually, I started it
> > quite a long time ago, but it is delayed due to the lower priority than
> > other stability issues. But, since this operator is widely used in
> > analytic applications, we need to add this feature as soon as possible.
> > So, in my opinion, it would be good to add this feature to the next
> > roadmap.
> >
> > Second is the advanced query optimization. TAJO-266 is an issue for
> > making the query plan more flexible. After that, we can employ the plenty
> > optimization opportunities like described in TAJO-161.
> >
> > What do you guys think about these issues?
> >
> > Best Regards,
> > Jihoon
> >
> >
> > 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> >
> >> Hi folks,
> >>
> >> I'm very happy to see that our community is growing! Also, it's a
> >> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested various
> >> features in various contexts, and tried to figure out if there are any
> >> critical problems. I think that there are only a few issues and we can
> >> release 0.8.0 next week. If there are further issues to be solved before
> >> the 0.8.0 release, feel free to suggest ideas.
> >>
> >> Also, I'd like to discuss our next roadmap. We are open to any suggestion
> >> from users, contributors, and committers. Please fire away!
> >>
> >> I'm thinking that our next stage should focus on improving the way Tajo
> >> runs in thousands of large cluster nodes and for a number of concurrent
> >> users. The key issues associated with this include the following:
> >>
> >> * High availability
> >> * Multi-tenancy scheduling
> >> * More stability
> >> * Improved shuffle
> >>
> >> The current work status is as follows. Min is working on Tajo's new
> >> scheduler (TAJO-540) based on sparrow. I'll support him. As far as I
> >> know, Alvin is working on TajoMaster HA (TAJO-704). Also, some guys
> >> including myself are investigating and solving the issues which occur in
> >> large clusters. These issues should be solved in order to make Tajo a
> >> complete enterprise-ready production.
> >>
> >> In addition, there are some SQL feature support issues. Many analytic
> >> problems require window functions. Also, in-subquery and scalar subquery
> >> should be supported. So, I'd like to schedule them with high priority. In
> >> my view, there will be very few SQL support issues if Tajo provides these
> >> features.
> >>
> >> Besides those areas, David is working on a nested schema and its related
> >> work (TAJO-710). I guess this will take quite a while because it requires
> >> a lot of hard work. So, it would be great to schedule the nested schema
> >> loosely. That's just my thoughts, anyhow.
> >>
> >> Aside from the discussion of our roadmap, I'd like to suggest that we
> >> need to release more frequently after the 0.8.0 release. So far, there
> >> has been a long period between each release because Tajo is undergoing
> >> heavy development. By 'releasing early, releasing often', we will make a
> >> tighter feedback loop between users and developers.
> >>
> >> I think that there are many additional interesting issues to be
> >> included in our roadmap. Feel free to suggest your idea. We will arrange
> >> our short-term roadmap and long-term roadmap based on your suggestions.
> >>
> >> Thank you all so much for your contribution!
> >>
> >> Warm Regards,
> >> Hyunsik
> >>
>

--
Tajo - Big Data Warehouse System on Hadoop
http://tajo.apache.org/
