Min,

Yes, you are right. I think about it every day, but I missed it. Thank you for reminding me. It could be achieved by modifying the Query class to execute independent execution blocks in parallel. I'll add it to the wiki.
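The idea of running independent execution blocks in parallel can be illustrated with a minimal sketch. It assumes a query is modeled as a DAG of execution blocks, where each block lists the blocks whose output it consumes. The function `run_dag` and the block names `scan_A`, `scan_B` and `join` are hypothetical and are not Tajo's actual `Query` class or API. The sketch submits every block whose dependencies have all finished to a thread pool, so two independent scans start together instead of one stage after another.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List


def run_dag(
    deps: Dict[str, List[str]],
    run: Callable[[str], None],
    max_workers: int = 4,
) -> List[List[str]]:
    """Run each (hypothetical) execution block once all of its dependencies finish.

    deps maps a block id to the ids of the blocks it consumes. Returns the groups
    of blocks that were submitted together, to show how much parallelism was found.
    """
    remaining = {block: set(parents) for block, parents in deps.items()}
    done: set = set()
    running: Dict[Future, str] = {}
    submitted: List[List[str]] = []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Every block whose inputs are all complete is independent of the
            # blocks still pending, so it can start right away.
            ready = sorted(b for b, parents in remaining.items() if parents <= done)
            if ready:
                submitted.append(ready)
            for block in ready:
                del remaining[block]
                running[pool.submit(run, block)] = block
            if not running:
                # Nothing can make progress: a cycle or an unknown dependency.
                raise ValueError(f"unschedulable blocks: {sorted(remaining)}")
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                future.result()  # re-raise any failure from the block
                done.add(running.pop(future))
    return submitted
```

For Min's A join B example, `run_dag({"scan_A": [], "scan_B": [], "join": ["scan_A", "scan_B"]}, run)` submits `scan_A` and `scan_B` in the same group and `join` only after both scans finish, returning `[["scan_A", "scan_B"], ["join"]]`. Rescheduling on every completion (rather than level by level) lets a downstream block start as soon as its own inputs are ready, even while unrelated blocks are still running.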
Thanks,
Hyunsik

On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
> Yeah... Another issue: it seems that for a query like A join B, Tajo scans A in the first stage and then scans B in the second stage. It doesn't run them in parallel, right?
>
> Min
>
>
> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>
> > I've just updated the roadmap page. Please take a look at the section 'After 0.8.0':
> > https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >
> > If there are missing or additional ideas, feel free to add them to that page or suggest them here. After we discuss them further, we will decide their priorities.
> >
> > Best regards,
> > Hyunsik
> >
> > On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> > > Hi Hyoungjun,
> > >
> > > Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide users with a prepared benchmark environment, they can test Tajo easily. I'll file your idea on the wiki. Thank you for your suggestion.
> > >
> > > Regards,
> > > Hyunsik
> > >
> > > On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> > >> Hi Hyunsik,
> > >>
> > >> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like those available for Hive and Impala would make testing easier.
> > >>
> > >> https://github.com/rxin/TPC-H-Hive
> > >> https://github.com/cartershanklin/hive-testbench
> > >> https://github.com/cloudera/impala-tpcds-kit
> > >>
> > >> Thanks!
> > >> Hyoungjun
> > >>
> > >>
> > >> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> > >>
> > >>> Hi Jihoon,
> > >>>
> > >>> CUBE and ROLL-UP are key features for analytic problems. I filed them on the wiki.
> > >>>
> > >>> TAJO-266 and TAJO-161 will give more optimization opportunities to logical planning and distributed query planning. But I'm not sure they can be included in the short-term roadmap. They are necessary, but they are not required right now. In my view, it would be reasonable to schedule them on the long-term roadmap.
> > >>>
> > >>> Warm regards,
> > >>> Hyunsik
> > >>>
> > >>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> > >>> > Hi Hyunsik,
> > >>> > I'm very glad that we can release the next version soon. I also appreciate the guidelines for the next roadmap.
> > >>> >
> > >>> > In addition to the aforementioned features, I have two suggestions.
> > >>> > The first is support for the CUBE operator (TAJO-259). Actually, I started it quite a long time ago, but it was delayed because it had a lower priority than other stability issues. Since this operator is widely used in analytic applications, we need to add this feature as soon as possible. So, in my opinion, it would be good to add it to the next roadmap.
> > >>> >
> > >>> > The second is advanced query optimization. TAJO-266 is an issue for making the query plan more flexible. After that, we can exploit the many optimization opportunities described in TAJO-161.
> > >>> >
> > >>> > What do you guys think about these issues?
> > >>> >
> > >>> > Best Regards,
> > >>> > Jihoon
> > >>> >
> > >>> >
> > >>> > 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> > >>> >
> > >>> >> Hi folks,
> > >>> >>
> > >>> >> I'm very happy to see that our community is growing! It's also a pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested various features in various contexts and tried to figure out whether there are any critical problems. I think there are only a few remaining issues, and we can release 0.8.0 next week. If there are further issues to be solved before the 0.8.0 release, feel free to suggest them.
> > >>> >>
> > >>> >> I'd also like to discuss our next roadmap. We are open to any suggestions from users, contributors, and committers. Please fire away!
> > >>> >>
> > >>> >> I think our next stage should focus on improving the way Tajo runs on large clusters of thousands of nodes with many concurrent users. The key issues associated with this include the following:
> > >>> >>
> > >>> >> * High availability
> > >>> >> * Multi-tenancy scheduling
> > >>> >> * More stability
> > >>> >> * Improved shuffle
> > >>> >>
> > >>> >> The current work status is as follows. Min is working on Tajo's new scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I know, Alvin is working on TajoMaster HA (TAJO-704). Also, some people, including myself, are investigating and solving the issues that occur in large clusters. These issues should be solved in order to make Tajo a complete, enterprise-ready product.
> > >>> >>
> > >>> >> In addition, there are some SQL feature support issues. Many analytic problems require window functions. Also, IN subqueries and scalar subqueries should be supported. So, I'd like to schedule them with high priority. In my view, there will be very few remaining SQL support issues once Tajo provides these features.
> > >>> >>
> > >>> >> Besides those areas, David is working on a nested schema and related work (TAJO-710). I guess this will take quite a while because it requires a lot of hard work. So, it would be great to schedule the nested schema loosely. That's just my thought, anyhow.
> > >>> >>
> > >>> >> Aside from the roadmap discussion, I'd like to suggest that we release more frequently after the 0.8.0 release. So far, there has been a long period between releases because Tajo is undergoing heavy development. By 'releasing early, releasing often', we will create a tighter feedback loop between users and developers.
> > >>> >>
> > >>> >> I think there are many additional interesting issues to be included in our roadmap. Feel free to suggest your ideas. We will arrange our short-term and long-term roadmaps based on your suggestions.
> > >>> >>
> > >>> >> Thank you all so much for your contributions!
> > >>> >>
> > >>> >> Warm Regards,
> > >>> >> Hyunsik
> > >>> >>
> > >>
> > >>
> > >> --
> > >> Tajo - Big Data Warehouse System on Hadoop
> > >> http://tajo.apache.org/
> >
>
>
> --
> My research interests are distributed systems, parallel computing, and bytecode-based virtual machines.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>
