Re: [DISCUSS] 0.8.0 release and next roadmap

Min Zhou Wed, 09 Apr 2014 10:44:41 -0700

Yeah. Another issue: it seems that for a query like A join B, Tajo scans A
in the first stage and then scans B in the second stage. It doesn't run the
two scans in parallel, right?



Min


On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:

> I've just updated the roadmap page. Please take a look at the section
> 'After 0.8.0'
> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>
> If there are any missing or additional ideas, feel free to add them to that
> page or suggest them here. After we discuss them further, we will decide
> their priorities.
>
> Best regards,
> Hyunsik
>
> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> > Hi Hyoungjun,
> >
> > Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> > users with a prepared benchmark environment, they can test Tajo
> > easily. I'll file your idea on the wiki. Thank you for your
> > suggestion.
> >
> > Regards,
> > Hyunsik
> >
> > On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> >> Hi Hyunsik,
> >>
> >> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts like
> >> the ones available for Hive and Impala would make testing much easier.
> >>
> >> https://github.com/rxin/TPC-H-Hive
> >> https://github.com/cartershanklin/hive-testbench
> >> https://github.com/cloudera/impala-tpcds-kit
> >>
> >> Thanks!
> >> Hyoungjun
> >>
> >>
> >> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>
> >>> Hi Jihoon,
> >>>
> >>> CUBE and ROLL-UP are key features for analytic problems. I filed them
> >>> on the wiki.
> >>>
> >>> TAJO-266 and TAJO-161 will give more optimization opportunities to
> >>> logical planning and distributed query planning. But I'm not sure they
> >>> can be included in the short-term roadmap. They are necessary, but they
> >>> are not required right now. In my view, it would be reasonable to
> >>> schedule them on the long-term roadmap.
> >>>
> >>> Warm regards,
> >>> Hyunsik
> >>>
> >>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> >>> > Hi Hyunsik,
> >>> > I'm very glad that we can release the next version soon.
> >>> > I also appreciate the guidelines for the next roadmap.
> >>> >
> >>> > In addition to the aforementioned features, I have two suggestions.
> >>> > The first is support for the CUBE operator (TAJO-259). Actually, I
> >>> > started it quite a long time ago, but it has been delayed because it
> >>> > had lower priority than other stability issues. However, since this
> >>> > operator is widely used in analytic applications, we need to add this
> >>> > feature as soon as possible. So, in my opinion, it would be good to
> >>> > add it to the next roadmap.
> >>> >
> >>> > The second is advanced query optimization. TAJO-266 is an issue for
> >>> > making the query plan more flexible. After that, we can exploit the
> >>> > many optimization opportunities described in TAJO-161.
> >>> >
> >>> > What do you guys think about these issues?
> >>> >
> >>> > Best Regards,
> >>> > Jihoon
> >>> >
> >>> >
> >>> > 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>> >
> >>> >> Hi folks,
> >>> >>
> >>> >> I'm very happy to see that our community is growing! Also, it's a
> >>> >> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
> >>> >> various features in various contexts and tried to figure out if there
> >>> >> are any critical problems. I think that only a few issues remain, and
> >>> >> we can release 0.8.0 next week. If there are further issues to be
> >>> >> solved before the 0.8.0 release, feel free to raise them.
> >>> >>
> >>> >> Also, I'd like to discuss our next roadmap. We are open to any
> >>> >> suggestions from users, contributors, and committers. Please fire
> >>> >> away!
> >>> >>
> >>> >> I'm thinking that our next stage should focus on improving the way
> >>> >> Tajo runs on large clusters of thousands of nodes and with many
> >>> >> concurrent users. The key issues associated with this include the
> >>> >> following:
> >>> >>
> >>> >> * High availability
> >>> >> * Multi-tenancy scheduling
> >>> >> * More stability
> >>> >> * Improved shuffle
> >>> >>
> >>> >> The current work status is as follows. Min is working on Tajo's new
> >>> >> scheduler (TAJO-540) based on Sparrow, and I'll support him. As far
> >>> >> as I know, Alvin is working on TajoMaster HA (TAJO-704). Also, some
> >>> >> of us, including myself, are investigating and solving the issues
> >>> >> that occur in large clusters. These issues should be solved in order
> >>> >> to make Tajo a complete, enterprise-ready product.
> >>> >>
> >>> >> In addition, there are some SQL feature support issues. Many analytic
> >>> >> problems require window functions. Also, IN subqueries and scalar
> >>> >> subqueries should be supported. So, I'd like to schedule them with
> >>> >> high priority. In my view, very few SQL support issues will remain
> >>> >> once Tajo provides these features.
> >>> >>
> >>> >> Besides those areas, David is working on nested schema support and
> >>> >> related work (TAJO-710). I guess this will take quite a while because
> >>> >> it requires a lot of hard work. So, it would be great to schedule the
> >>> >> nested schema work loosely. That's just my thought, anyhow.
> >>> >>
> >>> >> Aside from the discussion of our roadmap, I'd like to suggest that we
> >>> >> release more frequently after the 0.8.0 release. So far, there has
> >>> >> been a long period between releases because Tajo is undergoing heavy
> >>> >> development. By 'releasing early, releasing often', we will create a
> >>> >> tighter feedback loop between users and developers.
> >>> >>
> >>> >> I think that there are many additional interesting issues to be
> >>> >> included in our roadmap. Feel free to suggest your ideas. We will
> >>> >> arrange our short-term and long-term roadmaps based on your
> >>> >> suggestions.
> >>> >>
> >>> >> Thank you all so much for your contribution!
> >>> >>
> >>> >> Warm Regards,
> >>> >> Hyunsik
> >>> >>
> >>>
> >>
> >>
> >>
> >> --
> >> Tajo - Big Data Warehouse System on Hadoop
> >> http://tajo.apache.org/
>



-- 
My research interests are distributed systems, parallel computing, and
bytecode-based virtual machines.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
