Re: [DISCUSS] 0.8.0 release and next roadmap

Jihoon Son Wed, 09 Apr 2014 19:06:22 -0700

Min, thanks for reminding us!
It's a mandatory issue.
We need to implement that feature ASAP.


Thanks,
Jihoon


2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:

> Min,
>
> Yes, you are right. I think about it every day, but I forgot to list it.
> Thank you for reminding me. It could be achieved by modifying the Query
> class to execute independent execution blocks in parallel. I'll add it to
> the wiki.
>
> Thanks,
> Hyunsik
>
>
> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>
> > Yeah. Another issue: for a query like A join B, it seems Tajo scans A in
> > the first stage and then scans B in the second stage. It doesn't run the
> > two scans in parallel, right?
> >
> >
> > Min
> >
> >
> > On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
> >
> > > I've just updated the roadmap page. Please take a look at the section
> > > 'After 0.8.0'
> > > https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> > >
> > > If there are missing or additional ideas, feel free to add them on that
> > > page or suggest them here. After we discuss them further, we will decide
> > > their priorities.
> > >
> > > Best regards,
> > > Hyunsik
> > >
> > > On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> > > > Hi Hyoungjun,
> > > >
> > > > Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> > > > users with some prepared benchmark environment, users can test Tajo
> > > > easily. I'll file your idea on the wiki. Thank you for your
> > > > suggestion.
> > > >
> > > > Regards,
> > > > Hyunsik
> > > >
> > > > On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> > > > >> Hi Hyunsik,
> > > > >>
> > > > >> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
> > > > >> like the ones for Hive and Impala would make testing easier.
> > > >>
> > > >> https://github.com/rxin/TPC-H-Hive
> > > >> https://github.com/cartershanklin/hive-testbench
> > > >> https://github.com/cloudera/impala-tpcds-kit
> > > >>
> > > >> Thanks!
> > > >> Hyoungjun
> > > >>
> > > >>
> > > >> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> > > >>
> > > >>> Hi Jihoon,
> > > >>>
> > > > >>> CUBE and ROLL-UP are key features for analytic problems. I filed
> > > > >>> them on the wiki.
> > > >>>
> > > > >>> TAJO-266 and TAJO-161 will give more optimization opportunities to
> > > > >>> logical planning and distributed query planning. But I'm not sure
> > > > >>> they can be included in the short-term roadmap. They are necessary,
> > > > >>> but they are not required right now. In my view, it would be
> > > > >>> reasonable to schedule them on the long-term roadmap.
> > > >>>
> > > >>> Warm regards,
> > > >>> Hyunsik
> > > >>>
> > > > >>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> > > >>> > Hi Hyunsik,
> > > > >>> > I'm very glad that we can release the next version soon.
> > > > >>> > Also, I appreciate the guidelines for the next roadmap.
> > > > >>> >
> > > > >>> > In addition to the aforementioned features, I have two suggestions.
> > > > >>> > The first is support for the CUBE operator (TAJO-259). Actually, I
> > > > >>> > started it quite a long time ago, but it was delayed because it had
> > > > >>> > lower priority than other stability issues. But since this operator
> > > > >>> > is widely used in analytic applications, we need to add this feature
> > > > >>> > as soon as possible. So, in my opinion, it would be good to add this
> > > > >>> > feature to the next roadmap.
> > > > >>> >
> > > > >>> > The second is advanced query optimization. TAJO-266 is an issue for
> > > > >>> > making the query plan more flexible. After that, we can exploit the
> > > > >>> > many optimization opportunities such as those described in TAJO-161.
> > > > >>> >
> > > > >>> > What do you guys think about these issues?
> > > >>> >
> > > >>> > Best Regards,
> > > >>> > Jihoon
> > > >>> >
> > > >>> >
> > > >>> > 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> > > >>> >
> > > >>> >> Hi folks,
> > > >>> >>
> > > > >>> >> I'm very happy to see that our community is growing! Also, it's a
> > > > >>> >> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
> > > > >>> >> various features in various contexts, and tried to figure out if
> > > > >>> >> there are any critical problems. I think that there are only a few
> > > > >>> >> issues and we can release 0.8.0 next week. If there are further
> > > > >>> >> issues to be solved before the 0.8.0 release, feel free to suggest
> > > > >>> >> ideas.
> > > >>> >>
> > > > >>> >> Also, I'd like to discuss our next roadmap. We are open to any
> > > > >>> >> suggestion from users, contributors, and committers. Please fire
> > > > >>> >> away!
> > > >>> >>
> > > > >>> >> I think that our next stage should focus on improving the way Tajo
> > > > >>> >> runs on large clusters with thousands of nodes and for a number of
> > > > >>> >> concurrent users. The key issues associated with this include the
> > > > >>> >> following:
> > > >>> >>
> > > >>> >> * High availability
> > > >>> >> * Multi-tenancy scheduling
> > > >>> >> * More stability
> > > >>> >> * Improved shuffle
> > > >>> >>
> > > > >>> >> The current work status is as follows. Min is working on Tajo's new
> > > > >>> >> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I
> > > > >>> >> know, Alvin is working on TajoMaster HA (TAJO-704). Also, some guys
> > > > >>> >> including myself are investigating and solving the issues which
> > > > >>> >> occur in large clusters. These issues should be solved in order to
> > > > >>> >> make Tajo a complete, enterprise-ready product.
> > > >>> >>
> > > > >>> >> In addition, there are some SQL feature support issues. Many
> > > > >>> >> analytic problems require window functions. Also, IN subqueries and
> > > > >>> >> scalar subqueries should be supported. So, I'd like to schedule them
> > > > >>> >> with high priority. In my view, there will be very few SQL support
> > > > >>> >> issues if Tajo provides these features.
> > > >>> >>
> > > > >>> >> Besides those areas, David is working on a nested schema and its
> > > > >>> >> related work (TAJO-710). I guess this will take quite a while
> > > > >>> >> because it requires a lot of hard work. So, it would be great to
> > > > >>> >> schedule the nested schema loosely. Those are just my thoughts,
> > > > >>> >> anyhow.
> > > >>> >>
> > > > >>> >> Aside from the discussion of our roadmap, I'd like to suggest that
> > > > >>> >> we release more frequently after the 0.8.0 release. So far, there
> > > > >>> >> has been a long period between each release because Tajo is
> > > > >>> >> undergoing heavy development. By 'releasing early, releasing often',
> > > > >>> >> we will create a tighter feedback loop between users and developers.
> > > >>> >>
> > > > >>> >> I think that there are many additional interesting issues to be
> > > > >>> >> included in our roadmap. Feel free to suggest your ideas. We will
> > > > >>> >> arrange our short-term roadmap and long-term roadmap based on your
> > > > >>> >> suggestions.
> > > >>> >>
> > > >>> >> Thank you all so much for your contribution!
> > > >>> >>
> > > >>> >> Warm Regards,
> > > >>> >> Hyunsik
> > > >>> >>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Tajo - Big Data Warehouse System on Hadoop
> > > >> http://tajo.apache.org/
> > >
> >
> >
> >
> > --
> > My research interests are distributed systems, parallel computing and
> > bytecode based virtual machine.
> >
> > My profile:
> > http://www.linkedin.com/in/coderplay
> > My blog:
> > http://coderplay.javaeye.com
> >
>
