Re: [DISCUSS] 0.8.0 release and next roadmap

Hyunsik Choi Mon, 14 Apr 2014 22:56:29 -0700

Thank you for the votes! Let's go ahead!

Cheers,
Hyunsik



On Tue, Apr 15, 2014 at 9:03 AM, ktpark <[email protected]> wrote:

> +1
>
> I agree with Hyunsik.
> Sorry for the late reply.
>
> On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:
>
> > I only realized today that my reply hadn't been sent.
> >
> > +1
> >
> > Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
> >
> > Min
> >
> >
> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
> >
> >> +1
> >>
> >> I agree with Hyunsik as well. I think that since 1.0 increments the
> >> major version number, it should be reserved for a particularly
> >> significant release. :)
> >>
> >> Thanks,
> >> David
> >>
> >>
> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
> >>
> >>> +1 Hyunsik.
> >>>
> >>> Thanks!
> >>> Warm Regards,
> >>> Alvin.
> >>>
> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> I'd like to discuss the next version number. In Jira, we have
> >>>> provisionally used 1.0, and we haven't decided on the next major
> >>>> version. I propose 0.9 as the next major version. What do you think
> >>>> about this?
> >>>>
> >>>> Regards,
> >>>> Hyunsik
> >>>>
> >>>>
> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
> >>>>
> >>>>> Min, thanks for reminding us!
> >>>>> It's a mandatory issue.
> >>>>> We need to implement that feature ASAP.
> >>>>>
> >>>>> Thanks,
> >>>>> Jihoon
> >>>>>
> >>>>>
> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>
> >>>>>> Min,
> >>>>>>
> >>>>>> Yes, you are right. I think about it every day, but I missed it.
> >>>>>> Thank you for reminding me. It could be achieved by modifying the
> >>>>>> Query class to execute independent execution blocks in parallel.
> >>>>>> I'll add it to the wiki.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Hyunsik
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
> >>>>>>
> >>>>>>> Yeah. Another issue: it seems that for a query like A join B, Tajo
> >>>>>>> scans A in the first stage and only after that scans B in the second
> >>>>>>> stage. It doesn't run them in parallel, right?
> >>>>>>>
> >>>>>>>
> >>>>>>> Min
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> I've just updated the roadmap page. Please take a look at the
> >>>>>>>> section 'After 0.8.0':
> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >>>>>>>>
> >>>>>>>> If there are missing or additional ideas, feel free to add them to
> >>>>>>>> that page or suggest them here. After we discuss them further, we
> >>>>>>>> will decide their priorities.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Hyunsik
> >>>>>>>>
> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>>>>> Hi Hyoungjun,
> >>>>>>>>>
> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> >>>>>>>>> users with a prepared benchmark environment, they can test Tajo
> >>>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
> >>>>>>>>> suggestion.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Hyunsik
> >>>>>>>>>
> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> >>>>>>>>>> Hi Hyunsik,
> >>>>>>>>>>
> >>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
> >>>>>>>>>> like those available for Hive and Impala would be more helpful for
> >>>>>>>>>> testing.
> >>>>>>>>>>
> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>> Hyoungjun
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Jihoon,
> >>>>>>>>>>>
> >>>>>>>>>>> CUBE and ROLLUP are key features for analytic problems. I filed
> >>>>>>>>>>> them on the wiki.
> >>>>>>>>>>>
> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities to
> >>>>>>>>>>> logical planning and distributed query planning. But I'm not sure
> >>>>>>>>>>> they can be included in the short-term roadmap. They are
> >>>>>>>>>>> necessary, but they are not required right now. In my view, it
> >>>>>>>>>>> would be reasonable to schedule them on the long-term roadmap.
> >>>>>>>>>>>
> >>>>>>>>>>> Warm regards,
> >>>>>>>>>>> Hyunsik
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> >>>>>>>>>>>> Hi Hyunsik,
> >>>>>>>>>>>> I'm very glad that we can release the next version soon.
> >>>>>>>>>>>> I also appreciate the guidelines for the next roadmap.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In addition to the aforementioned features, I have two
> >>>>>>>>>>>> suggestions.
> >>>>>>>>>>>> First is support for the CUBE operator (TAJO-259). Actually, I
> >>>>>>>>>>>> started it quite a long time ago, but it has been delayed
> >>>>>>>>>>>> because it had lower priority than other stability issues. But
> >>>>>>>>>>>> since this operator is widely used in analytic applications, we
> >>>>>>>>>>>> need to add this feature as soon as possible. So, in my opinion,
> >>>>>>>>>>>> it would be good to add this feature to the next roadmap.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Second is advanced query optimization. TAJO-266 is an issue for
> >>>>>>>>>>>> making the query plan more flexible. After that, we can exploit
> >>>>>>>>>>>> the plentiful optimization opportunities described in TAJO-161.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you guys think about these issues?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>> Jihoon
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi folks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also, it's a
> >>>>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
> >>>>>>>>>>>>> various features in various contexts and tried to figure out if
> >>>>>>>>>>>>> there are any critical problems. I think that only a few issues
> >>>>>>>>>>>>> remain and we can release 0.8.0 next week. If there are further
> >>>>>>>>>>>>> issues to be solved before the 0.8.0 release, feel free to
> >>>>>>>>>>>>> suggest them.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
> >>>>>>>>>>>>> suggestions from users, contributors, and committers. Please fire
> >>>>>>>>>>>>> away!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving the way
> >>>>>>>>>>>>> Tajo runs on large clusters of thousands of nodes and for many
> >>>>>>>>>>>>> concurrent users. The key issues associated with this include the
> >>>>>>>>>>>>> following:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * High availability
> >>>>>>>>>>>>> * Multi-tenancy scheduling
> >>>>>>>>>>>>> * More stability
> >>>>>>>>>>>>> * Improved shuffle
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The current work status is as follows. Min is working on Tajo's
> >>>>>>>>>>>>> new scheduler (TAJO-540) based on Sparrow. I'll support him. As
> >>>>>>>>>>>>> far as I know, Alvin is working on TajoMaster HA (TAJO-704).
> >>>>>>>>>>>>> Also, some of us, including myself, are investigating and solving
> >>>>>>>>>>>>> the issues that occur in large clusters. These issues should be
> >>>>>>>>>>>>> solved in order to make Tajo a complete, enterprise-ready
> >>>>>>>>>>>>> product.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many
> >>>>>>>>>>>>> analytic problems require window functions. Also, IN subqueries
> >>>>>>>>>>>>> and scalar subqueries should be supported. So, I'd like to
> >>>>>>>>>>>>> schedule them with high priority. In my view, very few SQL
> >>>>>>>>>>>>> support issues will remain once Tajo provides these features.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Besides those areas, David is working on nested schema support
> >>>>>>>>>>>>> and related work (TAJO-710). I guess this will take quite a while
> >>>>>>>>>>>>> because it requires a lot of hard work. So, it would be great to
> >>>>>>>>>>>>> schedule the nested schema work loosely. Those are just my
> >>>>>>>>>>>>> thoughts, anyhow.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest that
> >>>>>>>>>>>>> we release more frequently after the 0.8.0 release. So far, there
> >>>>>>>>>>>>> has been a long period between releases because Tajo is
> >>>>>>>>>>>>> undergoing heavy development. By 'releasing early, releasing
> >>>>>>>>>>>>> often', we will create a tighter feedback loop between users and
> >>>>>>>>>>>>> developers.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think that there are many additional interesting issues to be
> >>>>>>>>>>>>> included in our roadmap. Feel free to suggest your ideas. We will
> >>>>>>>>>>>>> arrange our short-term and long-term roadmaps based on your
> >>>>>>>>>>>>> suggestions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you all so much for your contributions!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Warm Regards,
> >>>>>>>>>>>>> Hyunsik
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
> >>>>>>>>>> http://tajo.apache.org/
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> My research interests are distributed systems, parallel computing
> >>>>>>> and bytecode-based virtual machines.
> >>>>>>>
> >>>>>>> My profile:
> >>>>>>> http://www.linkedin.com/in/coderplay
> >>>>>>> My blog:
> >>>>>>> http://coderplay.javaeye.com
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >>
> >
> >
>
>
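Min's question and Hyunsik's reply describe scheduling at the level of execution blocks: for A join B, the scans of A and B do not depend on each other, so they could run at the same time, and only the join has to wait for both. The following Python sketch illustrates that idea under stated assumptions; it is not Tajo's Query class or its API. The block names, the `deps` dependency map, and the functions `schedule_waves` and `run_blocks` are hypothetical, and a local thread pool stands in for Tajo's distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor


def schedule_waves(deps):
    """Group execution blocks into waves that can each run in parallel.

    deps (hypothetical) maps every block name to the set of block names it
    consumes. A block is ready once all of its inputs have finished, and
    ready blocks never depend on one another, so a wave can run concurrently.
    """
    remaining = {block: set(inputs) for block, inputs in deps.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(b for b, inputs in remaining.items() if inputs <= done)
        if not ready:
            raise ValueError("cycle or unknown input among execution blocks")
        waves.append(ready)
        done.update(ready)
        for b in ready:
            del remaining[b]
    return waves


def run_blocks(deps, run_block, max_workers=4):
    """Submit every block of a wave at once, then wait for the whole wave
    before starting the next one. Returns {block: result}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in schedule_waves(deps):
            futures = {b: pool.submit(run_block, b) for b in wave}
            for b, fut in futures.items():
                results[b] = fut.result()
    return results


# Hypothetical plan for "A join B": two independent scans feed one join.
plan = {"scan_A": set(), "scan_B": set(), "join_AB": {"scan_A", "scan_B"}}
print(schedule_waves(plan))  # [['scan_A', 'scan_B'], ['join_AB']]
```

Grouping blocks into waves is the simplest policy: every block in a wave waits for the whole previous wave. A finer-grained scheduler could instead start each block as soon as its own inputs finish.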
