Re: [DISCUSS] 0.8.0 release and next roadmap

Min Zhou Mon, 14 Apr 2014 13:07:36 -0700

I only realized today that my reply hadn't been sent.

+1


Totally agree with Hyunsik. 0.9 is more appropriate for the next release.

Min


On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:

> +1
>
> I agree with Hyunsik as well. Since 1.0 increments the major version
> number, I think it should be reserved for a particularly significant
> release. :)
>
> Thanks,
> David
>
>
> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
>
> > +1 Hyunsik.
> >
> > Thanks!
> > Warm Regards,
> > Alvin.
> >
> > On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
> >
> >> Hi folks,
> >>
> >> I'd like to discuss the next version number. In Jira, we have
> >> provisionally used 1.0, but we haven't decided on the next major
> >> version yet. I propose 0.9 as the next major version. What do you
> >> think about this?
> >>
> >> Regards,
> >> Hyunsik
> >>
> >>
> >> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
> >>
> >>> Min, thanks for reminding us!
> >>> It's a mandatory issue.
> >>> We need to implement that feature ASAP.
> >>>
> >>> Thanks,
> >>> Jihoon
> >>>
> >>>
> >>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>
> >>>> Min,
> >>>>
> >>>> Yes, you are right. I think about it every day, but I missed it. Thank
> >>>> you for reminding me. This could be achieved by modifying the Query
> >>>> class to execute independent execution blocks in parallel. I'll add it
> >>>> to the wiki.
> >>>>
> >>>> Thanks,
> >>>> Hyunsik
> >>>>
> >>>>
> >>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
> >>>>
> >>>>> Yeah. Another issue: it seems that for a query like A JOIN B, Tajo
> >>>>> scans A in the first stage and only then scans B in the second stage.
> >>>>> It doesn't run the two scans in parallel, right?
> >>>>>
> >>>>>
> >>>>> Min
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>
> >>>>>> I've just updated the roadmap page. Please take a look at the section
> >>>>>> 'After 0.8.0':
> >>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
> >>>>>>
> >>>>>> If there are missing or additional ideas, feel free to add them to
> >>>>>> that page or suggest them here. After we discuss them further, we
> >>>>>> will decide their priorities.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Hyunsik
> >>>>>>
> >>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
> >>>>>>> Hi Hyoungjun,
> >>>>>>>
> >>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
> >>>>>>> users with a ready-made benchmark environment, they can test Tajo
> >>>>>>> easily. I'll add your idea to the wiki. Thank you for your
> >>>>>>> suggestion.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Hyunsik
> >>>>>>>
> >>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
> >>>>>>>> Hi Hyunsik,
> >>>>>>>>
> >>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark scripts
> >>>>>>>> like the ones available for Hive and Impala would be very helpful
> >>>>>>>> for testing.
> >>>>>>>>
> >>>>>>>> https://github.com/rxin/TPC-H-Hive
> >>>>>>>> https://github.com/cartershanklin/hive-testbench
> >>>>>>>> https://github.com/cloudera/impala-tpcds-kit
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> Hyoungjun
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>
> >>>>>>>>> Hi Jihoon,
> >>>>>>>>>
> >>>>>>>>> CUBE and ROLLUP are key features for analytic problems. I've added
> >>>>>>>>> them to the wiki.
> >>>>>>>>>
> >>>>>>>>> TAJO-266 and TAJO-161 will open up more optimization opportunities
> >>>>>>>>> in logical planning and distributed query planning. However, I'm not
> >>>>>>>>> sure they can be included in the short-term roadmap. They are
> >>>>>>>>> necessary, but not urgent right now. In my view, it would be
> >>>>>>>>> reasonable to schedule them on the long-term roadmap.
> >>>>>>>>>
> >>>>>>>>> Warm regards,
> >>>>>>>>> Hyunsik
> >>>>>>>>>
> >>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
> >>>>>>>>>> Hi Hyunsik,
> >>>>>>>>>> I'm very glad that we can release the next version soon.
> >>>>>>>>>> I also appreciate the guidelines for the next roadmap.
> >>>>>>>>>>
> >>>>>>>>>> In addition to the features mentioned above, I have two
> >>>>>>>>>> suggestions. The first is support for the CUBE operator (TAJO-259).
> >>>>>>>>>> Actually, I started on it quite a long time ago, but it has been
> >>>>>>>>>> delayed because it had lower priority than other stability issues.
> >>>>>>>>>> However, since this operator is widely used in analytic
> >>>>>>>>>> applications, we need to add this feature as soon as possible. So,
> >>>>>>>>>> in my opinion, it would be good to add it to the next roadmap.
> >>>>>>>>>>
> >>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue
> >>>>>>>>>> for making the query plan more flexible. Once it is done, we can
> >>>>>>>>>> exploit the many optimization opportunities described in TAJO-161.
> >>>>>>>>>>
> >>>>>>>>>> What do you guys think about these issues?
> >>>>>>>>>>
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Jihoon
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
> >>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> I'm very happy to see that our community is growing! It's also a
> >>>>>>>>>>> pleasure to discuss the Tajo 0.8.0 release. Recently, I've tested
> >>>>>>>>>>> various features in various contexts and tried to find out whether
> >>>>>>>>>>> there are any critical problems. I think only a few issues remain,
> >>>>>>>>>>> and we can release 0.8.0 next week. If there are further issues to
> >>>>>>>>>>> be solved before the 0.8.0 release, feel free to raise them.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
> >>>>>>>>>>> suggestions from users, contributors, and committers. Please fire
> >>>>>>>>>>> away!
> >>>>>>>>>>>
> >>>>>>>>>>> I think our next stage should focus on improving how Tajo runs
> >>>>>>>>>>> on large clusters with thousands of nodes and with many concurrent
> >>>>>>>>>>> users. The key issues associated with this include the following:
> >>>>>>>>>>>
> >>>>>>>>>>> * High availability
> >>>>>>>>>>> * Multi-tenancy scheduling
> >>>>>>>>>>> * More stability
> >>>>>>>>>>> * Improved shuffle
> >>>>>>>>>>>
> >>>>>>>>>>> The current work status is as follows. Min is working on Tajo's
> >>>>>>>>>>> new scheduler (TAJO-540), based on Sparrow. I'll support him. As
> >>>>>>>>>>> far as I know, Alvin is working on TajoMaster HA (TAJO-704). Also,
> >>>>>>>>>>> some of us, including myself, are investigating and solving the
> >>>>>>>>>>> issues that occur in large clusters. These issues should be solved
> >>>>>>>>>>> in order to make Tajo a complete, enterprise-ready product.
> >>>>>>>>>>>
> >>>>>>>>>>> In addition, there are some SQL feature support issues. Many
> >>>>>>>>>>> analytic problems require window functions. Also, IN subqueries
> >>>>>>>>>>> and scalar subqueries should be supported. So, I'd like to
> >>>>>>>>>>> schedule them with high priority. In my view, very few SQL support
> >>>>>>>>>>> issues will remain once Tajo provides these features.
> >>>>>>>>>>>
> >>>>>>>>>>> Besides those areas, David is working on nested schema support
> >>>>>>>>>>> and related work (TAJO-710). I guess this will take quite a while
> >>>>>>>>>>> because it requires a lot of hard work. So, it would be good to
> >>>>>>>>>>> schedule the nested schema work loosely. That's just my thought,
> >>>>>>>>>>> anyhow.
> >>>>>>>>>>>
> >>>>>>>>>>> Aside from the roadmap discussion, I'd like to suggest that we
> >>>>>>>>>>> release more frequently after the 0.8.0 release. So far, there has
> >>>>>>>>>>> been a long gap between releases because Tajo has been under heavy
> >>>>>>>>>>> development. By 'releasing early, releasing often', we will create
> >>>>>>>>>>> a tighter feedback loop between users and developers.
> >>>>>>>>>>>
> >>>>>>>>>>> I think there are many more interesting issues to be included in
> >>>>>>>>>>> our roadmap. Feel free to suggest your ideas. We will arrange our
> >>>>>>>>>>> short-term and long-term roadmaps based on your suggestions.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you all so much for your contributions!
> >>>>>>>>>>>
> >>>>>>>>>>> Warm Regards,
> >>>>>>>>>>> Hyunsik
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Tajo - Big Data Warehouse System on Hadoop
> >>>>>>>> http://tajo.apache.org/
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> My research interests are distributed systems, parallel computing and
> >>>>> bytecode based virtual machine.
> >>>>>
> >>>>> My profile:
> >>>>> http://www.linkedin.com/in/coderplay
> >>>>> My blog:
> >>>>> http://coderplay.javaeye.com
> >>>>>
> >>>>
> >>>
> >
>
>


