Re: [DISCUSS] 0.8.0 release and next roadmap

Hyunsik Choi Wed, 23 Apr 2014 23:28:24 -0700

Hi folks,

I created a list of features for the 0.9 release. So far I have only
put my own desired features on the list. If there are issues you are
interested in, feel free to add them. The roadmap is just a
recommendation; we can change it according to the situation. 0.9.0
is a major release, and I expect that we can release it in two
months. I also welcome any suggestions.


Warm regards,
Hyunsik

On Thu, Apr 24, 2014 at 3:20 PM, Hyunsik Choi <[email protected]> wrote:
> Hi Eli,
>
> Thank you for your comment. I'm also really hoping that you will have
> time to contribute to open source projects. Since you are very skilled
> in YARN, your contribution would be an especially great help to us :).
>
> Thanks,
> Hyunsik
>
> On Sun, Apr 20, 2014 at 3:07 AM, Eli Reisman <[email protected]> wrote:
>> Great discussion everyone, sorry to have missed so much of it. I will
>> certainly keep an eye on the YARN support angle and would love to help.
>>
>> I am hoping now that my team is growing at work I will have time to dive
>> back into my open source projects. I agree that YARN (and Mesos) support
>> will be a huge plus.
>>
>>
>>
>> On Mon, Apr 14, 2014 at 11:42 PM, Hyunsik Choi <[email protected]> wrote:
>>
>>> As David mentioned, version 1.0 usually has a special meaning, such as GA.
>>> When we are confident in the stability and features of Tajo, we can use
>>> 1.0. Thank you all again!
>>>
>>>
>>> On Tue, Apr 15, 2014 at 2:55 PM, Hyunsik Choi <[email protected]> wrote:
>>>
>>> > Thank you for your votes! Let's go ahead!
>>> >
>>> > Cheers,
>>> > Hyunsik
>>> >
>>> >
>>> > On Tue, Apr 15, 2014 at 9:03 AM, ktpark <[email protected]> wrote:
>>> >
>>> >> +1
>>> >>
>>> >> I agree with Hyunsik.
>>> >> Sorry for the late reply.
>>> >>
>>> >> On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:
>>> >>
>>> >> > I didn't realize until today that my reply hadn't been sent.
>>> >> >
>>> >> > +1
>>> >> >
>>> >> > Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
>>> >> >
>>> >> > Min
>>> >> >
>>> >> >
>>> >> > On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
>>> >> >
>>> >> >> +1
>>> >> >>
>>> >> >> I agree with Hyunsik as well. I think since 1.0 increments the major
>>> >> >> version number, it should be used for a particularly significant
>>> >> >> release. :)
>>> >> >>
>>> >> >> Thanks,
>>> >> >> David
>>> >> >>
>>> >> >>
>>> >> >> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
>>> >> >>
>>> >> >>> +1 Hyunsik.
>>> >> >>>
>>> >> >>> Thanks!
>>> >> >>> Warm Regards,
>>> >> >>> Alvin.
>>> >> >>>
>>> >> >>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>>> >> >>>
>>> >> >>>> Hi folks,
>>> >> >>>>
>>> >> >>>> I'd like to discuss the next version number. In Jira, we have
>>> >> >>>> provisionally used 1.0, but we haven't decided on the next major
>>> >> >>>> version. I propose 0.9 as the next major version. What do you
>>> >> >>>> think about this?
>>> >> >>>>
>>> >> >>>> Regards,
>>> >> >>>> Hyunsik
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
>>> >> >>>>
>>> >> >>>>> Min, thanks for reminding us!
>>> >> >>>>> It's a mandatory issue.
>>> >> >>>>> We need to implement that feature ASAP.
>>> >> >>>>>
>>> >> >>>>> Thanks,
>>> >> >>>>> Jihoon
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>>> >> >>>>>
>>> >> >>>>>> Min,
>>> >> >>>>>>
>>> >> >>>>>> Yes, you are right. I think about it every day, but I missed it
>>> >> >>>>>> here. Thank you for reminding me. It would be achieved by
>>> >> >>>>>> modifying the Query class to execute independent execution
>>> >> >>>>>> blocks in parallel. I'll add it to the wiki.
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>> Hyunsik
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>>> >> >>>>>>
>>> >> >>>>>>> Yeah. Another issue: for a query like A join B, it seems that
>>> >> >>>>>>> Tajo scans A in the first stage and then scans B in the second
>>> >> >>>>>>> stage. It doesn't run them in parallel, right?
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> Min
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>>> I've just updated the roadmap page. Please take a look at the
>>> >> >>>>>>>> section 'After 0.8.0':
>>> >> >>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>> >> >>>>>>>>
>>> >> >>>>>>>> If any ideas are missing or you have additional ones, feel free
>>> >> >>>>>>>> to add them on that page or suggest them here. After we discuss
>>> >> >>>>>>>> them more, we will decide their priorities.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Best regards,
>>> >> >>>>>>>> Hyunsik
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
>>> >> >>>>>>>>> Hi Hyoungjun,
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we
>>> >> >>>>>>>>> provide users with some prepared benchmark environment, users
>>> >> >>>>>>>>> can test Tajo easily. I'll file your idea on the wiki. Thank
>>> >> >>>>>>>>> you for your suggestion.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Regards,
>>> >> >>>>>>>>> Hyunsik
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
>>> >> >>>>>>>>>> Hi Hyunsik ,
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark
>>> >> >>>>>>>>>> scripts like the ones for Hive and Impala would make testing
>>> >> >>>>>>>>>> easier.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>>> >> >>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>>> >> >>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Thanks!
>>> >> >>>>>>>>>> Hyoungjun
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>> Hi Jihoon,
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I
>>> >> >>>>>>>>>>> filed them on the wiki.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization
>>> >> >>>>>>>>>>> opportunities to logical planning and distributed query
>>> >> >>>>>>>>>>> planning. But I'm not sure they can be included in the
>>> >> >>>>>>>>>>> short-term roadmap. They are necessary, but they are not
>>> >> >>>>>>>>>>> required right now. In my view, it would be reasonable to
>>> >> >>>>>>>>>>> schedule them on the long-term roadmap.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Warm regards,
>>> >> >>>>>>>>>>> Hyunsik
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
>>> >> >>>>>>>>>>>> Hi Hyunsik,
>>> >> >>>>>>>>>>>> I'm very glad that we can release the next version soon.
>>> >> >>>>>>>>>>>> I also appreciate the guideline for the next roadmap.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> In addition to the aforementioned features, I have two
>>> >> >>>>>>>>>>>> suggestions. The first is support for the CUBE operator
>>> >> >>>>>>>>>>>> (TAJO-259). Actually, I started it quite a long time ago,
>>> >> >>>>>>>>>>>> but it was delayed because it had lower priority than
>>> >> >>>>>>>>>>>> other stability issues. However, since this operator is
>>> >> >>>>>>>>>>>> widely used in analytic applications, we need to add this
>>> >> >>>>>>>>>>>> feature as soon as possible. So, in my opinion, it would
>>> >> >>>>>>>>>>>> be good to add this feature to the next roadmap.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> The second is advanced query optimization. TAJO-266 is an
>>> >> >>>>>>>>>>>> issue for making the query plan more flexible. After that,
>>> >> >>>>>>>>>>>> we can exploit the many optimization opportunities
>>> >> >>>>>>>>>>>> described in TAJO-161.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> What do you guys think about these issues?
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> Best Regards,
>>> >> >>>>>>>>>>>> Jihoon
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Hi folks,
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> I'm very happy to see that our community is growing! Also,
>>> >> >>>>>>>>>>>>> it's a pleasure to discuss the Tajo 0.8.0 release.
>>> >> >>>>>>>>>>>>> Recently, I've tested various features in various
>>> >> >>>>>>>>>>>>> contexts, and tried to figure out if there are any
>>> >> >>>>>>>>>>>>> critical problems. I think that there are only a few
>>> >> >>>>>>>>>>>>> issues and we can release 0.8.0 next week. If there are
>>> >> >>>>>>>>>>>>> further issues to be solved before the 0.8.0 release,
>>> >> >>>>>>>>>>>>> feel free to suggest ideas.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to
>>> >> >>>>>>>>>>>>> any suggestion from users, contributors, and committers.
>>> >> >>>>>>>>>>>>> Please fire away!
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> I'm thinking that our next stage should focus on improving
>>> >> >>>>>>>>>>>>> the way Tajo runs on large clusters with thousands of
>>> >> >>>>>>>>>>>>> nodes and for a large number of concurrent users. The key
>>> >> >>>>>>>>>>>>> issues associated with this include the following:
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> * High availability
>>> >> >>>>>>>>>>>>> * Multi-tenancy scheduling
>>> >> >>>>>>>>>>>>> * More stability
>>> >> >>>>>>>>>>>>> * Improved shuffle
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> The current work status is as follows. Min is working on
>>> >> >>>>>>>>>>>>> Tajo's new scheduler (TAJO-540) based on Sparrow. I'll
>>> >> >>>>>>>>>>>>> support him. As far as I know, Alvin is working on
>>> >> >>>>>>>>>>>>> TajoMaster HA (TAJO-704). Also, some of us, including
>>> >> >>>>>>>>>>>>> myself, are investigating and solving the issues that
>>> >> >>>>>>>>>>>>> occur in large clusters. These issues should be solved in
>>> >> >>>>>>>>>>>>> order to make Tajo a complete, enterprise-ready product.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> In addition, there are some SQL feature support issues.
>>> >> >>>>>>>>>>>>> Many analytic problems require window functions. Also, IN
>>> >> >>>>>>>>>>>>> subqueries and scalar subqueries should be supported. So,
>>> >> >>>>>>>>>>>>> I'd like to schedule them with high priority. In my view,
>>> >> >>>>>>>>>>>>> there will be very few SQL support issues if Tajo
>>> >> >>>>>>>>>>>>> provides these features.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Besides those areas, David is working on a nested schema
>>> >> >>>>>>>>>>>>> and its related work (TAJO-710). I guess this will take
>>> >> >>>>>>>>>>>>> quite a while because it requires a lot of hard work. So,
>>> >> >>>>>>>>>>>>> it would be great to schedule the nested schema loosely.
>>> >> >>>>>>>>>>>>> Those are just my thoughts, anyhow.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to
>>> >> >>>>>>>>>>>>> suggest that we need to release more frequently after the
>>> >> >>>>>>>>>>>>> 0.8.0 release. So far, there has been a long period
>>> >> >>>>>>>>>>>>> between each release because Tajo is undergoing heavy
>>> >> >>>>>>>>>>>>> development. By 'releasing early, releasing often', we
>>> >> >>>>>>>>>>>>> will create a tighter feedback loop between users and
>>> >> >>>>>>>>>>>>> developers.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> I think that there are many additional interesting issues
>>> >> >>>>>>>>>>>>> to be included in our roadmap. Feel free to suggest your
>>> >> >>>>>>>>>>>>> ideas. We will arrange our short-term roadmap and
>>> >> >>>>>>>>>>>>> long-term roadmap based on your suggestions.
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Thank you all so much for your contribution!
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>>> Warm Regards,
>>> >> >>>>>>>>>>>>> Hyunsik
>>> >> >>>>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> --
>>> >> >>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>> >> >>>>>>>>>> http://tajo.apache.org/
>>> >> >>>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> My research interests are distributed systems, parallel computing and
>>> >> >>>>>>> bytecode based virtual machine.
>>> >> >>>>>>>
>>> >> >>>>>>> My profile:
>>> >> >>>>>>> http://www.linkedin.com/in/coderplay
>>> >> >>>>>>> My blog:
>>> >> >>>>>>> http://coderplay.javaeye.com
>>> >> >>>>>>>
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >> > --
>>> >> > My research interests are distributed systems, parallel computing and
>>> >> > bytecode based virtual machine.
>>> >> >
>>> >> > My profile:
>>> >> > http://www.linkedin.com/in/coderplay
>>> >> > My blog:
>>> >> > http://coderplay.javaeye.com
>>> >>
>>> >>
>>> >
>>>
