Re: [DISCUSS] 0.8.0 release and next roadmap

ktpark Mon, 14 Apr 2014 17:04:35 -0700

+1

I agree with Hyunsik.
Sorry for the late reply.


On Apr 15, 2014, at 5:05 AM, Min Zhou <[email protected]> wrote:

> I didn't realize until today that my reply hadn't been sent.
> 
> +1
> 
> Totally agree with Hyunsik. 0.9 is more appropriate for the next release.
> 
> Min
> 
> 
> On Mon, Apr 14, 2014 at 12:31 PM, David Chen <[email protected]> wrote:
> 
>> +1
>> 
>> I agree with Hyunsik as well. I think since 1.0 increments the major
>> version number, it should be used for a particularly significant release. :)
>> 
>> Thanks,
>> David
>> 
>> 
>> On Apr 13, 2014, at 7:51 PM, Alvin Henrick <[email protected]> wrote:
>> 
>>> +1 Hyunsik.
>>> 
>>> Thanks!
>>> Warm Regards,
>>> Alvin.
>>> 
>>> On Apr 11, 2014, at 8:30 AM, Hyunsik Choi wrote:
>>> 
>>>> Hi folks,
>>>> 
>>>> I'd like to discuss the next version number. In Jira, we have
>>>> provisionally used 1.0, but we haven't decided on the next major
>>>> version. I propose 0.9 as the next major version. What do you think
>>>> about this?
>>>> 
>>>> Regards,
>>>> Hyunsik
>>>> 
>>>> 
>>>> On Thu, Apr 10, 2014 at 11:05 AM, Jihoon Son <[email protected]> wrote:
>>>> 
>>>>> Min, thanks for reminding us!
>>>>> It's a mandatory issue.
>>>>> We need to implement that feature ASAP.
>>>>> 
>>>>> Thanks,
>>>>> Jihoon
>>>>> 
>>>>> 
>>>>> 2014-04-10 3:19 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>> 
>>>>>> Min,
>>>>>> 
>>>>>> Yes, you are right. I think about it every day, but I missed it.
>>>>>> Thank you for reminding me. It could be achieved by modifying the
>>>>>> Query class to execute independent execution blocks in parallel.
>>>>>> I'll add it to the wiki.
>>>>>> 
>>>>>> Thanks,
>>>>>> Hyunsik
>>>>>> 
>>>>>> 
>>>>>> On Thu, Apr 10, 2014 at 2:43 AM, Min Zhou <[email protected]> wrote:
>>>>>> 
>>>>>>> Yeah. Another issue: for a query like A join B, it seems Tajo scans
>>>>>>> A in the first stage and then scans B in the second stage. It
>>>>>>> doesn't run them in parallel, right?
>>>>>>> 
>>>>>>> 
>>>>>>> Min
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Apr 9, 2014 at 10:10 AM, Hyunsik Choi <[email protected]> wrote:
>>>>>>> 
>>>>>>>> I've just updated the roadmap page. Please take a look at the
>>>>>>>> section 'After 0.8.0':
>>>>>>>> https://cwiki.apache.org/confluence/display/TAJO/Tajo+Roadmap
>>>>>>>> 
>>>>>>>> If there are missing or additional ideas, feel free to add them to
>>>>>>>> that page or suggest them here. After we discuss them further, we
>>>>>>>> will decide their priorities.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Hyunsik
>>>>>>>> 
>>>>>>>> On Sat, Apr 5, 2014 at 12:06 AM, Hyunsik Choi <[email protected]> wrote:
>>>>>>>>> Hi Hyoungjun,
>>>>>>>>> 
>>>>>>>>> Yes, TPC-H and TPC-DS scripts for Tajo are necessary. If we provide
>>>>>>>>> users with some prepared benchmark environment, users can test Tajo
>>>>>>>>> easily. I'll file your idea on the wiki. Thank you for your
>>>>>>>>> suggestion.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Hyunsik
>>>>>>>>> 
>>>>>>>>> On Fri, Apr 4, 2014 at 11:48 PM, 김형준 <[email protected]> wrote:
>>>>>>>>>> Hi Hyunsik ,
>>>>>>>>>> 
>>>>>>>>>> I ran benchmark tests with TPC-H and TPC-DS data. Benchmark
>>>>>>>>>> scripts like the ones for Hive and Impala would make testing
>>>>>>>>>> easier.
>>>>>>>>>> 
>>>>>>>>>> https://github.com/rxin/TPC-H-Hive
>>>>>>>>>> https://github.com/cartershanklin/hive-testbench
>>>>>>>>>> https://github.com/cloudera/impala-tpcds-kit
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> Hyoungjun
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2014-04-04 23:40 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>>>> 
>>>>>>>>>>> Hi Jihoon,
>>>>>>>>>>> 
>>>>>>>>>>> CUBE and ROLL-UP are key features for analytic problems. I filed
>>>>>>>>>>> them on the wiki.
>>>>>>>>>>> 
>>>>>>>>>>> TAJO-266 and TAJO-161 will give more optimization opportunities
>>>>>>>>>>> to logical planning and distributed query planning. But I'm not
>>>>>>>>>>> sure they can be included in the short-term roadmap. They are
>>>>>>>>>>> necessary, but they are not required right now. In my view, it
>>>>>>>>>>> would be reasonable to schedule them on the long-term roadmap.
>>>>>>>>>>> 
>>>>>>>>>>> Warm regards,
>>>>>>>>>>> Hyunsik
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Apr 4, 2014 at 3:01 PM, Jihoon Son <[email protected]> wrote:
>>>>>>>>>>>> Hi Hyunsik,
>>>>>>>>>>>> I'm very glad that we can release the next version soon.
>>>>>>>>>>>> I also appreciate the guideline for the next roadmap.
>>>>>>>>>>>> 
>>>>>>>>>>>> In addition to the aforementioned features, I have two
>>>>>>>>>>>> suggestions.
>>>>>>>>>>>> The first is support for the CUBE operator (TAJO-259). Actually,
>>>>>>>>>>>> I started it quite a long time ago, but it was delayed because
>>>>>>>>>>>> it had lower priority than other stability issues. But since
>>>>>>>>>>>> this operator is widely used in analytic applications, we need
>>>>>>>>>>>> to add this feature as soon as possible. So, in my opinion, it
>>>>>>>>>>>> would be good to add this feature to the next roadmap.
>>>>>>>>>>>> 
>>>>>>>>>>>> The second is advanced query optimization. TAJO-266 is an issue
>>>>>>>>>>>> for making the query plan more flexible. After that, we can
>>>>>>>>>>>> exploit plenty of optimization opportunities like those
>>>>>>>>>>>> described in TAJO-161.
>>>>>>>>>>>> 
>>>>>>>>>>>> What do you guys think about these issues?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Jihoon
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2014-04-04 14:24 GMT+09:00 Hyunsik Choi <[email protected]>:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm very happy to see that our community is growing! It's also
>>>>>>>>>>>>> a pleasure to discuss the Tajo 0.8.0 release. Recently, I've
>>>>>>>>>>>>> tested various features in various contexts and tried to
>>>>>>>>>>>>> figure out whether there are any critical problems. I think
>>>>>>>>>>>>> that there are only a few issues and we can release 0.8.0 next
>>>>>>>>>>>>> week. If there are further issues to be solved before the
>>>>>>>>>>>>> 0.8.0 release, feel free to suggest ideas.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, I'd like to discuss our next roadmap. We are open to any
>>>>>>>>>>>>> suggestions from users, contributors, and committers. Please
>>>>>>>>>>>>> fire away!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm thinking that our next stage should focus on improving the
>>>>>>>>>>>>> way Tajo runs on large clusters with thousands of nodes and
>>>>>>>>>>>>> for a number of concurrent users. The key issues associated
>>>>>>>>>>>>> with this include the following:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> * High availability
>>>>>>>>>>>>> * Multi-tenancy scheduling
>>>>>>>>>>>>> * More stability
>>>>>>>>>>>>> * Improved shuffle
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The current work status is as follows. Min is working on
>>>>>>>>>>>>> Tajo's new scheduler (TAJO-540) based on Sparrow. I'll support
>>>>>>>>>>>>> him. As far as I know, Alvin is working on TajoMaster HA
>>>>>>>>>>>>> (TAJO-704). Also, some guys, including myself, are
>>>>>>>>>>>>> investigating and solving the issues which occur in large
>>>>>>>>>>>>> clusters. These issues should be solved in order to make Tajo
>>>>>>>>>>>>> a complete, enterprise-ready product.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In addition, there are some SQL feature support issues. Many
>>>>>>>>>>>>> analytic problems require window functions. Also, IN
>>>>>>>>>>>>> subqueries and scalar subqueries should be supported. So, I'd
>>>>>>>>>>>>> like to schedule them with high priority. In my view, there
>>>>>>>>>>>>> will be very few SQL support issues once Tajo provides these
>>>>>>>>>>>>> features.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Besides those areas, David is working on a nested schema and
>>>>>>>>>>>>> its related work (TAJO-710). I guess this will take quite a
>>>>>>>>>>>>> while because it requires a lot of hard work. So, it would be
>>>>>>>>>>>>> great to schedule the nested schema loosely. Those are just my
>>>>>>>>>>>>> thoughts, anyhow.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Aside from the discussion of our roadmap, I'd like to suggest
>>>>>>>>>>>>> that we release more frequently after the 0.8.0 release. So
>>>>>>>>>>>>> far, there has been a long period between releases because
>>>>>>>>>>>>> Tajo is undergoing heavy development. By 'releasing early,
>>>>>>>>>>>>> releasing often', we will create a tighter feedback loop
>>>>>>>>>>>>> between users and developers.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think that there are many additional interesting issues to
>>>>>>>>>>>>> be included in our roadmap. Feel free to suggest your ideas.
>>>>>>>>>>>>> We will arrange our short-term and long-term roadmaps based on
>>>>>>>>>>>>> your suggestions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you all so much for your contribution!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Warm Regards,
>>>>>>>>>>>>> Hyunsik
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Tajo - Big Data Warehouse System on Hadoop
>>>>>>>>>> http://tajo.apache.org/
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> My research interests are distributed systems, parallel computing and
>>>>>>> bytecode based virtual machine.
>>>>>>> 
>>>>>>> My profile:
>>>>>>> http://www.linkedin.com/in/coderplay
>>>>>>> My blog:
>>>>>>> http://coderplay.javaeye.com
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
> 
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
