Hi Hyunsik,

Thank you very much for sharing the roadmap. I am very excited for the 0.8.0 
release and for the projects on the roadmap for future releases.

I agree with Min that Tajo on YARN will be an important project. I think there 
will be a good amount of work to not only have Tajo run on YARN but also run 
well on a YARN cluster co-resident with other YARN applications. Through this
effort, we will likely also find areas for improving multi-tenancy in YARN,
since YARN is still relatively young and has not been battle-tested much yet.

As you mentioned, one of the projects I would like to focus on is adding 
support for nested schemas and non-scalar types. This way, we would be able to 
take full advantage of columnar storage formats like Parquet, which is designed 
to work well with nested schemas. I understand that this will be a significant 
project, but I think it may be possible to divide up the work as I have done 
with the sub-tasks to TAJO-710 and push out support for each type incrementally 
across different releases.

Another area that I would like to learn some more about is partitioning. I have 
just begun to look at TAJO-283 and am still ramping up on some of the context 
and the current status of the effort, but I am interested in exploring the 
possibility of enabling smart dynamic partitioning based on the way a table is
queried while avoiding some of the current problems of dynamic partitioning,
such as creating too many files. One possible approach I am considering is
building indices that point to offsets within files. Anyway, this is still
more of a research problem, but it is one that I would like to explore.
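
To make the offset-index idea a bit more concrete, here is a minimal sketch in
Python. It is not Tajo code and does not reflect any existing design; the
function names, the newline-delimited row format, and the in-memory index are
all assumptions for illustration. Rows for every partition key go into a single
file, and a side index records the byte offset and length of each row so that a
reader can seek to one key's rows instead of opening one file per partition.

```python
import io
from collections import defaultdict


def write_with_offset_index(rows, key_fn, out):
    """Append rows to one binary file-like object and index them by key.

    Hypothetical helper: returns {partition key: [(byte offset, byte length), ...]}.
    """
    index = defaultdict(list)
    for row in rows:
        data = (row + "\n").encode("utf-8")
        # Record where this row starts before writing it.
        index[key_fn(row)].append((out.tell(), len(data)))
        out.write(data)
    return dict(index)


def read_partition(index, key, src):
    """Seek directly to the rows for one key rather than scanning the file."""
    result = []
    for offset, length in index.get(key, []):
        src.seek(offset)
        result.append(src.read(length).decode("utf-8").rstrip("\n"))
    return result


# Three rows, two distinct "partition" keys (the date column), one file.
rows = ["2014-04-01,a", "2014-04-02,b", "2014-04-01,c"]
buf = io.BytesIO()
idx = write_with_offset_index(rows, lambda r: r.split(",")[0], buf)
april_first = read_partition(idx, "2014-04-01", buf)
```

In this sketch the number of files stays at one regardless of how many distinct
keys appear; the cost moves to maintaining the index and to non-contiguous
reads, which is part of what would need to be studied.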

Thanks,
David

On Apr 3, 2014, at 10:24 PM, Hyunsik Choi <[email protected]> wrote:

> Hi folks,
> 
> I'm very happy to see that our community is growing! Also, it's a pleasure
> to discuss the Tajo 0.8.0 release. Recently, I've tested various features
> in various contexts, and tried to figure out if there are any critical
> problems. I think that there are only a few issues and we can release 0.8.0
> next week. If there are further issues to be solved before the 0.8.0
> release, feel free to suggest ideas.
> 
> Also, I'd like to discuss our next roadmap. We are open to any suggestion
> from users, contributors, and committers. Please fire away!
> 
> I'm thinking that our next stage should focus on improving the way Tajo
> runs on large clusters of thousands of nodes and with a number of concurrent
> users. The key issues associated with this include the following:
> 
> * High availability
> * Multi-tenancy scheduling
> * More stability
> * Improved shuffle
> 
> The current work status is as follows. Min is working on Tajo's new
> scheduler (TAJO-540) based on Sparrow. I'll support him. As far as I know,
> Alvin is working on TajoMaster HA (TAJO-704). Also, some people, including
> myself, are investigating and solving the issues which occur in large
> clusters. These issues should be solved in order to make Tajo a complete
> enterprise-ready product.
> 
> In addition, there are some SQL feature support issues. Many analytic
> problems require window functions. Also, IN subqueries and scalar subqueries
> should be supported. So, I'd like to schedule them with high priority. In
> my view, there will be very few SQL support issues if Tajo provides these
> features.
> 
> Besides those areas, David is working on nested schema support and related
> work (TAJO-710). I guess this will take quite a while because it requires a
> lot of hard work. So, it would be great to schedule the nested schema work
> loosely. Those are just my thoughts, anyhow.
> 
> Aside from the discussion of our roadmap, I'd like to suggest that we need
> to release more frequently after the 0.8.0 release. So far, there has been
> a long period between each release because Tajo is undergoing heavy
> development. By 'releasing early, releasing often', we will create a
> tighter feedback loop between users and developers.
> 
> I think that there are many additional interesting issues to be
> included in our roadmap. Feel free to suggest your ideas. We will arrange
> our short-term roadmap and long-term roadmap based on your suggestions.
> 
> Thank you all so much for your contribution!
> 
> Warm Regards,
> Hyunsik
