Thanks Hyunsik! I wonder if we can make a connection between Tajo and Gora on this project. Maybe we can generate a Gora-based front-end to Tajo?
I'm CC'ing the Gora folks here for thoughts. Great roadmap!

Cheers,
Chris

On 3/26/13 2:28 AM, "Apache Wiki" <[email protected]> wrote:

>Dear Wiki user,
>
>You have subscribed to a wiki page or wiki category on "Tajo Wiki" for
>change notification.
>
>The "Roadmap" page has been changed by HyunsikChoi:
>http://wiki.apache.org/tajo/Roadmap
>
>Comment:
>Moved the roadmap from the github wiki.
>
>New page:
>= Roadmap =
>
>== Milestone ==
> * 0.2 - first release as an incubating project, focused on ASF compliance
> * 0.3 - a more stable API, more robust features, and a rudimentary
>cost-based optimizer
> * 0.4 - more SQL support and an improved cost-based optimizer
> * 0.5 - a native columnar execution engine
>
>== Long Term Plan ==
> * Integration with the Hadoop ecosystem
>  * The Tajo catalog needs to support HCatalog or be compatible with the
>Hive metastore.
> * The native columnar execution engine
> * Cost-based optimization, including a rewrite rule engine and various
>rewrite rules
>
>== Short/Mid Term Plan ==
> * Improvement of the DAG framework
>  * Query is both an FSM and a DAG representation.
>  * It would be good to separate Query into an FSM part and a DAG part.
>  * We need an easier interface for editing and building DAGs.
> * RCFile
>  * In the current implementation, Tajo's RCFile is not compatible with
>Hive's because it uses Datum to (de)serialize data. So, we will add an
>RCFile wrapper class compatible with Hive's files.
> * ORCFile
>  * It looks promising. We need to port ORCFile.
> * Trevni
>  * TrevniScanner works well in most cases. However, it doesn't support
>null values. We need to handle that.
> * Hadoop security in tajo-rpc
>  * tajo-rpc does not support Hadoop security. Since Tajo will be part
>of the Hadoop ecosystem, we need to apply Hadoop security to tajo-rpc.
> * Intermediate data format
>  * As I mentioned above, Tajo uses CSV as the intermediate data
>format. It may cause CPU overhead and is relatively large when
>transmitted over the network.
>We need to change it.
> * JDBC/ODBC drivers
>  * Tajo is a relational DW system. With such connectors, it could be
>easily integrated with existing BI and OLAP tools.
> * RESTful API
>  * It's very useful for web-based applications.
> * Proper resource allocation for SubQuery (i.e., Execution Block in PPT)
>  * SubQuery is one step of a multi-step query. For each subquery,
>QueryMaster launches TaskRunners via Yarn, and the launched TaskRunners
>are reused within a subquery.
>  * Now, QueryMaster assigns a fixed-size resource (2G memory) to
>subqueries regardless of the resources they actually need. We need to
>improve this to allocate proper resources to each subquery. For
>example, QueryMaster could assign 1G to a subquery that only scans and
>2G to another subquery that includes joins.
> * Error handling in TajoCli
>  * TajoCli is a command line interface that uses Jline2. However, its
>error handling is awful. It frequently halts when trivial exceptions
>occur.
> * SQL data types
>  * Currently, Tajo provides data types (i.e., byte, bool, int, long,
>float, double, bytes, and string) based on Java primitive types. Tajo
>should support the SQL standard data types.
> * Local mode
>  * Queries are always executed in distributed mode. In other words,
>Tajo always uses Yarn. However, this is inconvenient for debugging and
>inefficient on a single machine. We need to implement a local mode.
> * Parallel launch of containers
>  * Currently, node containers are launched sequentially (see
>TaskRunnerLauncherImpl.java). This looks very inefficient. We can
>improve it by using an ExecutorService.
> * Output commit
>  * In some cases, Tajo is fault tolerant. That requires an output
>commit mechanism. However, Tajo does not support one yet, and we need
>this feature.
> * Broadcast join and the Limit operator
>  * As I mentioned before, they were disabled after the Yarn port. We
>should re-enable them.
> * HBaseScanner/Appender
>  * HBase will be a great storage backend for Tajo.
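On the parallel container launch item: a minimal sketch of the ExecutorService approach the roadmap suggests could look like the following. This is only an illustration, not Tajo code; `containerIds` and the string returned by the task stand in for whatever launch RPC TaskRunnerLauncherImpl actually performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLauncher {
    // Submit one launch task per container to a fixed-size pool instead of
    // launching them one-by-one in a loop.
    static List<String> launchAll(List<String> containerIds, int poolSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Future<String>> futures = new ArrayList<>();
        for (String id : containerIds) {
            futures.add(pool.submit(() -> {
                // Placeholder for the real per-container launch call.
                return "launched-" + id;
            }));
        }
        // Collect results in submission order, waiting for each launch.
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get());
        }
        pool.shutdown();
        return results;
    }
}
```

The pool bounds how many launches run concurrently, and collecting the `Future`s afterward preserves the sequential version's "all containers launched before proceeding" guarantee.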
