Re: [Tajo Wiki] Update of "Roadmap" by HyunsikChoi

Hyunsik Choi Wed, 27 Mar 2013 22:56:05 -0700

Hi Chris,

The idea looks very cool, and I think it is feasible. What can I do to
help? For now, I'll look into Gora in more detail.


Thanks,
Hyunsik




On Thu, Mar 28, 2013 at 1:06 PM, Mattmann, Chris A (388J) <
[email protected]> wrote:

> Thanks Hyunsik!
>
> I wonder if we can make a connection with Tajo to Gora on this project.
> Maybe we can generate a Gora based front-end to Tajo?
>
> I'm CC'ing the Gora folks here for thoughts. Great roadmap!
>
> Cheers,
> Chris
>
> On 3/26/13 2:28 AM, "Apache Wiki" <[email protected]> wrote:
>
> >Dear Wiki user,
> >
> >You have subscribed to a wiki page or wiki category on "Tajo Wiki" for
> >change notification.
> >
> >The "Roadmap" page has been changed by HyunsikChoi:
> >http://wiki.apache.org/tajo/Roadmap
> >
> >Comment:
> >Moved the roadmap from github wiki.
> >
> >New page:
> >= Roadmap =
> >
> >== Milestone ==
> > * 0.2 - first release as an incubating project focused on ASF compliance
> > * 0.3 - more stable API and robust features and a rudimentary cost-based
> >optimizer
> > * 0.4 - broader SQL support and an improved cost-based optimizer
> > * 0.5 - a native columnar execution engine
> >
> >== Long Term Plan ==
> > * Integration with Hadoop ecosystem
> >  * Tajo catalog needs to support HCatalog or needs to be compatible with
> >Hive meta.
> > * The native columnar execution engine
> > * Cost-based optimization which also includes a rewrite rule engine and
> >various rewrite rules
> >
> >== Short/Mid Term Plan ==
> > * Improvement of the DAG framework
> >  * Query is both an FSM and a DAG representation.
> >    * It would be good to separate Query into an FSM part and a DAG part.
> >  * We need an easier interface to edit and build DAGs.
> > * RCFile
> >  * In the current implementation, RCFile is not compatible with Hive's
> >because Tajo's RCFile uses Datum to (de)serialize data. So, we will add
> >an RCFile wrapper class that is compatible with Hive's files.
> > * ORCFile
> >  * It looks promising. We need to port ORCFile.
> > * Trevni
> >  * TrevniScanner works well in most cases. However, it doesn't support
> >null values. We need to handle them.
> > * Hadoop security in tajo-rpc
> >  * tajo-rpc does not support Hadoop security. Since Tajo will be part
> >of the Hadoop ecosystem, we need to apply Hadoop security to tajo-rpc.
> > * Intermediate Data Format
> >  * As I mentioned above, Tajo uses CSV as the intermediate data
> >format. CSV may cause CPU overhead and is relatively large to transmit
> >over the network. We need to change it.
> > * JDBC/ODBC drivers
> >  * Tajo is a relational DW system. If we have such connectors, it can be
> >easily integrated with existing BI and OLAP tools.
> > * RESTful API
> >  * It's very useful for web-based applications.
> > * Proper resource allocation for SubQuery (i.e., Execution Block in PPT)
> >    * SubQuery is one step of multiple query steps. For each subquery,
> >QueryMaster launches TaskRunners via Yarn, and the launched TaskRunners
> >are reused within a subquery.
> >    * Now, QueryMaster assigns a fixed-size resource (2G memory) to
> >subqueries regardless of what they need. We need to improve it to
> >allocate proper resources to subqueries. For example, QueryMaster could
> >assign 1G to a scan-only subquery and 2G to another subquery that
> >includes joins.
> > * Error handling of TajoCli
> >   * TajoCli is a command line interface that uses Jline2. However, its
> >error handling is awful. It frequently halts when trivial exceptions
> >occur.
> > * SQL data types
> >   * Currently, Tajo provides data types (i.e., byte, bool, int, long,
> >float, double, bytes, and string) based on Java primitive types. Tajo
> >should support SQL standard data types.
> > * Local mode
> >   * Queries are always executed in a distributed mode. In other words,
> >they always use Yarn. However, this is inconvenient for debugging and
> >inefficient on a single machine. We need to implement something for
> >local mode.
> > * Parallel launch of containers
> >   * Currently, node containers are executed sequentially (see
> >TaskRunnerLauncherImpl.java). It looks very inefficient. We can improve
> >it by using ExecutorService.
> > * Output commit
> >   * In some cases, Tajo is fault tolerant, which requires an output
> >commit mechanism. However, Tajo does not support it, and we need this
> >feature.
> > * Broadcast join and Limit operator
> >   * As I mentioned before, they are disabled after the Yarn port. We
> >should enable them.
> > * HbaseScanner/Appender
> >   * HBase will be a great storage backend for Tajo.
>
>
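
On the "Parallel launch of containers" item above: the following is a
minimal sketch of the ExecutorService approach, not Tajo's actual code. The
class name, launchContainer(), and the node names are hypothetical
stand-ins for whatever TaskRunnerLauncherImpl does per container; only the
java.util.concurrent calls are real APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLaunchSketch {

    // Hypothetical stand-in for starting one container on a node.
    static String launchContainer(String node) {
        return "launched:" + node;
    }

    // Submits one launch task per node to a fixed-size thread pool
    // instead of launching the containers one after another.
    static List<String> launchAll(List<String> nodes, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (final String node : nodes) {
                tasks.add(() -> launchContainer(node));
            }
            // invokeAll blocks until every task has completed and returns
            // the futures in the same order as the submitted tasks.
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(launchAll(Arrays.asList("node1", "node2", "node3"), 2));
    }
}
```

The pool size (2 here) is an arbitrary choice for the sketch; a real
launcher would probably bound it by the number of containers or by a
configuration value.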
