Hi Chris,

The idea looks very cool, and I think it is possible. What can I do to help? For now, I'll investigate Gora in more detail.

Thanks,
Hyunsik

On Thu, Mar 28, 2013 at 1:06 PM, Mattmann, Chris A (388J) <[email protected]> wrote:
> Thanks Hyunsik!
>
> I wonder if we can make a connection with Tajo to Gora on this project.
> Maybe we can generate a Gora based front-end to Tajo?
>
> I'm CC'ing the Gora folks here for thoughts. Great roadmap!
>
> Cheers,
> Chris
>
> On 3/26/13 2:28 AM, "Apache Wiki" <[email protected]> wrote:
>
> >Dear Wiki user,
> >
> >You have subscribed to a wiki page or wiki category on "Tajo Wiki" for
> >change notification.
> >
> >The "Roadmap" page has been changed by HyunsikChoi:
> >http://wiki.apache.org/tajo/Roadmap
> >
> >Comment:
> >Moved the roadmap from github wiki.
> >
> >New page:
> >= Roadmap =
> >
> >== Milestone ==
> > * 0.2 - first release as an incubating project focused on ASF compliance
> > * 0.3 - more stable API and robust features and a rudimentary cost-based
> >   optimizer
> > * 0.4 - more SQL support and an improved cost-based optimizer
> > * 0.5 - a native columnar execution engine
> >
> >== Long Term Plan ==
> > * Integration with Hadoop ecosystem
> >  * Tajo catalog needs to support HCatalog or needs to be compatible with
> >    Hive meta.
> > * The native columnar execution engine
> > * Cost-based optimization which also includes a rewrite rule engine and
> >   various rewrite rules
> >
> >== Short/Mid Term Plan ==
> > * Improvement of the DAG framework
> >  * Query is both an FSM and a DAG representation.
> >  * It would be good to separate Query into an FSM part and a DAG part.
> >  * We need an easier interface to edit and build DAGs.
> > * RCFile
> >  * In the current implementation, RCFile is not compatible with Hive's
> >    because Tajo's RCFile uses Datum to (de)serialize data. So, we will have
> >    an additional RCFile wrapper class compatible with Hive's files.
> > * ORCFile
> >  * It looks promising. We need to port ORCFile.
> > * Trevni
> >  * TrevniScanner works well in most cases. However, it doesn't support
> >    null values. We need to handle them.
> > * Hadoop security in tajo-rpc
> >  * tajo-rpc does not support Hadoop security. Since Tajo will be a part
> >    of the Hadoop ecosystem, we need to apply Hadoop security to tajo-rpc.
> > * Intermediate Data Format
> >  * As I mentioned above, Tajo uses CSV as the intermediate data
> >    format. It may cause CPU overhead, and the data is relatively large to
> >    transmit over the network. We need to change it.
> > * JDBC/ODBC drivers
> >  * Tajo is a relational DW system. If we have such connectors, it can be
> >    easily integrated with existing BI and OLAP tools.
> > * RESTful API
> >  * It's very useful for web-based applications.
> > * Proper resource allocation for SubQuery (i.e., Execution Block in PPT)
> >  * SubQuery is one step of multiple query steps. For each subquery,
> >    QueryMaster launches TaskRunners via Yarn, and the launched TaskRunners
> >    are reused within a subquery.
> >  * Now, QueryMaster assigns a fixed-size resource (2G memory) to
> >    subqueries regardless of the resources they need. We need to improve it
> >    to allocate proper resources to subqueries. For example, QueryMaster
> >    assigns 1G to one subquery for only scan or assigns 2G to another
> >    subquery including joins.
> > * Error handling of TajoCli
> >  * TajoCli is a command line interface that uses Jline2. However, its
> >    error handling is awful. It frequently halts when trivial exceptions
> >    occur.
> > * SQL data types
> >  * Currently, Tajo provides data types (i.e., byte, bool, int, long,
> >    float, double, bytes, and string) based on Java primitive types. Tajo
> >    should support SQL standard data types.
> > * Local mode
> >  * Queries are always executed in a distributed mode. In other words,
> >    it always uses Yarn. However, this is inconvenient for debugging and
> >    inefficient on a single machine. We need to implement something for
> >    local mode.
> > * Parallel launch of containers
> >  * Currently, node containers are executed sequentially (see
> >    TaskRunnerLauncherImpl.java). It looks very inefficient.
> >    We can improve it by using ExecutorService.
> > * Output commit
> >  * In some cases, Tajo is fault tolerant, which requires an output commit
> >    mechanism. However, Tajo does not support it, and we need this feature.
> > * Broadcast join and Limit operator
> >  * As I mentioned before, they are disabled after the Yarn port. We should
> >    enable them.
> > * HbaseScanner/Appender
> >  * HBase will be a great storage option for Tajo.
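
For reference, the "Parallel launch of containers" item in the quoted roadmap could look roughly like the sketch below. This is only an illustration of the ExecutorService idea, not Tajo's actual code: the class name ParallelLaunchSketch, the methods launchContainer and launchAll, and the string container IDs are all hypothetical stand-ins for whatever TaskRunnerLauncherImpl really does per container.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names are illustrative and are not part of Tajo's API.
class ParallelLaunchSketch {

    // Stand-in for the real per-container launch work (assumed to be independent per container).
    static String launchContainer(String containerId) {
        return "launched:" + containerId;
    }

    // Launches all containers on a fixed-size thread pool instead of one after another.
    static List<String> launchAll(List<String> containerIds, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (final String id : containerIds) {
                tasks.add(() -> launchContainer(id));
            }
            // invokeAll blocks until every task has completed and returns
            // the futures in the same order as the submitted task list.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a launch failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown(); // release the worker threads even if a launch failed
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(launchAll(Arrays.asList("c1", "c2", "c3"), 2));
    }
}
```

A fixed-size pool is used here so the number of concurrent launches stays bounded; the right pool size for a real cluster is an open question.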
