[Hadoop Wiki] Update of "Hive/Roadmap" by NamitJain

Apache Wiki Fri, 17 Apr 2009 18:06:02 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by NamitJain:
http://wiki.apache.org/hadoop/Hive/Roadmap

------------------------------------------------------------------------------
  
  Before adding to the list below, please check 
[https://issues.apache.org/jira/browse/HADOOP/component/12312455 JIRA] to see 
if a ticket has already been opened for the feature. If not, please open a 
ticket on the [http://issues.apache.org/jira/browse/HADOOP Hadoop JIRA] and 
select "contrib/hive" as the component and also update the following list.
  
- = 10/27/08 Roadmap Update =
- 
-  1. Integrating Dynamic SerDe with the DDL. (Zheng/Pete) - This allows the 
users to create typed tables along with list and map types from the DDL
-  2. Support for Statistics. (Ashish) - These stats are needed to make 
optimization decisions
-  3. Join Optimizations. (Prasad) - Mapside joins, semi join techniques etc to 
do the join faster
-  4. Predicate Pushdown Optimizations. (Namit) - pushing predicates just above 
the table scan for certain situations in joins as well as ensuring that only 
required columns are sent across map/reduce boundaries
-  5. Group By Optimizations. (Joydeep) - various optimizations to make group 
by faster
-  6. Optimizations to reduce the number of map files created by filter 
operations. (Dhrubha) - Filters with a large number of mappers produce a lot 
of files, which slows down the following operations. This tries to address 
that problem.
-  7. Transformations in LOAD. (Joydeep) - LOAD currently does not transform 
the input data if it is not in the format expected by the destination table.
-  8. Schemaless map/reduce. (Zheng) - TRANSFORM needs a schema while 
map/reduce is schemaless.
-  9. Improvements to TRANSFORM. (Zheng) - Make this more intuitive to 
map/reduce developers - evaluate some other keywords etc.
-  10. Error Reporting Improvements. (Pete) - Make error reporting for parse 
errors better
-  11. Help on CLI. (Joydeep) - add help to the CLI
-  12. Explode and Collect Operators. (Zheng) - Explode and collect operators 
to convert collections to individual items and vice versa.
-  13. Propagating sort properties to destination tables. (Prasad) - If the 
query produces sorted output, we want to capture that in the destination 
table's metadata so that downstream optimizations can be enabled.
- 
- Other contributions from outside FB ...
-  14. JDBC driver (Michi Mutsuzaki @ stanford.edu, Raghu @ stanford.edu)
-  15. Fixes to CLI driver (Jeremy Huylebroeck)
-  16. Web interface...
  
  = Roadmap/call to add more features =
  The following is the list of useful features that are on the Hive Roadmap:
+   * HAVING clause support
+   * Support for various statistical functions like Median, Standard 
Deviation, Variance etc.
+   * Support for Create Table as Select
+   * Support for views
+   * Support for Insert Appends
+   * Support for Inserts without listing the partitioning columns explicitly - 
the query should be able to derive them
+   * Support for Indexes
+   * Support for IN
+   * Support for Column Alias
+   * Support for Statistics. - These stats are needed to make optimization 
decisions
+   * Join Optimizations. - Mapside joins, semi join techniques etc to do the 
join faster
+   * Optimizations to reduce the number of map files created by filter 
operations.
+   * Transformations in LOAD. - LOAD currently does not transform the input 
data if it is not in the format expected by the destination table.
+   * Schemaless map/reduce. - TRANSFORM needs a schema while map/reduce is 
schemaless.
+   * Improvements to TRANSFORM. - Make this more intuitive to map/reduce 
developers - evaluate some other keywords etc.
+   * Error Reporting Improvements.  - Make error reporting for parse errors 
better
+   * Help on CLI.  - add help to the CLI
+   * Explode and Collect Operators. - Explode and collect operators to convert 
collections to individual items and vice versa.
+   * Propagating sort properties to destination tables. - If the query 
produces sorted output, we want to capture that in the destination table's 
metadata so that downstream optimizations can be enabled.
+   * Propagating bucketing properties to destination tables.
    * Multiple group-by inserts
      * Generate multiple group-by results by scanning the source table only 
once
      * Example:
@@ -39, +39 @@

    * Let the user register UDF and UDAF
      * Expose register functions in UDFRegistry and UDAFRegistry
      * Provide commands in HiveCli to call those register functions
-   * JDBC driver
+   * ODBC/JDBC driver
    * Alter table 
      * rename column
      * serde properties (delims, thrift classes)
