Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by AshishThusoo:
http://wiki.apache.org/hadoop/Hive/DeveloperGuide

The comment on the change is:
Start filling out the developer guide.

------------------------------------------------------------------------------
= Developer Guide =
== Code Organization and a brief architecture ==
=== Introduction ===
Hive comprises three main components:
 * Serializers/Deserializers (trunk/serde) - This component contains the framework libraries that allow users to develop serializers and deserializers for their own data formats. It also contains some built-in serialization/deserialization families.
 * MetaStore (trunk/metastore) - This component implements the metadata server, which holds all the information about the tables and partitions in the warehouse.
 * Query Processor (trunk/ql) - This component implements the processing framework that converts SQL into a graph of map/reduce jobs, as well as the execution-time framework that runs those jobs in the order of their dependencies.

Apart from these major components, Hive contains several other components:
 * Command Line Interface (trunk/cli) - This component contains all the Java code used by the Hive command line interface.
 * Hive Server (trunk/service) - This component implements all the APIs that other clients (such as JDBC drivers) can use to talk to Hive.
 * Common (trunk/common) - This component contains common infrastructure needed by the rest of the code. Currently, it contains all the Java sources for managing Hive configurations (HiveConf) and passing them to the other code components.
 * Ant Utilities (trunk/ant) - This component contains the implementation of some Ant tasks used by the build infrastructure.
 * Scripts (trunk/bin) - This component contains all the scripts provided in the distribution, including the script that runs the Hive CLI (bin/hive).
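The SerDe component is the part users extend to support their own data formats. As a rough illustration of that idea only, the Java sketch below defines a two-method row serializer/deserializer contract and one user-defined implementation for a delimited text format. `RowSerDe`, `DelimitedRowSerDe` and their methods are hypothetical names invented for this example; they are not the interfaces actually found in trunk/serde.

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // A tab-delimited format, chosen only for illustration.
        RowSerDe serde = new DelimitedRowSerDe('\t');
        String record = serde.serialize(List.of("1", "alice", ""));
        System.out.println(serde.deserialize(record)); // prints [1, alice, ]
    }
}

/**
 * Hypothetical, simplified SerDe contract (NOT Hive's actual interface):
 * converts between a row, modeled as a list of column values, and one
 * stored text record.
 */
interface RowSerDe {
    String serialize(List<String> row);

    List<String> deserialize(String record);
}

/** Hypothetical user-defined SerDe for a single-character delimited text format. */
class DelimitedRowSerDe implements RowSerDe {
    private final char delimiter;

    DelimitedRowSerDe(char delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public String serialize(List<String> row) {
        // No escaping: assumes column values never contain the delimiter.
        return String.join(String.valueOf(delimiter), row);
    }

    @Override
    public List<String> deserialize(String record) {
        // Scan manually instead of using String.split, which would drop
        // trailing empty columns (and treats its argument as a regex).
        List<String> columns = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < record.length(); i++) {
            if (record.charAt(i) == delimiter) {
                columns.add(record.substring(start, i));
                start = i + 1;
            }
        }
        columns.add(record.substring(start)); // last column, possibly empty
        return columns;
    }
}
```

The point of the design is that code consuming rows only depends on the small contract, so supporting a new format means writing another implementation rather than changing the consumers.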

The following top-level directories contain helper libraries, packaged configuration files, etc.:
 * trunk/conf - This directory contains the packaged hive-default.xml and hive-site.xml.
 * trunk/data - This directory contains some data sets and configurations used in the Hive tests.
 * trunk/ivy - This directory contains the Ivy files used by the build infrastructure to manage dependencies on different Hadoop versions.
 * trunk/lib - This directory contains the runtime libraries needed by Hive.
 * trunk/testlibs - This directory contains the junit.jar used by the junit target in the build infrastructure.
 * trunk/testutils (Deprecated)

=== SerDe ===
=== MetaStore ===
=== Query Processor ===
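The Query Processor's execution-time framework runs the map/reduce jobs it generates in the order of their dependencies. One minimal way to picture that ordering is a topological sort of the job graph. The Java sketch below uses Kahn's algorithm over a hypothetical `JobGraph` class with made-up job names; it illustrates the idea under those assumptions and is not the actual scheduling code in trunk/ql.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // Hypothetical plan: two scans feed a join, which feeds a group-by.
        JobGraph plan = new JobGraph();
        plan.addJob("scan_orders");
        plan.addJob("scan_users");
        plan.addDependency("join", "scan_orders");
        plan.addDependency("join", "scan_users");
        plan.addDependency("group_by", "join");
        System.out.println(plan.executionOrder()); // prints [scan_orders, scan_users, join, group_by]
    }
}

/** Hypothetical DAG of map/reduce jobs; not a Hive class. */
class JobGraph {
    // job -> jobs that must finish before it can start (insertion-ordered)
    private final Map<String, List<String>> prerequisites = new LinkedHashMap<>();

    void addJob(String job) {
        prerequisites.putIfAbsent(job, new ArrayList<>());
    }

    void addDependency(String job, String dependsOn) {
        addJob(job);
        addJob(dependsOn);
        prerequisites.get(job).add(dependsOn);
    }

    /** Kahn's algorithm: returns the jobs in an order that respects every dependency. */
    List<String> executionOrder() {
        Map<String, Integer> pending = new HashMap<>(); // unfinished prerequisites per job
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : prerequisites.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String pre : e.getValue()) {
                dependents.computeIfAbsent(pre, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String job : prerequisites.keySet()) {
            if (pending.get(job) == 0) {
                ready.add(job);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job); // a real engine would launch the job at this point
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != prerequisites.size()) {
            throw new IllegalStateException("cycle detected in job graph");
        }
        return order;
    }
}
```

In this example `scan_orders` and `scan_users` have no prerequisites, so either could go first; the sketch simply runs ready jobs in insertion order, one at a time.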
