Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by RaghothamMurthy:
http://wiki.apache.org/hadoop/Hive/Design

New page:
== Hive Architecture ==
Figure 1 shows the major components of Hive and its interactions with Hadoop. As shown in that figure, the main components of Hive are:
 * UI - The user interface for users to submit queries and other operations to the system. The system currently has a command line interface; a web based GUI is being developed.
 * Driver - The component which receives the queries. This component implements 
the notion of session handles and provides execute and fetch APIs modeled on 
JDBC/ODBC interfaces.
 * Compiler - The component that parses the query, does semantic analysis on the different query blocks and query expressions, and eventually generates an execution plan with the help of the table and partition metadata looked up from the metastore.
 * Metastore - The component that stores all the structure information of the various tables and partitions in the warehouse, including column and column type information, the serializers and deserializers necessary to read and write data, and the corresponding hdfs files where the data is stored.
 * Execution Engine - The component which executes the execution plan created 
by the compiler. The plan is a DAG of stages. The execution engine manages the 
dependencies between these different stages of the plan and executes these 
stages on the appropriate system components.

Figure 1 also shows how a typical query flows through the system. The UI calls the execute interface to the Driver (step 1 in Figure 1). The Driver creates a session handle for the query and sends the query to the compiler to generate an execution plan (step 2). The compiler gets the necessary metadata from the metastore (steps 3 and 4). This metadata is used to typecheck the expressions in the query tree as well as to prune partitions based on query predicates. The plan generated by the compiler (step 5) is a DAG of stages, with each stage being either a map/reduce job, a metadata operation or an operation on hdfs. For map/reduce stages, the plan contains map operator trees (operator trees that are executed on the mappers) and a reduce operator tree (for operations that need reducers). The execution engine submits these stages to the appropriate components (steps 6, 6.1, 6.2 and 6.3). In each task (mapper/reducer) the deserializer associated with the table or intermediate outputs is used to read the rows from hdfs files, and these are passed through the associated operator tree. Once the output is generated, it is written to a temporary hdfs file through the serializer (this happens in the mapper in case the operation does not need a reduce). The temporary files are used to provide data to subsequent map/reduce stages of the plan. For DML operations the final temporary file is moved to the table's location. This scheme is used to ensure that dirty data is not read (file rename being an atomic operation in hdfs). For queries, the contents of the temporary file are read by the execution engine directly from hdfs as part of the fetch call from the Driver (steps 7, 8 and 9).
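
To inspect the plan the compiler produces for a query, Hive provides an EXPLAIN statement that prints the DAG of stages and the operator trees within each stage. A minimal sketch (the page_views table is hypothetical):

{{{
-- Prints the stage DAG (map/reduce jobs, metadata and hdfs
-- operations) and the map/reduce operator trees for the query.
EXPLAIN
SELECT ds, count(1) FROM page_views GROUP BY ds;
}}}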


== Hive Data Model ==
Data in Hive is organized into:
 * Tables - These are analogous to tables in relational databases. Tables can be filtered, projected, joined and unioned. Additionally, all the data of a table is stored in a directory in hdfs. Hive also supports the notion of external tables, wherein a table can be created on pre-existing files or directories in hdfs by providing the appropriate location to the table creation DDL. The rows in a table are organized into typed columns, similar to relational databases.
 * Partitions - Each table can have one or more partition keys which determine how the data is stored, e.g. a table T with a date partition column ds has the files with data for a particular date stored in the <table location>/ds=<date> directory in hdfs. Partitions allow the system to prune the data to be inspected based on query predicates, e.g. a query that is interested in rows from T that satisfy the predicate T.ds = '2008-09-01' would only have to look at the files in the <table location>/ds=2008-09-01/ directory in hdfs.
 * Buckets - Data in each partition may in turn be divided into buckets based on the hash of a column in the table. Each bucket is stored as a file in the partition directory. Bucketing allows the system to efficiently evaluate queries that depend on a sample of the data (these are queries that use the SAMPLE clause on the table). A short DDL sketch of these constructs follows this list.
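
As an illustrative sketch (all table, column and path names here are hypothetical, and the SAMPLE clause appears as TABLESAMPLE in the query syntax):

{{{
-- A table partitioned by date and bucketed on userid.
CREATE TABLE page_views(userid INT, url STRING)
PARTITIONED BY(ds STRING)
CLUSTERED BY(userid) INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- An external table over pre-existing hdfs files.
CREATE EXTERNAL TABLE page_views_ext(userid INT, url STRING)
LOCATION '/user/data/page_views';

-- Partition pruning: only files under
-- <table location>/ds=2008-09-01 are read.
SELECT url FROM page_views WHERE ds = '2008-09-01';

-- Sampling: only the first of the 32 bucket files is read.
SELECT url FROM page_views TABLESAMPLE(BUCKET 1 OUT OF 32 ON userid);
}}}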

Apart from primitive column types (integers, floating point numbers, generic strings, dates and booleans), Hive also supports arrays and maps. Additionally, users can compose their own types programmatically from any of the primitives, collections or other user defined types. The typing system is closely tied to the SerDe (Serialization/Deserialization) and object inspector interfaces. Users can create their own types by implementing their own object inspectors, and using these object inspectors they can create their own SerDes to serialize and deserialize their data into hdfs files. These two interfaces provide the necessary hooks to extend the capabilities of Hive when it comes to understanding other data formats and richer types. Builtin object inspectors like ListObjectInspector, StructObjectInspector and MapObjectInspector provide the necessary primitives to compose richer types in an extensible manner. For maps (associative arrays) and arrays, useful builtin functions like size and the index operators are provided. The dotted notation is used to navigate nested types, e.g. a.b.c = 1 looks at field c of field b of type a and compares that with 1.
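
A short sketch of these types and operators in a query (table and column names are hypothetical):

{{{
CREATE TABLE users(
  id INT,
  tags ARRAY<STRING>,
  props MAP<STRING, STRING>,
  addr STRUCT<city:STRING, zip:STRING>);

-- size and the index operators for arrays and maps, plus the
-- dotted notation to navigate into the struct column.
SELECT size(tags), tags[0], props['color'], addr.city
FROM users
WHERE addr.zip = '94025';
}}}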

== Meta Store ==
=== Motivation ===
The Meta Store provides two important but often overlooked features of a data warehouse: data abstraction and data discovery. Without the data abstractions provided in Hive, a user has to provide information about data formats, extractors and loaders along with the query. In Hive, this information is given during table creation and reused every time the table is referenced. This is very similar to traditional warehousing systems. The second functionality, data discovery, enables users to discover and explore relevant and specific data in the warehouse. Other tools can be built using this metadata to expose and possibly enhance the information about the data and its availability. Hive accomplishes both of these features by providing a metadata repository that is tightly integrated with the Hive query processing system so that data and metadata are in sync.

=== Metadata Objects ===
 * Database - is a namespace for tables. It can be used as an administrative unit in the future. The database 'default' is used for tables with no user supplied database name.
 * Table - Metadata for a table contains the list of columns, the owner, and storage and SerDe information. It can also contain any user supplied key and value data. Storage information includes the location of the underlying data, file input and output formats and bucketing information. SerDe metadata includes the implementation class of the serializer and deserializer and any supporting information required by the implementation. All of this information can be provided during the creation of the table (a short example follows this list).
 * Partition - Each partition can have its own columns and SerDe and storage 
information. This facilitates schema changes without affecting older partitions.
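
As a sketch of how this metadata is supplied and then inspected (the table name is hypothetical; LazySimpleSerDe is one of the built-in SerDe implementations):

{{{
-- Storage and SerDe information supplied at table creation.
CREATE TABLE logs(line STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE;

-- Prints the metadata the metastore holds for the table:
-- columns, location, input/output formats and SerDe class.
DESCRIBE EXTENDED logs;
}}}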

=== Meta Store Architecture ===
The metastore is an object store with a database or file backed store. The database backed store is implemented using an ORM solution (JPOX). The prime motivation for storing this in a relational database is the queryability of the metadata. Some disadvantages of using a separate data store for metadata instead of using HDFS are synchronization and scalability issues. Additionally, there is no clear way to implement an object store on top of HDFS due to the lack of random updates to files. These issues, coupled with the queryability advantages of a relational store, made our approach a sensible one.

The Meta Store can be configured to be used in a couple of ways: remote and embedded. In remote mode, the meta store is a Thrift service. This mode is useful for non-Java clients. In embedded mode, the Hive client directly connects to the underlying meta store using JDBC. This mode is useful because it avoids another system that needs to be maintained and monitored. Both of these modes can co-exist. A configuration sketch for the two modes follows.
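
A minimal hive-site.xml sketch of the two modes (the values shown are illustrative, not the only options):

{{{
<!-- Embedded mode: the Hive client talks JDBC to the backing
     database directly (here an embedded Derby instance). -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>

<!-- Remote mode: the Hive client talks to a Thrift metastore
     service running at the given URI. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastorehost:9083</value>
</property>
}}}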
