Author: gates
Date: Sat Apr 28 00:47:30 2012
New Revision: 1331643
URL: http://svn.apache.org/viewvc?rev=1331643&view=rev
Log:
HCATALOG-368 Documentation improvements: doc set & API docs
Modified:
incubator/hcatalog/trunk/CHANGES.txt
incubator/hcatalog/trunk/build.xml
incubator/hcatalog/trunk/src/docs/overview.html
incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java
Modified: incubator/hcatalog/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/CHANGES.txt?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/CHANGES.txt (original)
+++ incubator/hcatalog/trunk/CHANGES.txt Sat Apr 28 00:47:30 2012
@@ -26,6 +26,8 @@ Trunk (unreleased changes)
HCAT-328 HCatLoader should report its input size so pig can estimate the
number of reducers (traviscrawford via gates)
IMPROVEMENTS
+ HCAT-368 Documentation improvements: doc set & API docs (lefty via gates)
+
HCAT-387 Trunk should point to 0.10 snapshot to match hive trunk (toffer)
HCAT-329 HCatalog build fails with pig 0.9 (traviscrawford via hashutosh)
Modified: incubator/hcatalog/trunk/build.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/build.xml?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/build.xml (original)
+++ incubator/hcatalog/trunk/build.xml Sat Apr 28 00:47:30 2012
@@ -471,6 +471,7 @@
author="true"
version="true"
use="true"
+ noqualifier="all"
windowtitle="HCatalog ${hcatalog.version} API"
doctitle="HCatalog ${hcatalog.version} API"
failonerror="true">
Modified: incubator/hcatalog/trunk/src/docs/overview.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/overview.html?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/overview.html (original)
+++ incubator/hcatalog/trunk/src/docs/overview.html Sat Apr 28 00:47:30 2012
@@ -52,54 +52,50 @@
<a name="HCatalog"></a>
<h2 class="h3">HCatalog </h2>
<div class="section">
-<p>HCatalog is a table management and storage management layer for Hadoop that
enables users with different data processing tools – Pig, MapReduce,
Hive, Streaming – to more easily read and write data on the grid.
HCatalog’s table abstraction presents users with a relational view of
data in the Hadoop distributed file system (HDFS) and ensures that users need
not worry about where or in what format their data is stored – RCFile
format, text files, sequence files. </p>
-<p>(Note: In this release, Streaming is not supported. Also, HCatalog supports
only writing RCFile formatted files and only reading PigStorage formated text
files.)</p>
+<p>HCatalog is a table and storage management layer for Hadoop that enables
users with different data processing tools – Pig, MapReduce, and Hive
– to more easily read and write data on the grid. HCatalog’s table
abstraction presents users with a relational view of data in the Hadoop
distributed file system (HDFS) and ensures that users need not worry about
where or in what format their data is stored – RCFile format, text files,
or SequenceFiles. </p>
+<p>HCatalog supports reading and writing files in any format for which a SerDe
can be written. By default, HCatalog supports RCFile, CSV, JSON, and
SequenceFile formats. To use a custom format, you must provide the InputFormat,
OutputFormat, and SerDe.</p>
<p></p>
-
-
-
+
+
<a name="HCatalog+Architecture"></a>
<h2 class="h3">HCatalog Architecture</h2>
<div class="section">
-<p>HCatalog is built on top of the Hive metastore and incorporates components
from the Hive DDL. HCatalog provides read and write interfaces for Pig and
MapReduce and a command line interface for data definitions.</p>
-<p>(Note: HCatalog notification is not available in this release.)</p>
+<p>HCatalog is built on top of the Hive metastore and incorporates Hive's DDL.
HCatalog provides read and write interfaces for Pig and MapReduce and uses
Hive's command line interface for issuing data definition and metadata
exploration commands.</p>
<p></p>
<a name="Interfaces"></a>
<h3 class="h4">Interfaces</h3>
-<p>The HCatalog interface for Pig – HCatLoader and HCatStorer – is
an implementation of the Pig load and store interfaces. HCatLoader accepts a
table to read data from; you can indicate which partitions to scan by
immediately following the load statement with a partition filter statement.
HCatStorer accepts a table to write to and a specification of partition keys to
create a new partition. Currently HCatStorer only supports writing to one
partition. HCatLoader and HCatStorer are implemented on top of HCatInputFormat
and HCatOutputFormat respectively </p>
-<p>The HCatalog interface for MapReduce – HCatInputFormat and
HCatOutputFormat – is an implementation of Hadoop InputFormat and
OutputFormat. HCatInputFormat accepts a table to read data from and a selection
predicate to indicate which partitions to scan. HCatOutputFormat accepts a
table to write to and a specification of partition keys to create a new
partition. Currently HCatOutputFormat only supports writing to one
partition.</p>
-<p>
-<strong>Note:</strong> Currently there is no Hive-specific interface. Since
HCatalog uses Hive's metastore, Hive can read data in HCatalog directly as long
as a SerDe for that data already exists. In the future we plan to write a
HCatalogSerDe so that users won't need storage-specific SerDes and so that Hive
users can write data to HCatalog. Currently, this is supported - if a Hive user
writes data in the RCFile format, it is possible to read the data through
HCatalog. </p>
-<p>Data is defined using HCatalog's command line interface (CLI). The HCatalog
CLI supports most of the DDL portion of Hive's query language, allowing users
to create, alter, drop tables, etc. The CLI also supports the data exploration
part of the Hive command line, such as SHOW TABLES, DESCRIBE TABLE, etc.</p>
+<p>The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which
implement the Pig load and store interfaces respectively. HCatLoader accepts a
table to read data from; you can indicate which partitions to scan by
immediately following the load statement with a partition filter statement.
HCatStorer accepts a table to write to and optionally a specification of
partition keys to create a new partition. You can write to a single partition
by specifying the partition key(s) and value(s) in the STORE clause; and you
can write to multiple partitions if the partition key(s) are columns in the
data being stored. HCatLoader is implemented on top of HCatInputFormat and
HCatStorer is implemented on top of HCatOutputFormat (see <a
href="loadstore.html">HCatalog Load and Store</a>).</p>
+<p>HCatInputFormat and HCatOutputFormat are HCatalog's interface for
MapReduce; they implement Hadoop's InputFormat and OutputFormat, respectively.
HCatInputFormat accepts a table to read data from and optionally a selection
predicate to indicate which partitions to scan. HCatOutputFormat accepts a
table to write to and optionally a specification of partition keys to create a
new partition. You can write to a single partition by specifying the partition
key(s) and value(s) in the STORE clause; and you can write to multiple
partitions if the partition key(s) are columns in the data being stored. (See
<a href="inputoutput.html">HCatalog Input and Output</a>.)</p>
+<p>Note: There is no Hive-specific interface. Since HCatalog uses Hive's
metastore, Hive can read data in HCatalog directly.</p>
+<p>Data is defined using HCatalog's command line interface (CLI). The HCatalog
CLI supports all Hive DDL that does not require MapReduce to execute, allowing
users to create, alter, drop tables, etc. (Unsupported Hive DDL includes
import/export, CREATE TABLE AS SELECT, ALTER TABLE options REBUILD and
CONCATENATE, and ANALYZE TABLE ... COMPUTE STATISTICS.) The CLI also supports
the data exploration part of the Hive command line, such as SHOW TABLES,
DESCRIBE TABLE, etc. (see the <a href="cli.html">HCatalog Command Line
Interface</a>).</p>
<a name="Data+Model"></a>
<h3 class="h4">Data Model</h3>
-<p>HCatalog presents a relational view of data in HDFS. Data is stored in
tables and these tables can be placed in databases. Tables can also be hash
partitioned on one or more keys; that is, for a given value of a key (or set of
keys) there will be one partition that contains all rows with that value (or
set of values). For example, if a table is partitioned on date and there are
three days of data in the table, there will be three partitions in the table.
New partitions can be added to a table, and partitions can be dropped from a
table. Partitioned tables have no partitions at create time. Unpartitioned
tables effectively have one default partition that must be created at table
creation time. There is no guaranteed read consistency when a partition is
dropped.</p>
-<p>Partitions contain records. Once a partition is created records cannot be
added to it, removed from it, or updated in it. (In the future some ability to
integrate changes to a partition will be added.) Partitions are
multi-dimensional and not hierarchical. Records are divided into columns.
Columns have a name and a datatype. HCatalog supports the same datatypes as
Hive. </p>
+<p>HCatalog presents a relational view of data. Data is stored in tables and
these tables can be placed in databases. Tables can also be hash partitioned on
one or more keys; that is, for a given value of a key (or set of keys) there
will be one partition that contains all rows with that value (or set of
values). For example, if a table is partitioned on date and there are three
days of data in the table, there will be three partitions in the table. New
partitions can be added to a table, and partitions can be dropped from a table.
Partitioned tables have no partitions at create time. Unpartitioned tables
effectively have one default partition that must be created at table creation
time. There is no guaranteed read consistency when a partition is dropped.</p>
+<p>Partitions contain records. Once a partition is created records cannot be
added to it, removed from it, or updated in it. Partitions are
multi-dimensional and not hierarchical. Records are divided into columns.
Columns have a name and a datatype. HCatalog supports the same datatypes as
Hive (see <a href="loadstore.html">HCatalog Load and Store</a>). </p>
</div>
<a name="Data+Flow+Example"></a>
<h2 class="h3">Data Flow Example</h2>
<div class="section">
-<p>This simple data flow example shows how HCatalog is used to move data from
the grid into a database.
- From the database, the data can then be analyzed using Hive.</p>
+<p>This simple data flow example shows how HCatalog can help grid users share
and access data.</p>
<p>
<strong>First</strong> Joe in data acquisition uses distcp to get data onto
the grid.</p>
<pre class="code">
hadoop distcp file:///file.dat hdfs://data/rawevents/20100819/data
-hcat "alter table rawevents add partition 20100819
hdfs://data/rawevents/20100819/data"
+hcat "alter table rawevents add partition (ds='20100819') location
'hdfs://data/rawevents/20100819/data'"
</pre>
<p>
<strong>Second</strong> Sally in data processing uses Pig to cleanse and
prepare the data.</p>
-<p>Without HCatalog, Sally must be manually informed by Joe that data is
available, or use Oozie and poll on HDFS.</p>
+<p>Without HCatalog, Sally must be manually informed by Joe when data is
available, or poll on HDFS.</p>
<pre class="code">
A = load '/data/rawevents/20100819/data' as (alpha:int, beta:chararray,
…);
B = filter A by bot_finder(zeta) = 0;
…
store Z into 'data/processedevents/20100819/data';
</pre>
-<p>With HCatalog, Oozie will be notified by HCatalog data is available and can
then start the Pig job</p>
+<p>With HCatalog, a JMS message is sent as soon as the data is available, and
the Pig job can then be started.</p>
<pre class="code">
A = load 'rawevents' using HCatLoader;
B = filter A by date = '20100819' and by bot_finder(zeta) = 0;
@@ -115,20 +111,20 @@ alter table processedevents add partitio
select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
-group by adverstiser_id;
+group by advertiser_id;
</pre>
<p>With HCatalog, Robert does not need to modify the table structure.</p>
<pre class="code">
select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
-group by adverstiser_id;
+group by advertiser_id;
</pre>
</div>
<div class="copyright">
Copyright ©
- 2011 <a href="http://www.apache.org/licenses/">The Apache Software
Foundation</a>
+ 2012 <a href="http://www.apache.org/licenses/">The Apache Software
Foundation</a>
</div>
</div>
</body>
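
For illustration of the MapReduce interface described in the overview above, here is a minimal driver sketch. It is a sketch under assumptions: the InputJobInfo.create(database, table, filter) and OutputJobInfo.create(database, table, partitionValues) factory methods are taken from the 0.4-era API, and the table names, filter, and partition key are hypothetical. Mapper, reducer, and schema setup are omitted.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hcatalog.mapreduce.HCatInputFormat;
    import org.apache.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hcatalog.mapreduce.InputJobInfo;
    import org.apache.hcatalog.mapreduce.OutputJobInfo;

    public class ProcessEvents {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "process-events");
        job.setJarByClass(ProcessEvents.class);

        // Read only the partitions of 'rawevents' that match the filter
        // (hypothetical table and filter).
        job.setInputFormatClass(HCatInputFormat.class);
        HCatInputFormat.setInput(job,
            InputJobInfo.create("default", "rawevents", "ds=\"20100819\""));

        // Write a new partition (ds='20100819') of 'processedevents'.
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("ds", "20100819");
        job.setOutputFormatClass(HCatOutputFormat.class);
        HCatOutputFormat.setOutput(job,
            OutputJobInfo.create("default", "processedevents", partitionValues));

        // Mapper/reducer classes and output schema (setSchema) omitted here.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }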
Modified: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml (original)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml Sat Apr 28 00:47:30 2012
@@ -25,8 +25,8 @@
<section>
<title>HCatalog </title>
- <p>HCatalog is a table and storage management layer for Hadoop that
enables users with different data processing tools – Pig, MapReduce, and Hive
– to more easily read and write data on the grid. HCatalog’s table
abstraction presents users with a relational view of data in the Hadoop
distributed file system (HDFS) and ensures that users need not worry about
where or in what format their data is stored – RCFile format, text files, or
sequence files. </p>
-<p>HCatalog supports reading and writing files in any format for which a SerDe
can be written. By default, HCatalog supports RCFile, CSV, JSON, and sequence
file formats. To use a custom format, you must provide the InputFormat,
OutputFormat, and SerDe.</p>
+ <p>HCatalog is a table and storage management layer for Hadoop that
enables users with different data processing tools – Pig, MapReduce, and Hive
– to more easily read and write data on the grid. HCatalog’s table
abstraction presents users with a relational view of data in the Hadoop
distributed file system (HDFS) and ensures that users need not worry about
where or in what format their data is stored – RCFile format, text files, or
SequenceFiles. </p>
+<p>HCatalog supports reading and writing files in any format for which a SerDe
can be written. By default, HCatalog supports RCFile, CSV, JSON, and
SequenceFile formats. To use a custom format, you must provide the InputFormat,
OutputFormat, and SerDe.</p>
<p></p>
<figure src="images/hcat-product.jpg" align="left" alt="HCatalog Product"/>
@@ -36,16 +36,15 @@
<section>
<title>HCatalog Architecture</title>
- <p>HCatalog is built on top of the Hive metastore and incorporates
components from the Hive DDL. HCatalog provides read and write interfaces for
Pig and MapReduce and uses
- Hive's command line interface for issuing data definition and metadata
exploration commands.</p>
+ <p>HCatalog is built on top of the Hive metastore and incorporates
Hive's DDL. HCatalog provides read and write interfaces for Pig and MapReduce
and uses Hive's command line interface for issuing data definition and metadata
exploration commands.</p>
<p></p>
<section>
<title>Interfaces</title>
-<p>The HCatalog interface for Pig – HCatLoader and HCatStorer – is an
implementation of the Pig load and store interfaces. HCatLoader accepts a table
to read data from; you can indicate which partitions to scan by immediately
following the load statement with a partition filter statement. HCatStorer
accepts a table to write to and optionally a specification of partition keys to
create a new partition. You can write to a single partition by specifying the
partition key(s) and value(s) in the STORE clause; and you can write to
multiple partitions if the partition key(s) are columns in the data being
stored. HCatLoader and HCatStorer are implemented on top of HCatInputFormat and
HCatOutputFormat, respectively (see <a href="loadstore.html">HCatalog Load and
Store</a>).</p>
+<p>The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which
implement the Pig load and store interfaces respectively. HCatLoader accepts a
table to read data from; you can indicate which partitions to scan by
immediately following the load statement with a partition filter statement.
HCatStorer accepts a table to write to and optionally a specification of
partition keys to create a new partition. You can write to a single partition
by specifying the partition key(s) and value(s) in the STORE clause; and you
can write to multiple partitions if the partition key(s) are columns in the
data being stored. HCatLoader is implemented on top of HCatInputFormat and
HCatStorer is implemented on top of HCatOutputFormat (see <a
href="loadstore.html">HCatalog Load and Store</a>).</p>
-<p>The HCatalog interface for MapReduce – HCatInputFormat and
HCatOutputFormat – is an implementation of Hadoop InputFormat and
OutputFormat. HCatInputFormat accepts a table to read data from and optionally
a selection predicate to indicate which partitions to scan. HCatOutputFormat
accepts a table to write to and optionally a specification of partition keys to
create a new partition. You can write to a single partition by specifying the
partition key(s) and value(s) in the STORE clause; and you can write to
multiple partitions if the partition key(s) are columns in the data being
stored. (See <a href="inputoutput.html">HCatalog Input and Output</a>.)</p>
+<p>HCatInputFormat and HCatOutputFormat are HCatalog's interface for
MapReduce; they implement Hadoop's InputFormat and OutputFormat, respectively.
HCatInputFormat accepts a table to read data from and optionally a selection
predicate to indicate which partitions to scan. HCatOutputFormat accepts a
table to write to and optionally a specification of partition keys to create a
new partition. You can write to a single partition by specifying the partition
key(s) and value(s) in the STORE clause; and you can write to multiple
partitions if the partition key(s) are columns in the data being stored. (See
<a href="inputoutput.html">HCatalog Input and Output</a>.)</p>
<p>Note: There is no Hive-specific interface. Since HCatalog uses Hive's
metastore, Hive can read data in HCatalog directly.</p>
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java Sat Apr 28 00:47:30 2012
@@ -33,9 +33,9 @@ import org.apache.hcatalog.data.transfer
public class DataTransferFactory {
/**
- * This should be called once from master node to obtain an instance of
{@link HCatReader}
- * @param re built using {@link ReadEntity.Builder}
- * @param config Any configuration which master node wants to pass to
HCatalog
+ * This should be called once from the master node to obtain an instance of
{@link HCatReader}.
+ * @param re ReadEntity built using {@link ReadEntity.Builder}
+ * @param config any configuration which the master node wants to pass to
HCatalog
* @return {@link HCatReader}
*/
public static HCatReader getHCatReader(final ReadEntity re, final
Map<String,String> config) {
@@ -44,9 +44,9 @@ public class DataTransferFactory {
}
/**
- * This should only be called once from every slave nodes to obtain an
instance of {@link HCatReader}
- * @param split obtained at master node.
- * @param config obtained at master node.
+ * This should only be called once from every slave node to obtain an
instance of {@link HCatReader}.
+ * @param split input split obtained at master node
+ * @param config configuration obtained at master node
* @return {@link HCatReader}
*/
public static HCatReader getHCatReader(final InputSplit split, final
Configuration config) {
@@ -55,11 +55,11 @@ public class DataTransferFactory {
}
/**
- * This should only be called once from every slave nodes to obtain an
instance of {@link HCatReader}
- * This should be called if external system has some state to provide
to HCatalog
- * @param split obtained at master node.
- * @param config obtained at master node.
- * @param sp
+ * This should only be called once from every slave node to obtain an
instance of {@link HCatReader}.
+ * This should be called if an external system has some state to
provide to HCatalog.
+ * @param split input split obtained at master node
+ * @param config configuration obtained at master node
+ * @param sp {@link StateProvider}
* @return {@link HCatReader}
*/
public static HCatReader getHCatReader(final InputSplit split, final
Configuration config, StateProvider sp) {
@@ -67,9 +67,9 @@ public class DataTransferFactory {
return new HCatInputFormatReader(split, config, sp);
}
- /** This should be called at master node to obtain an instance of
{@link HCatWriter}
- * @param we built using {@link WriteEntity.Builder}
- * @param config Any configuration which master wants to pass to
HCatalog
+ /** This should be called at the master node to obtain an instance of
{@link HCatWriter}.
+ * @param we WriteEntity built using {@link WriteEntity.Builder}
+ * @param config any configuration which the master node wants to pass to
HCatalog
* @return {@link HCatWriter}
*/
public static HCatWriter getHCatWriter(final WriteEntity we, final
Map<String,String> config) {
@@ -77,8 +77,8 @@ public class DataTransferFactory {
return new HCatOutputFormatWriter(we, config);
}
- /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}
- * @param cntxt {@link WriterContext} obtained at master node.
+ /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}.
+ * @param cntxt {@link WriterContext} obtained at master node
* @return {@link HCatWriter}
*/
public static HCatWriter getHCatWriter(final WriterContext cntxt) {
@@ -86,10 +86,10 @@ public class DataTransferFactory {
return getHCatWriter(cntxt, DefaultStateProvider.get());
}
- /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}
- * If external system has some mechanism for providing state to
HCatalog, this constructor
+ /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}.
+ * If an external system has some mechanism for providing state to
HCatalog, this constructor
* can be used.
- * @param cntxt {@link WriterContext} obtained at master node.
+ * @param cntxt {@link WriterContext} obtained at master node
* @param sp {@link StateProvider}
* @return {@link HCatWriter}
*/
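
As a usage note for these factory methods, the intended call pattern is: build a ReadEntity and call getHCatReader once on the master node, then ship the resulting context's splits to the slave nodes, where getHCatReader is called once per split. A minimal sketch follows; ReadEntity.Builder#withTable, HCatReader#prepareRead and #read, and ReaderContext#getSplits/#getConf are assumptions based on the 0.4-era API and do not appear in this diff, and the table name is hypothetical.

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hcatalog.data.HCatRecord;
    import org.apache.hcatalog.data.transfer.DataTransferFactory;
    import org.apache.hcatalog.data.transfer.HCatReader;
    import org.apache.hcatalog.data.transfer.ReadEntity;
    import org.apache.hcatalog.data.transfer.ReaderContext;

    public class ReadExample {
      public static void main(String[] args) throws Exception {
        // Master node: build a ReadEntity and obtain the reader once.
        ReadEntity entity = new ReadEntity.Builder().withTable("rawevents").build();
        Map<String, String> config = new HashMap<String, String>();
        HCatReader masterReader = DataTransferFactory.getHCatReader(entity, config);
        ReaderContext context = masterReader.prepareRead();

        // The context and its splits would normally be serialized and shipped
        // to the slave nodes; each slave calls getHCatReader once for its split.
        for (InputSplit split : context.getSplits()) {
          HCatReader reader = DataTransferFactory.getHCatReader(split, context.getConf());
          Iterator<HCatRecord> records = reader.read();
          while (records.hasNext()) {
            System.out.println(records.next());
          }
        }
      }
    }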
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java Sat Apr 28 00:47:30 2012
@@ -22,16 +22,16 @@ import java.io.IOException;
import org.apache.hadoop.mapreduce.Job;
-/** The InputFormat to use to read data from HCat */
+/** The InputFormat to use to read data from HCatalog. */
public class HCatInputFormat extends HCatBaseInputFormat {
/**
- * Set the input to use for the Job. This queries the metadata server with
- * the specified partition predicates, gets the matching partitions, puts
- * the information in the conf object. The inputInfo object is updated with
- * information needed in the client context
+ * Set the input information to use for the job. This queries the metadata
server
+ * with the specified partition predicates, gets the matching partitions,
and
+ * puts the information in the conf object. The inputInfo object is updated
+ * with information needed in the client context.
* @param job the job object
- * @param inputJobInfo the input info for table to read
+ * @param inputJobInfo the input information about the table to read
* @throws IOException the exception in communicating with the metadata
server
*/
public static void setInput(Job job,
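
To show what a task sees on the read side, here is a hedged mapper sketch: HCatInputFormat delivers one HCatRecord per input record, addressed here by field position. The key type, the output types, and the assumption that position 0 holds an advertiser_id column are all hypothetical.

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hcatalog.data.HCatRecord;

    public class EventMapper
        extends Mapper<WritableComparable, HCatRecord, Text, Text> {
      @Override
      protected void map(WritableComparable key, HCatRecord value, Context context)
          throws IOException, InterruptedException {
        // Position 0 is assumed to hold the advertiser_id column of the
        // hypothetical 'rawevents' table.
        String advertiserId = String.valueOf(value.get(0));
        context.write(new Text(advertiserId), new Text(value.toString()));
      }
    }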
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java Sat Apr 28 00:47:30 2012
@@ -51,8 +51,8 @@ import org.apache.hcatalog.common.HCatUt
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;
-/** The OutputFormat to use to write data to HCat. The key value is ignored and
- * and should be given as null. The value is the HCatRecord to write.*/
+/** The OutputFormat to use to write data to HCatalog. The key value is
ignored and
+ * should be given as null. The value is the HCatRecord to write.*/
public class HCatOutputFormat extends HCatBaseOutputFormat {
static final private Log LOG = LogFactory.getLog(HCatOutputFormat.class);
@@ -61,10 +61,11 @@ public class HCatOutputFormat extends HC
private static boolean harRequested;
/**
- * Set the info about the output to write for the Job. This queries the
metadata server
- * to find the StorageHandler to use for the table. Throws error if
partition is already published.
+ * Set the information about the output to write for the job. This queries
the metadata server
+ * to find the StorageHandler to use for the table. It throws an error if
the
+ * partition is already published.
* @param job the job object
- * @param outputJobInfo the table output info
+ * @param outputJobInfo the table output information for the job
* @throws IOException the exception in communicating with the metadata
server
*/
@SuppressWarnings("unchecked")
@@ -204,6 +205,7 @@ public class HCatOutputFormat extends HC
* table schema is used by default for the partition if this is not called.
* @param job the job object
* @param schema the schema for the data
+ * @throws IOException
*/
public static void setSchema(final Job job, final HCatSchema schema)
throws IOException {
@@ -214,11 +216,12 @@ public class HCatOutputFormat extends HC
}
/**
- * Get the record writer for the job. Uses the StorageHandler's default
OutputFormat
- * to get the record writer.
- * @param context the information about the current task.
- * @return a RecordWriter to write the output for the job.
+ * Get the record writer for the job. This uses the StorageHandler's
default
+ * OutputFormat to get the record writer.
+ * @param context the information about the current task
+ * @return a RecordWriter to write the output for the job
* @throws IOException
+ * @throws InterruptedException
*/
@Override
public RecordWriter<WritableComparable<?>, HCatRecord>
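
Tying the write-side javadoc together, a minimal configuration sketch: setOutput registers the target table and partition, and setSchema declares the record shape (the table schema is the default if it is not called). OutputJobInfo.create(database, table, partitionValues) is assumed from the 0.4-era API, and the table name and partition key are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hcatalog.data.schema.HCatSchema;
    import org.apache.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hcatalog.mapreduce.OutputJobInfo;

    public class WriteExample {
      public static void configure(Job job, HCatSchema schema) throws Exception {
        // One new partition, ds='20100819', of the hypothetical table.
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("ds", "20100819");
        HCatOutputFormat.setOutput(job,
            OutputJobInfo.create("default", "processedevents", partitionValues));

        // Declare the schema of the records being written; omit this call to
        // fall back to the table schema.
        HCatOutputFormat.setSchema(job, schema);
        job.setOutputFormatClass(HCatOutputFormat.class);
      }
    }

In the task itself, each record is then emitted as context.write(null, record), since the key is ignored and should be given as null.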
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java Sat Apr 28 00:47:30 2012
@@ -27,7 +27,7 @@ import org.apache.hadoop.hive.ql.plan.Ta
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatStorageHandler;
-/** The Class used to serialize the partition information read from the
metadata server that maps to a partition */
+/** The class used to serialize the partition information read from the
metadata server that maps to a partition. */
public class PartInfo implements Serializable {
/** The serialization version */
@@ -63,6 +63,8 @@ public class PartInfo implements Seriali
* @param storageHandler the storage handler
* @param location the location
* @param hcatProperties hcat-specific properties at the partition
+ * @param jobProperties the job properties
+ * @param tableInfo the table information
*/
public PartInfo(HCatSchema partitionSchema, HCatStorageHandler
storageHandler,
String location, Properties hcatProperties,
@@ -116,8 +118,8 @@ public class PartInfo implements Seriali
}
/**
- * Gets the value of hcatProperties.
- * @return the hcatProperties
+ * Gets the input storage handler properties.
+ * @return HCat-specific properties set at the partition
*/
public Properties getInputStorageHandlerProperties() {
return hcatProperties;
@@ -147,10 +149,18 @@ public class PartInfo implements Seriali
return partitionValues;
}
+ /**
+ * Gets the job properties.
+ * @return a map of the job properties
+ */
public Map<String,String> getJobProperties() {
return jobProperties;
}
+ /**
+ * Gets the HCatalog table information.
+ * @return the table information
+ */
public HCatTableInfo getTableInfo() {
return tableInfo;
}
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java Sat Apr 28 00:47:30 2012
@@ -19,7 +19,7 @@ package org.apache.hcatalog.mapreduce;
import java.io.Serializable;
import java.util.Properties;
-/** Info about the storer to use for writing the data */
+/** Information about the storer to use for writing the data. */
public class StorerInfo implements Serializable {
/** The serialization version */
@@ -37,12 +37,12 @@ public class StorerInfo implements Seria
private String storageHandlerClass;
/**
- * Initialize the storer info
- * @param ifClass
- * @param ofClass
- * @param serdeClass
- * @param storageHandlerClass
- * @param properties
+ * Initialize the storer information.
+ * @param ifClass the input format class
+ * @param ofClass the output format class
+ * @param serdeClass the SerDe class
+ * @param storageHandlerClass the storage handler class
+ * @param properties the properties for the storage handler
*/
public StorerInfo(String ifClass, String ofClass, String serdeClass,
String storageHandlerClass, Properties properties) {
super();
@@ -53,35 +53,50 @@ public class StorerInfo implements Seria
this.properties = properties;
}
+ /**
+ * @return the input format class
+ */
public String getIfClass() {
return ifClass;
}
+ /**
+ * @param ifClass the input format class
+ */
public void setIfClass(String ifClass) {
this.ifClass = ifClass;
}
+ /**
+ * @return the output format class
+ */
public String getOfClass() {
return ofClass;
}
+ /**
+ * @return the serdeClass
+ */
public String getSerdeClass() {
return serdeClass;
}
+ /**
+ * @return the storageHandlerClass
+ */
public String getStorageHandlerClass() {
return storageHandlerClass;
}
/**
- * @return the properties
+ * @return the storer properties
*/
public Properties getProperties() {
return properties;
}
/**
- * @param properties the properties to set
+ * @param properties the storer properties to set
*/
public void setProperties(Properties properties) {
this.properties = properties;
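
With the constructor parameters now documented, here is a small self-contained sketch of building a StorerInfo directly, using only the constructor and getters shown in this diff; the RCFile-related class names are standard Hive classes used here only as placeholder values.

    import java.util.Properties;

    import org.apache.hcatalog.mapreduce.StorerInfo;

    public class StorerInfoExample {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("my.handler.option", "value"); // hypothetical property

        StorerInfo info = new StorerInfo(
            "org.apache.hadoop.hive.ql.io.RCFileInputFormat",       // input format class
            "org.apache.hadoop.hive.ql.io.RCFileOutputFormat",      // output format class
            "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe", // SerDe class
            null,                                                   // no storage handler class
            props);

        System.out.println(info.getIfClass() + " / " + info.getOfClass());
        System.out.println(info.getSerdeClass() + " -> " + info.getProperties());
      }
    }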