Author: gates
Date: Sat Apr 28 00:47:30 2012
New Revision: 1331643
URL: http://svn.apache.org/viewvc?rev=1331643&view=rev
Log:
HCATALOG-368 Documentation improvements: doc set & API docs
Modified:
incubator/hcatalog/trunk/CHANGES.txt
incubator/hcatalog/trunk/build.xml
incubator/hcatalog/trunk/src/docs/overview.html
incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java
incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java
Modified: incubator/hcatalog/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/CHANGES.txt?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/CHANGES.txt (original)
+++ incubator/hcatalog/trunk/CHANGES.txt Sat Apr 28 00:47:30 2012
@@ -26,6 +26,8 @@ Trunk (unreleased changes)
HCAT-328 HCatLoader should report its input size so pig can estimate the
number of reducers (traviscrawford via gates)
IMPROVEMENTS
+ HCAT-368 Documentation improvements: doc set & API docs (lefty via gates)
+
HCAT-387 Trunk should point to 0.10 snapshot to match hive trunk (toffer)
HCAT-329 HCatalog build fails with pig 0.9 (traviscrawford via hashutosh)
Modified: incubator/hcatalog/trunk/build.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/build.xml?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/build.xml (original)
+++ incubator/hcatalog/trunk/build.xml Sat Apr 28 00:47:30 2012
@@ -471,6 +471,7 @@
author="true"
version="true"
use="true"
+ noqualifier="all"
windowtitle="HCatalog ${hcatalog.version} API"
doctitle="HCatalog ${hcatalog.version} API"
failonerror="true">
Modified: incubator/hcatalog/trunk/src/docs/overview.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/overview.html?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/overview.html (original)
+++ incubator/hcatalog/trunk/src/docs/overview.html Sat Apr 28 00:47:30 2012
@@ -52,54 +52,50 @@
<a name="HCatalog"></a>
<h2 class="h3">HCatalog </h2>
<div class="section">
-<p>HCatalog is a table management and storage management layer for Hadoop that
enables users with different data processing tools – Pig, MapReduce,
Hive, Streaming – to more easily read and write data on the grid.
HCatalog’s table abstraction presents users with a relational view of
data in the Hadoop distributed file system (HDFS) and ensures that users need
not worry about where or in what format their data is stored – RCFile
format, text files, sequence files. </p>
-<p>(Note: In this release, Streaming is not supported. Also, HCatalog supports
only writing RCFile formatted files and only reading PigStorage formated text
files.)</p>
+<p>HCatalog is a table and storage management layer for Hadoop that enables
users with different data processing tools – Pig, MapReduce, and Hive
– to more easily read and write data on the grid. HCatalog’s table
abstraction presents users with a relational view of data in the Hadoop
distributed file system (HDFS) and ensures that users need not worry about
where or in what format their data is stored – RCFile format, text files,
or SequenceFiles. </p>
+<p>HCatalog supports reading and writing files in any format for which a SerDe
can be written. By default, HCatalog supports RCFile, CSV, JSON, and
SequenceFile formats. To use a custom format, you must provide the InputFormat,
OutputFormat, and SerDe.</p>
<p></p>
-
-
-
+
+
<a name="HCatalog+Architecture"></a>
<h2 class="h3">HCatalog Architecture</h2>
<div class="section">
-<p>HCatalog is built on top of the Hive metastore and incorporates components
from the Hive DDL. HCatalog provides read and write interfaces for Pig and
MapReduce and a command line interface for data definitions.</p>
-<p>(Note: HCatalog notification is not available in this release.)</p>
+<p>HCatalog is built on top of the Hive metastore and incorporates Hive's DDL.
HCatalog provides read and write interfaces for Pig and MapReduce and uses
Hive's command line interface for issuing data definition and metadata
exploration commands.</p>
<p></p>
<a name="Interfaces"></a>
<h3 class="h4">Interfaces</h3>
-<p>The HCatalog interface for Pig – HCatLoader and HCatStorer – is
an implementation of the Pig load and store interfaces. HCatLoader accepts a
table to read data from; you can indicate which partitions to scan by
immediately following the load statement with a partition filter statement.
HCatStorer accepts a table to write to and a specification of partition keys to
create a new partition. Currently HCatStorer only supports writing to one
partition. HCatLoader and HCatStorer are implemented on top of HCatInputFormat
and HCatOutputFormat respectively </p>
-<p>The HCatalog interface for MapReduce – HCatInputFormat and
HCatOutputFormat – is an implementation of Hadoop InputFormat and
OutputFormat. HCatInputFormat accepts a table to read data from and a selection
predicate to indicate which partitions to scan. HCatOutputFormat accepts a
table to write to and a specification of partition keys to create a new
partition. Currently HCatOutputFormat only supports writing to one
partition.</p>
-<p>
-<strong>Note:</strong> Currently there is no Hive-specific interface. Since
HCatalog uses Hive's metastore, Hive can read data in HCatalog directly as long
as a SerDe for that data already exists. In the future we plan to write a
HCatalogSerDe so that users won't need storage-specific SerDes and so that Hive
users can write data to HCatalog. Currently, this is supported - if a Hive user
writes data in the RCFile format, it is possible to read the data through
HCatalog. </p>
-<p>Data is defined using HCatalog's command line interface (CLI). The HCatalog
CLI supports most of the DDL portion of Hive's query language, allowing users
to create, alter, drop tables, etc. The CLI also supports the data exploration
part of the Hive command line, such as SHOW TABLES, DESCRIBE TABLE, etc.</p>
+<p>The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which
implement the Pig load and store interfaces respectively. HCatLoader accepts a
table to read data from; you can indicate which partitions to scan by
immediately following the load statement with a partition filter statement.
HCatStorer accepts a table to write to and optionally a specification of
partition keys to create a new partition. You can write to a single partition
by specifying the partition key(s) and value(s) in the STORE clause; and you
can write to multiple partitions if the partition key(s) are columns in the
data being stored. HCatLoader is implemented on top of HCatInputFormat and
HCatStorer is implemented on top of HCatOutputFormat (see <a
href="loadstore.html">HCatalog Load and Store</a>).</p>
+<p>HCatInputFormat and HCatOutputFormat are HCatalog's interface for
MapReduce; they implement Hadoop's InputFormat and OutputFormat, respectively.
HCatInputFormat accepts a table to read data from and optionally a selection
predicate to indicate which partitions to scan. HCatOutputFormat accepts a
table to write to and optionally a specification of partition keys to create a
new partition. You can write to a single partition by specifying the partition
key(s) and value(s) in the STORE clause; and you can write to multiple
partitions if the partition key(s) are columns in the data being stored. (See
<a href="inputoutput.html">HCatalog Input and Output</a>.)</p>
+<p>Note: There is no Hive-specific interface. Since HCatalog uses Hive's
metastore, Hive can read data in HCatalog directly.</p>
+<p>Data is defined using HCatalog's command line interface (CLI). The HCatalog
CLI supports all Hive DDL that does not require MapReduce to execute, allowing
users to create, alter, drop tables, etc. (Unsupported Hive DDL includes
import/export, CREATE TABLE AS SELECT, ALTER TABLE options REBUILD and
CONCATENATE, and ANALYZE TABLE ... COMPUTE STATISTICS.) The CLI also supports
the data exploration part of the Hive command line, such as SHOW TABLES,
DESCRIBE TABLE, etc. (see the <a href="cli.html">HCatalog Command Line
Interface</a>).</p>
<a name="Data+Model"></a>
<h3 class="h4">Data Model</h3>
-<p>HCatalog presents a relational view of data in HDFS. Data is stored in
tables and these tables can be placed in databases. Tables can also be hash
partitioned on one or more keys; that is, for a given value of a key (or set of
keys) there will be one partition that contains all rows with that value (or
set of values). For example, if a table is partitioned on date and there are
three days of data in the table, there will be three partitions in the table.
New partitions can be added to a table, and partitions can be dropped from a
table. Partitioned tables have no partitions at create time. Unpartitioned
tables effectively have one default partition that must be created at table
creation time. There is no guaranteed read consistency when a partition is
dropped.</p>
-<p>Partitions contain records. Once a partition is created records cannot be
added to it, removed from it, or updated in it. (In the future some ability to
integrate changes to a partition will be added.) Partitions are
multi-dimensional and not hierarchical. Records are divided into columns.
Columns have a name and a datatype. HCatalog supports the same datatypes as
Hive. </p>
+<p>HCatalog presents a relational view of data. Data is stored in tables and
these tables can be placed in databases. Tables can also be hash partitioned on
one or more keys; that is, for a given value of a key (or set of keys) there
will be one partition that contains all rows with that value (or set of
values). For example, if a table is partitioned on date and there are three
days of data in the table, there will be three partitions in the table. New
partitions can be added to a table, and partitions can be dropped from a table.
Partitioned tables have no partitions at create time. Unpartitioned tables
effectively have one default partition that must be created at table creation
time. There is no guaranteed read consistency when a partition is dropped.</p>
+<p>Partitions contain records. Once a partition is created records cannot be
added to it, removed from it, or updated in it. Partitions are
multi-dimensional and not hierarchical. Records are divided into columns.
Columns have a name and a datatype. HCatalog supports the same datatypes as
Hive (see <a href="loadstore.html">HCatalog Load and Store</a>). </p>
</div>
<a name="Data+Flow+Example"></a>
<h2 class="h3">Data Flow Example</h2>
<div class="section">
-<p>This simple data flow example shows how HCatalog is used to move data from
the grid into a database.
- From the database, the data can then be analyzed using Hive.</p>
+<p>This simple data flow example shows how HCatalog can help grid users share
and access data.</p>
<p>
<strong>First</strong> Joe in data acquisition uses distcp to get data onto
the grid.</p>
<pre class="code">
hadoop distcp file:///file.dat hdfs://data/rawevents/20100819/data
-hcat "alter table rawevents add partition 20100819
hdfs://data/rawevents/20100819/data"
+hcat "alter table rawevents add partition (ds='20100819') location
'hdfs://data/rawevents/20100819/data'"
</pre>
<p>
<strong>Second</strong> Sally in data processing uses Pig to cleanse and
prepare the data.</p>
-<p>Without HCatalog, Sally must be manually informed by Joe that data is
available, or use Oozie and poll on HDFS.</p>
+<p>Without HCatalog, Sally must be manually informed by Joe when data is
available, or poll on HDFS.</p>
<pre class="code">
A = load '/data/rawevents/20100819/data' as (alpha:int, beta:chararray,
…);
B = filter A by bot_finder(zeta) = 0;
…
store Z into 'data/processedevents/20100819/data';
</pre>
-<p>With HCatalog, Oozie will be notified by HCatalog data is available and can
then start the Pig job</p>
+<p>With HCatalog, a JMS message is sent as soon as the data is available, and
the Pig job can then be started.</p>
<pre class="code">
A = load 'rawevents' using HCatLoader;
B = filter A by date = '20100819' and by bot_finder(zeta) = 0;
@@ -115,20 +111,20 @@ alter table processedevents add partitio
select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
-group by adverstiser_id;
+group by advertiser_id;
</pre>
<p>With HCatalog, Robert does not need to modify the table structure.</p>
<pre class="code">
select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
-group by adverstiser_id;
+group by advertiser_id;
</pre>
</div>
<div class="copyright">
Copyright ©
- 2011 <a href="http://www.apache.org/licenses/">The Apache Software
Foundation</a>
+ 2012 <a href="http://www.apache.org/licenses/">The Apache Software
Foundation</a>
</div>
</div>
</body>
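
For illustration of the MapReduce interface described in the overview above, here is a minimal driver sketch. It is a sketch under assumptions: the InputJobInfo.create(database, table, filter) and OutputJobInfo.create(database, table, partitionValues) factory methods are taken from the 0.4-era API, and the table names, filter, and partition key are hypothetical. Mapper, reducer, and schema setup are omitted.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hcatalog.mapreduce.HCatInputFormat;
    import org.apache.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hcatalog.mapreduce.InputJobInfo;
    import org.apache.hcatalog.mapreduce.OutputJobInfo;

    public class ProcessEvents {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "process-events");
        job.setJarByClass(ProcessEvents.class);

        // Read only the partitions of 'rawevents' that match the filter
        // (hypothetical table and filter).
        job.setInputFormatClass(HCatInputFormat.class);
        HCatInputFormat.setInput(job,
            InputJobInfo.create("default", "rawevents", "ds=\"20100819\""));

        // Write a new partition (ds='20100819') of 'processedevents'.
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("ds", "20100819");
        job.setOutputFormatClass(HCatOutputFormat.class);
        HCatOutputFormat.setOutput(job,
            OutputJobInfo.create("default", "processedevents", partitionValues));

        // Mapper/reducer classes and output schema (setSchema) omitted here.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }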
Modified: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml (original)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml Sat Apr 28 00:47:30 2012
@@ -25,8 +25,8 @@
<section>
<title>HCatalog </title>
- <p>HCatalog is a table and storage management layer for Hadoop that
enables users with different data processing tools – Pig, MapReduce, and Hive
– to more easily read and write data on the grid. HCatalog’s table
abstraction presents users with a relational view of data in the Hadoop
distributed file system (HDFS) and ensures that users need not worry about
where or in what format their data is stored – RCFile format, text files, or
sequence files. </p>
-<p>HCatalog supports reading and writing files in any format for which a SerDe
can be written. By default, HCatalog supports RCFile, CSV, JSON, and sequence
file formats. To use a custom format, you must provide the InputFormat,
OutputFormat, and SerDe.</p>
+ <p>HCatalog is a table and storage management layer for Hadoop that
enables users with different data processing tools – Pig, MapReduce, and Hive
– to more easily read and write data on the grid. HCatalog’s table
abstraction presents users with a relational view of data in the Hadoop
distributed file system (HDFS) and ensures that users need not worry about
where or in what format their data is stored – RCFile format, text files, or
SequenceFiles. </p>
+<p>HCatalog supports reading and writing files in any format for which a SerDe
can be written. By default, HCatalog supports RCFile, CSV, JSON, and
SequenceFile formats. To use a custom format, you must provide the InputFormat,
OutputFormat, and SerDe.</p>
<p></p>
<figure src="images/hcat-product.jpg" align="left" alt="HCatalog Product"/>
@@ -36,16 +36,15 @@
<section>
<title>HCatalog Architecture</title>
- <p>HCatalog is built on top of the Hive metastore and incorporates
components from the Hive DDL. HCatalog provides read and write interfaces for
Pig and MapReduce and uses
- Hive's command line interface for issuing data definition and metadata
exploration commands.</p>
+ <p>HCatalog is built on top of the Hive metastore and incorporates
Hive's DDL. HCatalog provides read and write interfaces for Pig and MapReduce
and uses Hive's command line interface for issuing data definition and metadata
exploration commands.</p>
<p></p>
<section>
<title>Interfaces</title>
-<p>The HCatalog interface for Pig – HCatLoader and HCatStorer – is an
implementation of the Pig load and store interfaces. HCatLoader accepts a table
to read data from; you can indicate which partitions to scan by immediately
following the load statement with a partition filter statement. HCatStorer
accepts a table to write to and optionally a specification of partition keys to
create a new partition. You can write to a single partition by specifying the
partition key(s) and value(s) in the STORE clause; and you can write to
multiple partitions if the partition key(s) are columns in the data being
stored. HCatLoader and HCatStorer are implemented on top of HCatInputFormat and
HCatOutputFormat, respectively (see <a href="loadstore.html">HCatalog Load and
Store</a>).</p>
+<p>The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which
implement the Pig load and store interfaces respectively. HCatLoader accepts a
table to read data from; you can indicate which partitions to scan by
immediately following the load statement with a partition filter statement.
HCatStorer accepts a table to write to and optionally a specification of
partition keys to create a new partition. You can write to a single partition
by specifying the partition key(s) and value(s) in the STORE clause; and you
can write to multiple partitions if the partition key(s) are columns in the
data being stored. HCatLoader is implemented on top of HCatInputFormat and
HCatStorer is implemented on top of HCatOutputFormat (see <a
href="loadstore.html">HCatalog Load and Store</a>).</p>
-<p>The HCatalog interface for MapReduce – HCatInputFormat and
HCatOutputFormat – is an implementation of Hadoop InputFormat and
OutputFormat. HCatInputFormat accepts a table to read data from and optionally
a selection predicate to indicate which partitions to scan. HCatOutputFormat
accepts a table to write to and optionally a specification of partition keys to
create a new partition. You can write to a single partition by specifying the
partition key(s) and value(s) in the STORE clause; and you can write to
multiple partitions if the partition key(s) are columns in the data being
stored. (See <a href="inputoutput.html">HCatalog Input and Output</a>.)</p>
+<p>HCatInputFormat and HCatOutputFormat are HCatalog's interface for
MapReduce; they implement Hadoop's InputFormat and OutputFormat, respectively.
HCatInputFormat accepts a table to read data from and optionally a selection
predicate to indicate which partitions to scan. HCatOutputFormat accepts a
table to write to and optionally a specification of partition keys to create a
new partition. You can write to a single partition by specifying the partition
key(s) and value(s) in the STORE clause; and you can write to multiple
partitions if the partition key(s) are columns in the data being stored. (See
<a href="inputoutput.html">HCatalog Input and Output</a>.)</p>
<p>Note: There is no Hive-specific interface. Since HCatalog uses Hive's
metastore, Hive can read data in HCatalog directly.</p>
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java Sat Apr 28 00:47:30 2012
@@ -33,9 +33,9 @@ import org.apache.hcatalog.data.transfer
public class DataTransferFactory {
/**
- * This should be called once from master node to obtain an instance of
{@link HCatReader}
- * @param re built using {@link ReadEntity.Builder}
- * @param config Any configuration which master node wants to pass to
HCatalog
+ * This should be called once from the master node to obtain an instance of
{@link HCatReader}.
+ * @param re ReadEntity built using {@link ReadEntity.Builder}
+ * @param config any configuration which the master node wants to pass to
HCatalog
* @return {@link HCatReader}
*/
public static HCatReader getHCatReader(final ReadEntity re, final
Map<String,String> config) {
@@ -44,9 +44,9 @@ public class DataTransferFactory {
}
/**
- * This should only be called once from every slave nodes to obtain an
instance of {@link HCatReader}
- * @param split obtained at master node.
- * @param config obtained at master node.
+ * This should only be called once from every slave node to obtain an
instance of {@link HCatReader}.
+ * @param split input split obtained at master node
+ * @param config configuration obtained at master node
* @return {@link HCatReader}
*/
public static HCatReader getHCatReader(final InputSplit split, final
Configuration config) {
@@ -55,11 +55,11 @@ public class DataTransferFactory {
}
/**
- * This should only be called once from every slave nodes to obtain an
instance of {@link HCatReader}
- * This should be called if external system has some state to provide
to HCatalog
- * @param split obtained at master node.
- * @param config obtained at master node.
- * @param sp
+ * This should only be called once from every slave node to obtain an
instance of {@link HCatReader}.
+ * This should be called if an external system has some state to
provide to HCatalog.
+ * @param split input split obtained at master node
+ * @param config configuration obtained at master node
+ * @param sp {@link StateProvider}
* @return {@link HCatReader}
*/
public static HCatReader getHCatReader(final InputSplit split, final
Configuration config, StateProvider sp) {
@@ -67,9 +67,9 @@ public class DataTransferFactory {
return new HCatInputFormatReader(split, config, sp);
}
- /** This should be called at master node to obtain an instance of
{@link HCatWriter}
- * @param we built using {@link WriteEntity.Builder}
- * @param config Any configuration which master wants to pass to
HCatalog
+ /** This should be called at the master node to obtain an instance of
{@link HCatWriter}.
+ * @param we WriteEntity built using {@link WriteEntity.Builder}
+ * @param config any configuration which the master node wants to pass to
HCatalog
* @return {@link HCatWriter}
*/
public static HCatWriter getHCatWriter(final WriteEntity we, final
Map<String,String> config) {
@@ -77,8 +77,8 @@ public class DataTransferFactory {
return new HCatOutputFormatWriter(we, config);
}
- /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}
- * @param cntxt {@link WriterContext} obtained at master node.
+ /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}.
+ * @param cntxt {@link WriterContext} obtained at master node
* @return {@link HCatWriter}
*/
public static HCatWriter getHCatWriter(final WriterContext cntxt) {
@@ -86,10 +86,10 @@ public class DataTransferFactory {
return getHCatWriter(cntxt, DefaultStateProvider.get());
}
- /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}
- * If external system has some mechanism for providing state to
HCatalog, this constructor
+ /** This should be called at slave nodes to obtain an instance of
{@link HCatWriter}.
+ * If an external system has some mechanism for providing state to
HCatalog, this constructor
* can be used.
- * @param cntxt {@link WriterContext} obtained at master node.
+ * @param cntxt {@link WriterContext} obtained at master node
* @param sp {@link StateProvider}
* @return {@link HCatWriter}
*/
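
As a usage note for these factory methods, the intended call pattern is: build a ReadEntity and call getHCatReader once on the master node, then ship the resulting context's splits to the slave nodes, where getHCatReader is called once per split. A minimal sketch follows; ReadEntity.Builder#withTable, HCatReader#prepareRead and #read, and ReaderContext#getSplits/#getConf are assumptions based on the 0.4-era API and do not appear in this diff, and the table name is hypothetical.

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hcatalog.data.HCatRecord;
    import org.apache.hcatalog.data.transfer.DataTransferFactory;
    import org.apache.hcatalog.data.transfer.HCatReader;
    import org.apache.hcatalog.data.transfer.ReadEntity;
    import org.apache.hcatalog.data.transfer.ReaderContext;

    public class ReadExample {
      public static void main(String[] args) throws Exception {
        // Master node: build a ReadEntity and obtain the reader once.
        ReadEntity entity = new ReadEntity.Builder().withTable("rawevents").build();
        Map<String, String> config = new HashMap<String, String>();
        HCatReader masterReader = DataTransferFactory.getHCatReader(entity, config);
        ReaderContext context = masterReader.prepareRead();

        // The context and its splits would normally be serialized and shipped
        // to the slave nodes; each slave calls getHCatReader once for its split.
        for (InputSplit split : context.getSplits()) {
          HCatReader reader = DataTransferFactory.getHCatReader(split, context.getConf());
          Iterator<HCatRecord> records = reader.read();
          while (records.hasNext()) {
            System.out.println(records.next());
          }
        }
      }
    }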
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java Sat Apr 28 00:47:30 2012
@@ -22,16 +22,16 @@ import java.io.IOException;
import org.apache.hadoop.mapreduce.Job;
-/** The InputFormat to use to read data from HCat */
+/** The InputFormat to use to read data from HCatalog. */
public class HCatInputFormat extends HCatBaseInputFormat {
/**
- * Set the input to use for the Job. This queries the metadata server with
- * the specified partition predicates, gets the matching partitions, puts
- * the information in the conf object. The inputInfo object is updated with
- * information needed in the client context
+ * Set the input information to use for the job. This queries the metadata
server
+ * with the specified partition predicates, gets the matching partitions,
and
+ * puts the information in the conf object. The inputInfo object is updated
+ * with information needed in the client context.
* @param job the job object
- * @param inputJobInfo the input info for table to read
+ * @param inputJobInfo the input information about the table to read
* @throws IOException the exception in communicating with the metadata
server
*/
public static void setInput(Job job,
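
To show what a task sees on the read side, here is a hedged mapper sketch: HCatInputFormat delivers one HCatRecord per input record, addressed here by field position. The key type, the output types, and the assumption that position 0 holds an advertiser_id column are all hypothetical.

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hcatalog.data.HCatRecord;

    public class EventMapper
        extends Mapper<WritableComparable, HCatRecord, Text, Text> {
      @Override
      protected void map(WritableComparable key, HCatRecord value, Context context)
          throws IOException, InterruptedException {
        // Position 0 is assumed to hold the advertiser_id column of the
        // hypothetical 'rawevents' table.
        String advertiserId = String.valueOf(value.get(0));
        context.write(new Text(advertiserId), new Text(value.toString()));
      }
    }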
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java Sat Apr 28 00:47:30 2012
@@ -51,8 +51,8 @@ import org.apache.hcatalog.common.HCatUt
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;
-/** The OutputFormat to use to write data to HCat. The key value is ignored and
- * and should be given as null. The value is the HCatRecord to write.*/
+/** The OutputFormat to use to write data to HCatalog. The key value is
ignored and
+ * should be given as null. The value is the HCatRecord to write.*/
public class HCatOutputFormat extends HCatBaseOutputFormat {
static final private Log LOG = LogFactory.getLog(HCatOutputFormat.class);
@@ -61,10 +61,11 @@ public class HCatOutputFormat extends HC
private static boolean harRequested;
/**
- * Set the info about the output to write for the Job. This queries the
metadata server
- * to find the StorageHandler to use for the table. Throws error if
partition is already published.
+ * Set the information about the output to write for the job. This queries
the metadata server
+ * to find the StorageHandler to use for the table. It throws an error if
the
+ * partition is already published.
* @param job the job object
- * @param outputJobInfo the table output info
+ * @param outputJobInfo the table output information for the job
* @throws IOException the exception in communicating with the metadata
server
*/
@SuppressWarnings("unchecked")
@@ -204,6 +205,7 @@ public class HCatOutputFormat extends HC
* table schema is used by default for the partition if this is not called.
* @param job the job object
* @param schema the schema for the data
+ * @throws IOException
*/
public static void setSchema(final Job job, final HCatSchema schema)
throws IOException {
@@ -214,11 +216,12 @@ public class HCatOutputFormat extends HC
}
/**
- * Get the record writer for the job. Uses the StorageHandler's default
OutputFormat
- * to get the record writer.
- * @param context the information about the current task.
- * @return a RecordWriter to write the output for the job.
+ * Get the record writer for the job. This uses the StorageHandler's
default
+ * OutputFormat to get the record writer.
+ * @param context the information about the current task
+ * @return a RecordWriter to write the output for the job
* @throws IOException
+ * @throws InterruptedException
*/
@Override
public RecordWriter<WritableComparable<?>, HCatRecord>
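
Tying the write-side javadoc together, a minimal configuration sketch: setOutput registers the target table and partition, and setSchema declares the record shape (the table schema is the default if it is not called). OutputJobInfo.create(database, table, partitionValues) is assumed from the 0.4-era API, and the table name and partition key are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hcatalog.data.schema.HCatSchema;
    import org.apache.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hcatalog.mapreduce.OutputJobInfo;

    public class WriteExample {
      public static void configure(Job job, HCatSchema schema) throws Exception {
        // One new partition, ds='20100819', of the hypothetical table.
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("ds", "20100819");
        HCatOutputFormat.setOutput(job,
            OutputJobInfo.create("default", "processedevents", partitionValues));

        // Declare the schema of the records being written; omit this call to
        // fall back to the table schema.
        HCatOutputFormat.setSchema(job, schema);
        job.setOutputFormatClass(HCatOutputFormat.class);
      }
    }

In the task itself, each record is then emitted as context.write(null, record), since the key is ignored and should be given as null.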
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java Sat Apr 28 00:47:30 2012
@@ -27,7 +27,7 @@ import org.apache.hadoop.hive.ql.plan.Ta
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatStorageHandler;
-/** The Class used to serialize the partition information read from the
metadata server that maps to a partition */
+/** The class used to serialize the partition information read from the
metadata server that maps to a partition. */
public class PartInfo implements Serializable {
/** The serialization version */
@@ -63,6 +63,8 @@ public class PartInfo implements Seriali
* @param storageHandler the storage handler
* @param location the location
* @param hcatProperties hcat-specific properties at the partition
+ * @param jobProperties the job properties
+ * @param tableInfo the table information
*/
public PartInfo(HCatSchema partitionSchema, HCatStorageHandler
storageHandler,
String location, Properties hcatProperties,
@@ -116,8 +118,8 @@ public class PartInfo implements Seriali
}
/**
- * Gets the value of hcatProperties.
- * @return the hcatProperties
+ * Gets the input storage handler properties.
+ * @return HCat-specific properties set at the partition
*/
public Properties getInputStorageHandlerProperties() {
return hcatProperties;
@@ -147,10 +149,18 @@ public class PartInfo implements Seriali
return partitionValues;
}
+ /**
+ * Gets the job properties.
+ * @return a map of the job properties
+ */
public Map<String,String> getJobProperties() {
return jobProperties;
}
+ /**
+ * Gets the HCatalog table information.
+ * @return the table information
+ */
public HCatTableInfo getTableInfo() {
return tableInfo;
}
Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java Sat Apr 28 00:47:30 2012
@@ -19,7 +19,7 @@ package org.apache.hcatalog.mapreduce;
import java.io.Serializable;
import java.util.Properties;
-/** Info about the storer to use for writing the data */
+/** Information about the storer to use for writing the data. */
public class StorerInfo implements Serializable {
/** The serialization version */
@@ -37,12 +37,12 @@ public class StorerInfo implements Seria
private String storageHandlerClass;
/**
- * Initialize the storer info
- * @param ifClass
- * @param ofClass
- * @param serdeClass
- * @param storageHandlerClass
- * @param properties
+ * Initialize the storer information.
+ * @param ifClass the input format class
+ * @param ofClass the output format class
+ * @param serdeClass the SerDe class
+ * @param storageHandlerClass the storage handler class
+ * @param properties the properties for the storage handler
*/
public StorerInfo(String ifClass, String ofClass, String serdeClass,
String storageHandlerClass, Properties properties) {
super();
@@ -53,35 +53,50 @@ public class StorerInfo implements Seria
this.properties = properties;
}
+ /**
+ * @return the input format class
+ */
public String getIfClass() {
return ifClass;
}
+ /**
+ * @param ifClass the input format class
+ */
public void setIfClass(String ifClass) {
this.ifClass = ifClass;
}
+ /**
+ * @return the output format class
+ */
public String getOfClass() {
return ofClass;
}
+ /**
+ * @return the serdeClass
+ */
public String getSerdeClass() {
return serdeClass;
}
+ /**
+ * @return the storageHandlerClass
+ */
public String getStorageHandlerClass() {
return storageHandlerClass;
}
/**
- * @return the properties
+ * @return the storer properties
*/
public Properties getProperties() {
return properties;
}
/**
- * @param properties the properties to set
+ * @param properties the storer properties to set
*/
public void setProperties(Properties properties) {
this.properties = properties;
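
With the constructor parameters now documented, here is a small self-contained sketch of building a StorerInfo directly, using only the constructor and getters shown in this diff; the RCFile-related class names are standard Hive classes used here only as placeholder values.

    import java.util.Properties;

    import org.apache.hcatalog.mapreduce.StorerInfo;

    public class StorerInfoExample {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("my.handler.option", "value"); // hypothetical property

        StorerInfo info = new StorerInfo(
            "org.apache.hadoop.hive.ql.io.RCFileInputFormat",       // input format class
            "org.apache.hadoop.hive.ql.io.RCFileOutputFormat",      // output format class
            "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe", // SerDe class
            null,                                                   // no storage handler class
            props);

        System.out.println(info.getIfClass() + " / " + info.getOfClass());
        System.out.println(info.getSerdeClass() + " -> " + info.getProperties());
      }
    }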