Repository: accumulo-website Updated Branches: refs/heads/asf-site 3f99b6cc9 -> 1e7c974f1 refs/heads/master e0da132ec -> b52d466fd
Added more links to javadocs in documentation Project: http://git-wip-us.apache.org/repos/asf/accumulo-website/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo-website/commit/b52d466f Tree: http://git-wip-us.apache.org/repos/asf/accumulo-website/tree/b52d466f Diff: http://git-wip-us.apache.org/repos/asf/accumulo-website/diff/b52d466f Branch: refs/heads/master Commit: b52d466fd22832d16773b087ee12c6ddbba1e3b7 Parents: e0da132 Author: Mike Walch <[email protected]> Authored: Tue May 30 11:36:05 2017 -0400 Committer: Mike Walch <[email protected]> Committed: Tue May 30 11:36:24 2017 -0400 ---------------------------------------------------------------------- _docs-unreleased/development/iterators.md | 22 +++++++++++++--------- _docs-unreleased/development/mapreduce.md | 18 ++++++++++-------- _docs-unreleased/development/proxy.md | 5 +++-- 3 files changed, 26 insertions(+), 19 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/b52d466f/_docs-unreleased/development/iterators.md ---------------------------------------------------------------------- diff --git a/_docs-unreleased/development/iterators.md b/_docs-unreleased/development/iterators.md index 2e1b242..ff9d56d 100644 --- a/_docs-unreleased/development/iterators.md +++ b/_docs-unreleased/development/iterators.md @@ -90,7 +90,7 @@ The arguments passed to seek are as follows: The TabletServer first provides a [Range], an object which defines some collection of Accumulo `Key`s, which defines the Key-Value pairs that this Iterator should return. Each [Range] has a `startKey` and `endKey` with an inclusive flag for -both. While this Range is often similar to the Range(s) set by the client on a Scanner or BatchScanner, it is not +both. While this Range is often similar to the Range(s) set by the client on a [Scanner] or [BatchScanner], it is not guaranteed to be a Range that the client set. 
Accumulo will split up larger ranges and group them together based on Tablet boundaries per TabletServer. Iterators should not attempt to implement any custom logic based on the Range(s) provided to `seek` and Iterators should not return any Keys that fall outside of the provided Range. @@ -107,8 +107,8 @@ Range. For example, a regular expression Iterator would consume all records whic pattern before returning from `seek`. It is important to retain the original [Range] passed to this method to know when this Iterator should stop -reading more Key-Value pairs. Ignoring this typically does not affect scans from a Scanner, but it -will result in duplicate keys emitting from a BatchScan if the scanned table has more than one tablet. +reading more Key-Value pairs. Ignoring this typically does not affect scans from a [Scanner], but it +will result in duplicate keys emitting from a [BatchScanner] if the scanned table has more than one tablet. Best practice is to never emit entries outside the seek range. ### next @@ -322,13 +322,13 @@ combining iterator. ## Best practices -Because of the flexibility that the [SortedKeyValueInterface] provides, it doesn't directly disallow +Because of the flexibility that the [SortedKeyValueIterator] interface provides, it doesn't directly disallow many implementations which are poor design decisions. The following are some common recommendations to follow and pitfalls to avoid in Iterator implementations. #### Avoid special logic encoded in Ranges -Commonly, granular Ranges that a client passes to an Iterator from a `Scanner` or `BatchScanner` are unmodified. +Commonly, granular Ranges that a client passes to an Iterator from a [Scanner] or [BatchScanner] are unmodified. If a [Range] falls within the boundaries of a Tablet, an Iterator will often see that same Range in the `seek` method. However, there is no guarantee that the [Range] will remain unaltered from client to server. 
As such, Iterators should *never* make assumptions about the current state/context based on the Range. @@ -364,17 +364,17 @@ Iterator's implementation of seek. ### Take caution in constructing new data in an Iterator -Implementations of Iterator might be tempted to open BatchWriters inside of an Iterator as a means +Implementations of Iterator might be tempted to open [BatchWriters][BatchWriter] inside of an Iterator as a means to implement triggers for writing additional data outside of their client application. The lifecycle of an Iterator is *not* managed in such a way that guarantees that this is safe nor efficient. Specifically, there is no way to guarantee that the internal ThreadPool inside of the BatchWriter is closed (and the thread(s) -are reaped) without calling the close() method. `close`'ing and recreating a `BatchWriter` after every +are reaped) without calling the close() method. `close`'ing and recreating a [BatchWriter] after every Key-Value pair is also prohibitively performance limiting to be considered an option. The only safe way to generate additional data in an Iterator is to alter the current Key-Value pair. -For example, the `WholeRowIterator` serializes the all of the Key-Values pairs that fall within each +For example, the [WholeRowIterator] serializes the all of the Key-Values pairs that fall within each row. A safe way to generate more data in an Iterator would be to construct an Iterator that is -"higher" (at a larger priority) than the `WholeRowIterator`, that is, the Iterator receives the Key-Value pairs which are +"higher" (at a larger priority) than the WholeRowIterator, that is, the Iterator receives the Key-Value pairs which are a serialization of many Key-Value pairs. The custom Iterator could deserialize the pairs, compute some function, and add a new Key-Value pair to the original collection, re-serializing the collection of Key-Value pairs back into a single Key-Value pair. 
@@ -422,3 +422,7 @@ identify common pitfalls in user-created Iterators. [Combiner]: {{ page.javadoc_core }}/org/apache/accumulo/core/iterators/Combiner.html [Range]: {{ page.javadoc_core }}/org/apache/accumulo/core/data/Range.html [iterator-test-harness]: {{ page.docs_baseurl }}/development/development_tools#iterator-test-harness +[BatchScanner]: {{ page.javadoc_core}}/org/apache/accumulo/core/client/BatchScanner.html +[Scanner]: {{ page.javadoc_core }}/org/apache/accumulo/core/client/Scanner.html +[BatchWriter]: {{ page.javdoc_core }}/org/apache/accumulo/core/client/BatchWriter.html +[WholeRowIterator]: {{ page.javdoc_core }}/org/apache/accumulo/core/iterators/user/WholeRowIterator.html http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/b52d466f/_docs-unreleased/development/mapreduce.md ---------------------------------------------------------------------- diff --git a/_docs-unreleased/development/mapreduce.md b/_docs-unreleased/development/mapreduce.md index 98b2682..b3465ad 100644 --- a/_docs-unreleased/development/mapreduce.md +++ b/_docs-unreleased/development/mapreduce.md @@ -5,10 +5,9 @@ order: 2 --- Accumulo tables can be used as the source and destination of MapReduce jobs. To -use an Accumulo table with a MapReduce job (specifically with the new Hadoop API -as of version 0.20), configure the job parameters to use the AccumuloInputFormat -and AccumuloOutputFormat. Accumulo specific parameters can be set via these -two format classes to do the following: +use an Accumulo table with a MapReduce job, configure the job parameters to use +the [AccumuloInputFormat] and [AccumuloOutputFormat]. 
Accumulo specific parameters +can be set via these two format classes to do the following: * Authenticate and provide user credentials for the input * Restrict the scan to a range of rows @@ -17,7 +16,7 @@ two format classes to do the following: ## Mapper and Reducer classes To read from an Accumulo table create a Mapper with the following class -parameterization and be sure to configure the AccumuloInputFormat. +parameterization and be sure to configure the [AccumuloInputFormat]. ```java class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> { @@ -28,7 +27,7 @@ class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> { ``` To write to an Accumulo table, create a Reducer with the following class -parameterization and be sure to configure the AccumuloOutputFormat. The key +parameterization and be sure to configure the [AccumuloOutputFormat]. The key emitted from the Reducer identifies the table to which the mutation is sent. This allows a single Reducer to write to more than one table if desired. A default table can be configured using the AccumuloOutputFormat, in which case the output table @@ -46,7 +45,7 @@ class MyReducer extends Reducer<WritableComparable, Writable, Text, Mutation> { The Text object passed as the output should contain the name of the table to which this mutation should be applied. The Text can be null in which case the mutation -will be applied to the default table name specified in the AccumuloOutputFormat +will be applied to the default table name specified in the [AccumuloOutputFormat] options. ## AccumuloInputFormat options @@ -91,7 +90,7 @@ AccumuloInputFormat.addIterator(job, is); ## AccumuloMultiTableInputFormat options -The AccumuloMultiTableInputFormat allows the scanning over multiple tables +The [AccumuloMultiTableInputFormat] allows the scanning over multiple tables in a single MapReduce job. Separate ranges, columns, and iterators can be used for each table. 
@@ -179,3 +178,6 @@ AccumuloOutputFormat.setMaxMutationBufferSize(job, 50000000); // bytes The [MapReduce example][mapred-example] contains a complete example of using MapReduce with Accumulo. [mapred-example]: https://github.com/apache/accumulo-examples/blob/master/docs/mapred.md +[AccumuloInputFormat]: {{ page.javadoc_core }}/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html +[AccumuloMultiTableInputFormat]: {{ page.javadoc_core }}/org/apache/accumulo/core/client/mapred/AccumuloMultiTableInputFormat.html +[AccumuloOutputFormat]: {{ page.javadoc_core }}/org/apache/accumulo/core/client/mapred/AccumuloOutputFormat.html http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/b52d466f/_docs-unreleased/development/proxy.md ---------------------------------------------------------------------- diff --git a/_docs-unreleased/development/proxy.md b/_docs-unreleased/development/proxy.md index f3e8f3a..c0f71dd 100644 --- a/_docs-unreleased/development/proxy.md +++ b/_docs-unreleased/development/proxy.md @@ -6,7 +6,7 @@ order: 3 The proxy API allows the interaction with Accumulo with languages other than Java. A proxy server is provided in the codebase and a client can further be generated. -The proxy API can also be used instead of the traditional ZooKeeperInstance class to +The proxy API can also be used instead of the traditional [ZooKeeperInstance] class to provide a single TCP port in which clients can be securely routed through a firewall, without requiring access to all tablet servers in the cluster. @@ -30,7 +30,7 @@ the very least, you need to supply the following properties: You can find a sample configuration file in your distribution at `proxy/proxy.properties`. This sample configuration file further demonstrates an ability to back the proxy server -by MockAccumulo or the MiniAccumuloCluster. +by MiniAccumuloCluster. 
## Running the Proxy Server @@ -117,3 +117,4 @@ for(KeyValue keyValue : results.getResultsIterator()) { client.closeScanner(scanner); ``` +[ZookeeperInstance]: {{ page.javadoc_core }}/org/apache/accumulo/core/client/ZooKeeperInstance.html
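The iterators.md hunks above advise retaining the [Range] passed to `seek` and never emitting entries outside it. A minimal sketch of that rule follows, as a plain-Java model rather than the Accumulo `SortedKeyValueIterator` API. `RangeBoundedIterator`, its `scan` helper, and the use of `String` keys over a `TreeMap` are all hypothetical stand-ins chosen so the example runs without Accumulo on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical model of an iterator that remembers its seek range.
// It does NOT implement Accumulo's SortedKeyValueIterator; keys are plain Strings.
class RangeBoundedIterator {
    private final NavigableMap<String, String> source;
    private String endKey;          // retained from seek() so next() knows where to stop
    private boolean endInclusive;
    private Iterator<Map.Entry<String, String>> cursor = Collections.emptyIterator();
    private Map.Entry<String, String> top;

    RangeBoundedIterator(NavigableMap<String, String> source) {
        this.source = source;
    }

    // Position at the first entry inside the range and remember the end bound.
    void seek(String startKey, boolean startInclusive, String endKey, boolean endInclusive) {
        this.endKey = endKey;
        this.endInclusive = endInclusive;
        this.cursor = source.tailMap(startKey, startInclusive).entrySet().iterator();
        advance();
    }

    boolean hasTop() { return top != null; }
    String getTopKey() { return top.getKey(); }
    String getTopValue() { return top.getValue(); }
    void next() { advance(); }

    // Move to the next entry, but expose it only if it is still inside the seek range.
    private void advance() {
        top = null;
        if (!cursor.hasNext()) {
            return;
        }
        Map.Entry<String, String> candidate = cursor.next();
        int cmp = candidate.getKey().compareTo(endKey);
        if (cmp < 0 || (cmp == 0 && endInclusive)) {
            top = candidate;
        } else {
            // Past the end of the range: stop for good instead of leaking later keys.
            cursor = Collections.emptyIterator();
        }
    }

    // Convenience driver: seek, then collect every key the iterator emits.
    static List<String> scan(NavigableMap<String, String> source,
                             String startKey, boolean startInclusive,
                             String endKey, boolean endInclusive) {
        RangeBoundedIterator it = new RangeBoundedIterator(source);
        it.seek(startKey, startInclusive, endKey, endInclusive);
        List<String> keys = new ArrayList<>();
        while (it.hasTop()) {
            keys.add(it.getTopKey());
            it.next();
        }
        return keys;
    }
}
```

In this model, if two adjacent ranges are scanned separately (as might happen when a client range is split at a tablet boundary), each key is emitted by exactly one scan. An iterator that ignored the retained end bound would emit the overlap twice, which is the duplicate-key symptom the documentation describes for a [BatchScanner].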
