http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9ebc5f9a/docs/unreleased/development/iterator_design.html ---------------------------------------------------------------------- diff --git a/docs/unreleased/development/iterator_design.html b/docs/unreleased/development/iterator_design.html deleted file mode 100644 index 2dfb4c7..0000000 --- a/docs/unreleased/development/iterator_design.html +++ /dev/null @@ -1,722 +0,0 @@ -<!DOCTYPE html> -<html lang="en"> -<head> -<!-- - Licensed to the Apache Software Foundation (ASF) under one or more - contributor license agreements. See the NOTICE file distributed with - this work for additional information regarding copyright ownership. - The ASF licenses this file to You under the Apache License, Version 2.0 - (the "License"); you may not use this file except in compliance with - the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. 
---> -<meta charset="utf-8"> -<meta http-equiv="X-UA-Compatible" content="IE=edge"> -<meta name="viewport" content="width=device-width, initial-scale=1"> -<link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/paper/bootstrap.min.css" rel="stylesheet" integrity="sha384-awusxf8AUojygHf2+joICySzB780jVvQaVCAt1clU3QsyAitLGul28Qxb2r1e5g+" crossorigin="anonymous"> -<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" rel="stylesheet"> -<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.css"> -<link href="/css/accumulo.css" rel="stylesheet" type="text/css"> - -<title>Accumulo Documentation - Iterator Design</title> - -<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> -<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script> -<script type="text/javascript" src="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.js"></script> -<script> - // show location of canonical site if not currently on the canonical site - $(function() { - var host = window.location.host; - if (typeof host !== 'undefined' && host !== 'accumulo.apache.org') { - $('#non-canonical').show(); - } - }); - - $(function() { - // decorate section headers with anchors - return $("h2, h3, h4, h5, h6").each(function(i, el) { - var $el, icon, id; - $el = $(el); - id = $el.attr('id'); - icon = '<i class="fa fa-link"></i>'; - if (id) { - return $el.append($("<a />").addClass("header-link").attr("href", "#" + id).html(icon)); - } - }); - }); - - // configure Google Analytics - (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ - (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), - m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) - 
})(window,document,'script','//www.google-analytics.com/analytics.js','ga'); - - if (ga.hasOwnProperty('loaded') && ga.loaded === true) { - ga('create', 'UA-50934829-1', 'apache.org'); - ga('send', 'pageview'); - } -</script> - -</head> -<body style="padding-top: 100px"> - - <nav class="navbar navbar-default navbar-fixed-top"> - <div class="container"> - <div class="navbar-header"> - <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-items"> - <span class="sr-only">Toggle navigation</span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - </button> - <a href="/"><img id="nav-logo" alt="Apache Accumulo" class="img-responsive" src="/images/accumulo-logo.png" width="200" - /></a> - </div> - <div class="collapse navbar-collapse" id="navbar-items"> - <ul class="nav navbar-nav"> - <li class="nav-link"><a href="/downloads">Download</a></li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Releases<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/release/accumulo-1.8.1/">1.8.1 (Latest)</a></li> - <li><a href="/release/accumulo-1.7.3/">1.7.3</a></li> - <li><a href="/release/accumulo-1.6.6/">1.6.6</a></li> - <li><a href="/release/">Archive</a></li> - </ul> - </li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/1.8/accumulo_user_manual.html">User Manual (1.8)</a></li> - <li><a href="/1.8/apidocs">Javadocs (1.8)</a></li> - <li><a href="/1.8/examples">Examples (1.8)</a></li> - <li><a href="/features">Features</a></li> - <li><a href="/glossary">Glossary</a></li> - <li><a href="/external-docs">External Docs</a></li> - <li><a href="/docs-archive/">Archive</a></li> - </ul> - </li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Community<span class="caret"></span></a> - 
<ul class="dropdown-menu"> - <li><a href="/get_involved">Get Involved</a></li> - <li><a href="/mailing_list">Mailing Lists</a></li> - <li><a href="/people">People</a></li> - <li><a href="/related-projects">Related Projects</a></li> - <li><a href="/contributor/">Contributor Guide</a></li> - </ul> - </li> - </ul> - <ul class="nav navbar-nav navbar-right"> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Apache Software Foundation<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="https://www.apache.org">Apache Homepage <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/licenses/LICENSE-2.0">License <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/foundation/sponsorship">Sponsorship <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/security">Security <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/foundation/thanks">Thanks <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/foundation/policies/conduct">Code of Conduct <i class="fa fa-external-link"></i></a></li> - </ul> - </li> - </ul> - </div> - </div> -</nav> - - <div class="container"> - <div class="row"> - <div class="col-md-12"> - - <div id="non-canonical" style="display: none; background-color: #F0E68C; padding-left: 1em;"> - Visit the official site at: <a href="https://accumulo.apache.org">https://accumulo.apache.org</a> - </div> - <div id="content"> - - <div class="alert alert-danger" role="alert">This documentation is for an unreleased version of Apache Accumulo that is currently under development! 
Check out the <a href="/1.8/accumulo_user_manual.html">documentation for the latest release</a>.</div> - -<div class="row"> - <div class="col-md-3"> - <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true"> - <div class="panel panel-default"> - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsegetting-started" aria-expanded="false" aria-controls="collapsegetting-started"> - Getting started - </a> - </h4> - </div> - <div id="collapsegetting-started" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/design">Accumulo Design</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/clients">Accumulo Clients</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/shell">Accumulo Shell</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_design">Table Design</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_configuration">Table Configuration</a></div> - - </div> - </div> - - - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsedevelopment" aria-expanded="true" aria-controls="collapsedevelopment"> - Development - </a> - </h4> - </div> - <div id="collapsedevelopment" class="panel-collapse collapse in" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/iterator_design">Iterator Design</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/iterator_testing">Iterator 
Testing</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/development_tools">Development Tools</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/sampling">Sampling</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/summaries">Summary Statistics</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/security">Security</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/high_speed_ingest">High-Speed Ingest</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/analytics">Analytics</a></div> - - </div> - </div> - - - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapseadministration" aria-expanded="false" aria-controls="collapseadministration"> - Administration - </a> - </h4> - </div> - <div id="collapseadministration" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/installation">Installation</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-management">Configuration Management</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-properties">Configuration Properties</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/monitoring-metrics">Monitoring & Metrics</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/tracing">Tracing</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/fate">FATE</a></div> - - <div class="row doc-sidebar-link"><a 
href="/docs/unreleased/administration/multivolume">Multi-Volume Installations</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/ssl">SSL</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/kerberos">Kerberos</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/replication">Replication</a></div> - - </div> - </div> - - - - - - - - - - - - - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsetroubleshooting" aria-expanded="false" aria-controls="collapsetroubleshooting"> - Troubleshooting - </a> - </h4> - </div> - <div id="collapsetroubleshooting" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/basic">Basic Troubleshooting</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/advanced">Advanced Troubleshooting</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/tools">Troubleshooting Tools</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/system-metadata-tables">System Metadata Tables</a></div> - - </div> - </div> - - - - </div> - </div> - </div> - <div class="col-md-9"> - - <p><a href="/docs/unreleased/">Accumulo unreleased docs</a> >> Development >> Iterator Design</p> - - - <h1>Iterator Design</h1> - - <p>Accumulo SortedKeyValueIterators, commonly referred to as Iterators for short, are server-side programming constructs -that allow users to implement custom retrieval or computational logic within Accumulo TabletServers. The name rightly -brings forward similarities to the Java Iterator interface; however, Accumulo Iterators are more complex than Java -Iterators. 
Notably, in addition to the expected methods to retrieve the current element and advance to the next element -in the iteration, Accumulo Iterators must also support the ability to “move” (<code class="highlighter-rouge">seek</code>) to a specified point in the -iteration (the Accumulo table). Accumulo Iterators are designed to be concatenated together, similar to applying a -series of transformations to a list of elements. Accumulo Iterators can duplicate their underlying source to create -multiple “pointers” over the same underlying data (which is extremely powerful since each stream is sorted) or they can -merge multiple Iterators into a single view. In this sense, a collection of Iterators operating in tandem is closer to -a tree structure than a list, but there is always a sense of a flow of Key-Value pairs through some Iterators. Iterators -are not designed to act as triggers nor are they designed to operate outside of the purview of a single table.</p> - -<p>Understanding how TabletServers invoke the methods on a SortedKeyValueIterator can be difficult as the actual code is -buried within the implementation of the TabletServer; however, it is generally unnecessary to have a strong -understanding of this as the interface provides clear definitions of what action each method should take. This -chapter aims to provide a more detailed description of how Iterators are invoked, some best practices and some common -pitfalls.</p> - -<h2 id="instantiation">Instantiation</h2> - -<p>To invoke an Accumulo Iterator inside of the TabletServer, the Iterator class must be on the classpath of every -TabletServer. For production environments, it is common to place a JAR file which contains the Iterator in -<code class="highlighter-rouge">lib/</code>. 
In development environments, it is convenient to instead place the JAR file in <code class="highlighter-rouge">lib/ext/</code> as JAR files -in this directory are dynamically reloaded by the TabletServers, alleviating the need to restart Accumulo while -testing an Iterator. Advanced classloader features enable other types of filesystems and per-table classpath -configurations (as opposed to process-wide classpaths). These features are not covered here; they are described elsewhere in the user -manual.</p> - -<p>Accumulo references the Iterator class by name and uses Java reflection to instantiate the Iterator. This means that -Iterators must have a public no-args constructor.</p> - -<h2 id="interface">Interface</h2> - -<p>A normal implementation of the SortedKeyValueIterator defines functionality for the following methods:</p> - -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">init</span><span class="o">(</span><span class="n">SortedKeyValueIterator</span><span class="o">&lt;</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">&gt;</span> <span class="n">source</span><span class="o">,</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">options</span><span class="o">,</span> <span class="n">IteratorEnvironment</span> <span class="n">env</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">IOException</span><span class="o">;</span> - -<span class="kt">boolean</span> <span class="nf">hasTop</span><span class="o">();</span> - -<span class="kt">void</span> <span class="nf">next</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">IOException</span><span class="o">;</span> - -<span class="kt">void</span> <span class="nf">seek</span><span class="o">(</span><span class="n">Range</span> <span 
class="n">range</span><span class="o">,</span> <span class="n">Collection</span><span class="o">&lt;</span><span class="n">ByteSequence</span><span class="o">&gt;</span> <span class="n">columnFamilies</span><span class="o">,</span> <span class="kt">boolean</span> <span class="n">inclusive</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">IOException</span><span class="o">;</span> - -<span class="n">Key</span> <span class="nf">getTopKey</span><span class="o">();</span> - -<span class="n">Value</span> <span class="nf">getTopValue</span><span class="o">();</span> - -<span class="n">SortedKeyValueIterator</span><span class="o">&lt;</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">&gt;</span> <span class="nf">deepCopy</span><span class="o">(</span><span class="n">IteratorEnvironment</span> <span class="n">env</span><span class="o">);</span> -</code></pre> -</div> - -<h3 id="init">init</h3> - -<p>The <code class="highlighter-rouge">init</code> method is called by the TabletServer after it constructs an instance of the Iterator. This method should -clear/reset any internal state in the Iterator and prepare it to process data. The first argument, the <code class="highlighter-rouge">source</code>, is the -Iterator “below” this Iterator (where the client is at the “top” and the Iterators for files in HDFS are at the “bottom”). -The “source” Iterator provides the Key-Value pairs which this Iterator will operate upon.</p> - -<p>The second argument, a Map of options, is made up of options provided by the user, options set in the table’s -configuration, and/or options set in the containing namespace’s configuration. -These options allow for Iterators to dynamically configure themselves on the fly. If no options are used in the current context -(a Scan or Compaction), the Map will be empty. 
An example of a configuration item for an Iterator could be a pattern used to filter -Key-Value pairs in a regular expression Iterator.</p> - -<p>The third argument, the <code class="highlighter-rouge">IteratorEnvironment</code>, is a special object which provides information to this Iterator about the -context in which it was invoked. Commonly, it is not necessary to inspect this information. It can be useful, however: if an Iterator -knows that it is running in the context of a full-major compaction (reading all of the data) as opposed to a user scan -(which may strongly limit the number of columns), the Iterator might make different algorithmic decisions in an attempt to -optimize itself.</p> - -<h3 id="seek">seek</h3> - -<p>The <code class="highlighter-rouge">seek</code> method is likely the most confusing method on the Iterator interface. The purpose of this method is to -advance the stream of Key-Value pairs to a certain point in the iteration (the Accumulo table). It is common that, before -the implementation of this method returns, some additional processing is performed which may further advance the current -position past the <code class="highlighter-rouge">startKey</code> of the <code class="highlighter-rouge">Range</code>. This, however, is dependent on the functionality the iterator provides. For -example, a filtering iterator would consume a number of Key-Value pairs which do not meet its criteria before <code class="highlighter-rouge">seek</code> -returns. The important condition for <code class="highlighter-rouge">seek</code> to meet is that this Iterator should be ready to return the first Key-Value -pair, or none if no such pair is available, when the method returns. 
The Key-Value pair would be returned by <code class="highlighter-rouge">getTopKey</code> -and <code class="highlighter-rouge">getTopValue</code>, respectively, and <code class="highlighter-rouge">hasTop</code> should return a boolean denoting whether or not there is -a Key-Value pair to return.</p> - -<p>The arguments passed to seek are as follows:</p> - -<p>The TabletServer first provides a <code class="highlighter-rouge">Range</code>, an object which defines some collection of Accumulo <code class="highlighter-rouge">Key</code>s, which defines the -Key-Value pairs that this Iterator should return. Each <code class="highlighter-rouge">Range</code> has a <code class="highlighter-rouge">startKey</code> and <code class="highlighter-rouge">endKey</code> with an inclusive flag for -both. While this Range is often similar to the Range(s) set by the client on a Scanner or BatchScanner, it is not -guaranteed to be a Range that the client set. Accumulo will split up larger ranges and group them together based on -Tablet boundaries per TabletServer. Iterators should not attempt to implement any custom logic based on the Range(s) -provided to <code class="highlighter-rouge">seek</code> and Iterators should not return any Keys that fall outside of the provided Range.</p> - -<p>The second argument, a <code class="highlighter-rouge">Collection&lt;ByteSequence&gt;</code>, is the set of column families which should be retained or -excluded by this Iterator. The third argument, a boolean, defines whether the collection of column families -should be treated as an inclusion collection (true) or an exclusion collection (false).</p> - -<p>It is likely that all implementations of <code class="highlighter-rouge">seek</code> will first make a call to the <code class="highlighter-rouge">seek</code> method on the -“source” Iterator that was provided in the <code class="highlighter-rouge">init</code> method. 
The collection of column families and -the boolean <code class="highlighter-rouge">inclusive</code> argument should be passed down as well as the <code class="highlighter-rouge">Range</code>. Somewhat commonly, the Iterator will -also implement some sort of additional logic to find or compute the first Key-Value pair in the provided -Range. For example, a regular expression Iterator would consume all records which do not match the given -pattern before returning from <code class="highlighter-rouge">seek</code>.</p> - -<p>It is important to retain the original Range passed to this method to know when this Iterator should stop -reading more Key-Value pairs. Ignoring this typically does not affect scans from a Scanner, but it -will result in duplicate keys being emitted from a BatchScanner if the scanned table has more than one tablet. -Best practice is to never emit entries outside the seek range.</p> - -<h3 id="next">next</h3> - -<p>The <code class="highlighter-rouge">next</code> method is analogous to the <code class="highlighter-rouge">next</code> method on a Java Iterator: this method should advance -the Iterator to the next Key-Value pair. For implementations that perform some filtering or complex -logic, this may result in more than one Key-Value pair being inspected. This method alters -some internal state that is exposed via the <code class="highlighter-rouge">hasTop</code>, <code class="highlighter-rouge">getTopKey</code>, and <code class="highlighter-rouge">getTopValue</code> methods.</p> - -<p>This method commonly caches a Key-Value pair which <code class="highlighter-rouge">getTopKey</code> and <code class="highlighter-rouge">getTopValue</code> -can later return. While there is another Key-Value pair to return, <code class="highlighter-rouge">hasTop</code> should return true. 
-If there are no more Key-Value pairs to return from this Iterator since the last call to -<code class="highlighter-rouge">seek</code>, <code class="highlighter-rouge">hasTop</code> should return false.</p> - -<h3 id="hastop">hasTop</h3> - -<p>The <code class="highlighter-rouge">hasTop</code> method is similar to the <code class="highlighter-rouge">hasNext</code> method on a Java Iterator in that it informs -the caller if there is a Key-Value pair to be returned. If there is no pair to return, this method -should return false. Like a Java Iterator, multiple calls to <code class="highlighter-rouge">hasTop</code> (without calling <code class="highlighter-rouge">next</code>) should not -alter the internal state of the Iterator.</p> - -<h3 id="gettopkey-and-gettopvalue">getTopKey and getTopValue</h3> - -<p>These methods simply return the current Key-Value pair for this iterator. If <code class="highlighter-rouge">hasTop</code> returns true, -both of these methods should return non-null objects. If <code class="highlighter-rouge">hasTop</code> returns false, it is undefined -what these methods should return. Like <code class="highlighter-rouge">hasTop</code>, multiple calls to these methods should not alter -the state of the Iterator.</p> - -<p>Users should take caution when either</p> - -<ol> - <li>caching the Key/Value from <code class="highlighter-rouge">getTopKey</code>/<code class="highlighter-rouge">getTopValue</code>, for use after calling <code class="highlighter-rouge">next</code> on the source iterator. -In this case, the cached Key/Value object is aliased to the reference returned by the source iterator. -Iterators may reuse the same Key/Value object in a <code class="highlighter-rouge">next</code> call for performance reasons, changing the data -that the cached Key/Value object references and resulting in a logic bug.</li> - <li>modifying the Key/Value from <code class="highlighter-rouge">getTopKey</code>/<code class="highlighter-rouge">getTopValue</code>. 
If the source iterator reuses data stored in the Key/Value, -then the source iterator may use the modified data that the Key/Value references. This may or may not result in a logic bug.</li> -</ol> - -<p>In both cases, copying the Key/Value’s data into a new object ensures iterator correctness. If neither case applies, -it is safe to not copy the Key/Value. The general guideline is to be aware of who else may use Key/Value objects -returned from <code class="highlighter-rouge">getTopKey</code>/<code class="highlighter-rouge">getTopValue</code>.</p> - -<h3 id="deepcopy">deepCopy</h3> - -<p>The <code class="highlighter-rouge">deepCopy</code> method is similar to the <code class="highlighter-rouge">clone</code> method from the Java <code class="highlighter-rouge">Cloneable</code> interface. -Implementations of this method should return a new object of the same type as the Accumulo Iterator -instance it was called on. Any internal state from the instance <code class="highlighter-rouge">deepCopy</code> was called -on should be carried over to the returned copy. The returned copy should be ready to have -<code class="highlighter-rouge">seek</code> called on it. The SortedKeyValueIterator interface guarantees that <code class="highlighter-rouge">init</code> will be called on -an iterator before <code class="highlighter-rouge">deepCopy</code> and that <code class="highlighter-rouge">init</code> will not be called on the iterator returned by -<code class="highlighter-rouge">deepCopy</code>.</p> - -<p>Typically, implementations of <code class="highlighter-rouge">deepCopy</code> call a copy-constructor which will initialize -internal data structures. 
As with <code class="highlighter-rouge">init</code>, it is common for the <code class="highlighter-rouge">IteratorEnvironment</code> -argument to be ignored as most Iterator implementations can be written without the explicit -information the environment provides.</p> - -<p>In the analogy of a series of Iterators representing a tree, <code class="highlighter-rouge">deepCopy</code> resembles the copy routine of an -early programming assignment which implements its own tree data structure. <code class="highlighter-rouge">deepCopy</code> calls -<code class="highlighter-rouge">deepCopy</code> on its sources (the children), copies itself, attaches the copies of the children, and -then returns the new copy.</p> - -<h2 id="tabletserver-invocation-of-iterators">TabletServer invocation of Iterators</h2> - -<p>The following code is a general outline for how TabletServers invoke Iterators.</p> - -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">List</span><span class="o">&lt;</span><span class="n">KeyValue</span><span class="o">&gt;</span> <span class="n">batch</span><span class="o">;</span> -<span class="n">Range</span> <span class="n">range</span> <span class="o">=</span> <span class="n">getRangeFromClient</span><span class="o">();</span> -<span class="k">while</span><span class="o">(!</span><span class="n">overSizeLimit</span><span class="o">(</span><span class="n">batch</span><span class="o">)){</span> - <span class="n">SortedKeyValueIterator</span> <span class="n">source</span> <span class="o">=</span> <span class="n">getSystemIterator</span><span class="o">();</span> - - <span class="k">for</span><span class="o">(</span><span class="n">String</span> <span class="n">clzName</span> <span class="o">:</span> <span class="n">getUserIterators</span><span class="o">()){</span> - <span class="n">Class</span><span class="o">&lt;?&gt;</span> <span class="n">clz</span> <span class="o">=</span> <span class="n">Class</span><span class="o">.</span><span class="na">forName</span><span class="o">(</span><span 
class="n">clzName</span><span class="o">);</span> - <span class="n">SortedKeyValueIterator</span> <span class="n">iter</span> <span class="o">=</span> <span class="o">(</span><span class="n">SortedKeyValueIterator</span><span class="o">)</span> <span class="n">clz</span><span class="o">.</span><span class="na">newInstance</span><span class="o">();</span> - <span class="n">iter</span><span class="o">.</span><span class="na">init</span><span class="o">(</span><span class="n">source</span><span class="o">,</span> <span class="n">opts</span><span class="o">,</span> <span class="n">env</span><span class="o">);</span> - <span class="n">source</span> <span class="o">=</span> <span class="n">iter</span><span class="o">;</span> - <span class="o">}</span> - - <span class="c1">// read a batch of data to return to client</span> - <span class="c1">// the last iterator, the "top"</span> - <span class="n">SortedKeyValueIterator</span> <span class="n">topIter</span> <span class="o">=</span> <span class="n">source</span><span class="o">;</span> - <span class="n">topIter</span><span class="o">.</span><span class="na">seek</span><span class="o">(</span><span class="n">range</span><span class="o">,</span> <span class="o">...);</span> - - <span class="k">while</span><span class="o">(</span><span class="n">topIter</span><span class="o">.</span><span class="na">hasTop</span><span class="o">()</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">overSizeLimit</span><span class="o">(</span><span class="n">batch</span><span class="o">)){</span> - <span class="n">key</span> <span class="o">=</span> <span class="n">topIter</span><span class="o">.</span><span class="na">getTopKey</span><span class="o">();</span> - <span class="n">val</span> <span class="o">=</span> <span class="n">topIter</span><span class="o">.</span><span class="na">getTopValue</span><span class="o">();</span> - <span class="n">batch</span><span class="o">.</span><span class="na">add</span><span 
class="o">(</span><span class="k">new</span> <span class="n">KeyValue</span><span class="o">(</span><span class="n">key</span><span class="o">,</span> <span class="n">val</span><span class="o">));</span> - <span class="k">if</span><span class="o">(</span><span class="n">systemDataSourcesChanged</span><span class="o">()){</span> - <span class="c1">// code does not show isolation case, which will</span> - <span class="c1">// keep using same data sources until a row boundary is hit</span> - <span class="n">range</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Range</span><span class="o">(</span><span class="n">key</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">range</span><span class="o">.</span><span class="na">getEndKey</span><span class="o">(),</span> <span class="n">range</span><span class="o">.</span><span class="na">isEndKeyInclusive</span><span class="o">());</span> - <span class="k">break</span><span class="o">;</span> - <span class="o">}</span> - <span class="n">topIter</span><span class="o">.</span><span class="na">next</span><span class="o">();</span> - <span class="o">}</span> -<span class="o">}</span> -<span class="c1">//return batch of key values to client</span> -</code></pre> -</div> - -<p>Additionally, the less obvious “re-seek” case can be outlined as the following:</p> - -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Given the above</span> -<span class="n">List</span><span class="o">&lt;</span><span class="n">KeyValue</span><span class="o">&gt;</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">getNextBatch</span><span class="o">();</span> - -<span class="c1">// Store off lastKeyReturned for this client</span> -<span class="n">lastKeyReturned</span> <span class="o">=</span> <span class="n">batch</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">batch</span><span class="o">.</span><span class="na">size</span><span class="o">()</span> <span class="o">-</span> <span 
class="mi">1</span><span class="o">).</span><span class="na">getKey</span><span class="o">();</span> - -<span class="c1">// thread goes away (client stops asking for the next batch).</span> - -<span class="c1">// Eventually client comes back</span> -<span class="c1">// Setup as before...</span> - -<span class="n">Range</span> <span class="n">userRange</span> <span class="o">=</span> <span class="n">getRangeFromUser</span><span class="o">();</span> -<span class="n">Range</span> <span class="n">actualRange</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Range</span><span class="o">(</span><span class="n">lastKeyReturned</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> - <span class="n">userRange</span><span class="o">.</span><span class="na">getEndKey</span><span class="o">(),</span> <span class="n">userRange</span><span class="o">.</span><span class="na">isEndKeyInclusive</span><span class="o">());</span> - -<span class="c1">// Use the actualRange, not the user provided one</span> -<span class="n">topIter</span><span class="o">.</span><span class="na">seek</span><span class="o">(</span><span class="n">actualRange</span><span class="o">,</span> <span class="o">...);</span> -</code></pre> -</div> - -<h2 id="isolation">Isolation</h2> - -<p>Accumulo provides a feature which clients can enable to prevent the viewing of partially -applied mutations within the context of rows. If a client is submitting multiple column -updates to rows at a time, isolation ensures that a client sees either all of the -updates made to that row or none of them (until they are all applied).</p> - -<p>When using Isolation, there are additional concerns in iterator design. A scan-time iterator in Accumulo -reads from a set of data sources. While an iterator is reading data it has an isolated view. However, after it returns a -key/value it is possible that Accumulo may switch data sources and re-seek the iterator. This is done so that resources -may be reclaimed. 
When the user does not request isolation, this can occur after any key is returned. When a user enables -Isolation, this will only occur after a new row is returned, in which case it will re-seek to the very beginning of the -next possible row.</p> - -<h2 id="abstract-iterators">Abstract Iterators</h2> - -<p>A number of abstract implementations of Iterators are provided to allow for faster creation -of common patterns. The most commonly used abstract implementations are the <code class="highlighter-rouge">Filter</code> and -<code class="highlighter-rouge">Combiner</code> classes. When possible, these classes should be used instead of writing an Iterator from scratch, as they have been -thoroughly tested inside Accumulo itself.</p> - -<h3 id="filter">Filter</h3> - -<p>The <code class="highlighter-rouge">Filter</code> abstract Iterator provides a very simple base class which allows implementations -to define whether or not a Key-Value pair should be returned via an <code class="highlighter-rouge">accept(Key, Value)</code> method.</p> - -<p>Filters are extremely simple to implement; however, when the implementation is filtering a -large percentage of Key-Value pairs with respect to the total number of pairs examined, -it can be very inefficient. For example, if a Filter implementation can determine after examining -part of the row that no other pairs in this row will be accepted, there is no mechanism to -efficiently skip the remaining Key-Value pairs. Concretely, take a row which is composed of -1000 Key-Value pairs. After examining the first 10 Key-Value pairs, it is determined -that no other Key-Value pairs in this row will be accepted. The Filter must still examine each of the -remaining 990 Key-Value pairs in this row. 
Another way to express this deficiency is that -Filters have no means to leverage the <code class="highlighter-rouge">seek</code> method to efficiently skip large portions -of Key-Value pairs.</p> - -<p>As such, the <code class="highlighter-rouge">Filter</code> class functions well for filtering small amounts of data, but is -inefficient for filtering large amounts of data. The decision to use a <code class="highlighter-rouge">Filter</code> strongly -depends on the use case and distribution of data being filtered.</p> - -<h3 id="combiner">Combiner</h3> - -<p>The <code class="highlighter-rouge">Combiner</code> class is another common abstract Iterator. Similar to the <code class="highlighter-rouge">Combiner</code> interface -defined in Hadoop’s MapReduce framework, implementations of this abstract class reduce -multiple Values for different versions of a Key (Keys which only differ by timestamps) into one Key-Value pair. -Combiners provide a simple way to implement common operations like summation and -aggregation without the need to implement the entire Accumulo Iterator interface.</p> - -<p>One important consideration when choosing to design a Combiner is that the “reduction” operation -is easiest to implement correctly when it is associative and commutative. Operations which do not meet -these criteria can be implemented; however, the implementation can be difficult.</p> - -<p>A second consideration is that a Combiner is not guaranteed to see all of the Key-Value pairs -which differ only by timestamp every time it is invoked. For example, if there are 5 Key-Value -pairs in a table which only differ by the timestamps 1, 2, 3, 4, and 5, it is not guaranteed that -every invocation of the Combiner will see all 5 timestamps. 
One invocation might see the Values for -Keys with timestamps 1 and 4, while another invocation might see the Values for Keys with the -timestamps 1, 2, 4 and 5.</p> - -<p>Finally, when configuring an Accumulo table to use a Combiner, be sure to disable the Versioning Iterator or set the -Combiner at a priority less than that of the Versioning Iterator (the Versioning Iterator is added at a priority of 20 by default). The -Versioning Iterator will filter out multiple Key-Value pairs that differ only by timestamp and return only the Key-Value -pair that has the largest timestamp.</p> - -<h2 id="best-practices">Best practices</h2> - -<p>Because of the flexibility that the <code class="highlighter-rouge">SortedKeyValueIterator</code> interface provides, it doesn’t directly disallow -many implementations which are poor design decisions. The following are some common recommendations to -follow and pitfalls to avoid in Iterator implementations.</p> - -<h4 id="avoid-special-logic-encoded-in-ranges">Avoid special logic encoded in Ranges</h4> - -<p>Commonly, granular Ranges that a client passes to an Iterator from a <code class="highlighter-rouge">Scanner</code> or <code class="highlighter-rouge">BatchScanner</code> are unmodified. -If a <code class="highlighter-rouge">Range</code> falls within the boundaries of a Tablet, an Iterator will often see that same Range in the -<code class="highlighter-rouge">seek</code> method. However, there is no guarantee that the <code class="highlighter-rouge">Range</code> will remain unaltered from client to server. As such, Iterators -should <em>never</em> make assumptions about the current state/context based on the <code class="highlighter-rouge">Range</code>.</p> - -<p>The common failure condition is referred to as a “re-seek”. In the context of a Scan, TabletServers construct the -“stack” of Iterators and batch up Key-Value pairs to send back to the client. 
When a sufficient number of Key-Value -pairs are collected, it is common for the Iterators to be “torn down” until the client asks for the next batch of -Key-Value pairs. This is done by the TabletServer to provide fairness by ensuring that one Scan does not monopolize the available -resources. When the client asks for the next batch, the implementation modifies the original Range so that servers know -the point to resume the iteration (to avoid returning duplicate Key-Value pairs). Specifically, the new Range is created -from the original but is shortened by setting the startKey of the original Range to the Key last returned by the Scan, -non-inclusive.</p> - -<h3 id="seeking-backwards"><code class="highlighter-rouge">seek</code>’ing backwards</h3> - -<p>The ability for an Iterator to “skip over” large blocks of Key-Value pairs is a major tenet behind Iterators. -<code class="highlighter-rouge">seek</code>’ing when it is known that there is a collection of Key-Value pairs which can be ignored can -greatly increase the speed of a scan, as many Key-Value pairs do not have to be deserialized and processed.</p> - -<p>While the <code class="highlighter-rouge">seek</code> method provides the <code class="highlighter-rouge">Range</code> that should be used to <code class="highlighter-rouge">seek</code> the underlying source Iterator, -there is no guarantee that the implementing Iterator uses that <code class="highlighter-rouge">Range</code> to perform the <code class="highlighter-rouge">seek</code> on its -“source” Iterator. As such, it is possible to seek to any <code class="highlighter-rouge">Range</code> and the interface has no assertions -to prevent this from happening.</p> - -<p>Since Iterators are allowed to <code class="highlighter-rouge">seek</code> to arbitrary Keys, they can also create infinite loops -inside Scans that will repeatedly read the same data without end. 
An Iterator should take particular care whenever it -constructs a completely new Range, as doing so allows for bugs to be introduced which will break Accumulo.</p> - -<p>Thus, <code class="highlighter-rouge">seek</code>s should always be thought of as making “forward progress” in the view of the total iteration. The -<code class="highlighter-rouge">startKey</code> of a <code class="highlighter-rouge">Range</code> should always be greater than the current Key seen by the Iterator while the <code class="highlighter-rouge">endKey</code> of the -<code class="highlighter-rouge">Range</code> should always retain the original <code class="highlighter-rouge">endKey</code> (and <code class="highlighter-rouge">endKey</code> inclusivity) of the last <code class="highlighter-rouge">Range</code> seen by your -Iterator’s implementation of seek.</p> - -<h3 id="take-caution-in-constructing-new-data-in-an-iterator">Take caution in constructing new data in an Iterator</h3> - -<p>Implementers of Iterators might be tempted to open BatchWriters inside of an Iterator as a means -to implement triggers for writing additional data outside of their client application. The lifecycle of an Iterator -is <em>not</em> managed in such a way that guarantees that this is safe or efficient. Specifically, there -is no way to guarantee that the internal ThreadPool inside of the BatchWriter is closed (and the thread(s) -are reaped) without calling the close() method. <code class="highlighter-rouge">close</code>’ing and recreating a <code class="highlighter-rouge">BatchWriter</code> after every -Key-Value pair is also too slow to be considered an option.</p> - -<p>The only safe way to generate additional data in an Iterator is to alter the current Key-Value pair. -For example, the <code class="highlighter-rouge">WholeRowIterator</code> serializes all of the Key-Value pairs that fall within each -row. 
A safe way to generate more data in an Iterator would be to construct an Iterator that is -“higher” (at a larger priority) than the <code class="highlighter-rouge">WholeRowIterator</code>, that is, the Iterator receives the Key-Value pairs which are -a serialization of many Key-Value pairs. The custom Iterator could deserialize the pairs, compute -some function, and add a new Key-Value pair to the original collection, re-serializing the collection -of Key-Value pairs back into a single Key-Value pair.</p> - -<p>In any other situation, there is likely no guarantee that the caller (a Scan or a Compaction) will -always see all intended data that is generated.</p> - -<h2 id="final-things-to-remember">Final things to remember</h2> - -<p>Some simple recommendations/points to keep in mind:</p> - -<h3 id="method-call-order">Method call order</h3> - -<p>On an instance of an Iterator: <code class="highlighter-rouge">init</code> is always called before <code class="highlighter-rouge">seek</code>, <code class="highlighter-rouge">seek</code> is always called before <code class="highlighter-rouge">hasTop</code>, and -<code class="highlighter-rouge">getTopKey</code> and <code class="highlighter-rouge">getTopValue</code> will not be called if <code class="highlighter-rouge">hasTop</code> returns false.</p> - -<h3 id="teardown">Teardown</h3> - -<p>As mentioned, instances of Iterators may be torn down inside of the server transparently. When a complex -collection of iterators is performing some advanced functionality, they will not be torn down until a Key-Value -pair is returned out of the “stack” of Iterators (and added into the batch of Key-Values to be returned -to the caller). Being torn down is equivalent to a new instance of the Iterator being created and <code class="highlighter-rouge">deepCopy</code> -being called on the new instance with the old instance provided as the argument to <code class="highlighter-rouge">deepCopy</code>. 
References -to the old instance are removed and the object is lazily garbage collected by the JVM.</p> - -<h2 id="compaction-time-iterators">Compaction-time Iterators</h2> - -<p>When Iterators are configured to run during compactions, at the <code class="highlighter-rouge">minc</code> or <code class="highlighter-rouge">majc</code> scope, these Iterators sometimes need -to make different assumptions than those which only operate at scan time. Iterators won’t see the delete entries; however, -Iterators will not necessarily see all of the Key-Value pairs in every invocation. Because compactions often do not rewrite -all files (only a subset of them), it is important that the logic takes this into consideration.</p> - -<p>For example, a Combiner that runs over data during compactions might not see all of the values for a given Key. The -Combiner must recognize this and not perform any function that would be incorrect due -to the missing values.</p> - - </div> -</div> - - </div> - - -<footer> - - <p><a href="https://www.apache.org/foundation/contributing"><img src="https://www.apache.org/images/SupportApache-small.png" alt="Support the ASF" id="asf-logo" height="100" /></a></p> - - <p>Copyright © 2011-2017 The Apache Software Foundation. Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p> - -</footer> - - - </div> - </div> - </div> -</body> -</html>
http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9ebc5f9a/docs/unreleased/development/iterator_testing.html ---------------------------------------------------------------------- diff --git a/docs/unreleased/development/iterator_testing.html b/docs/unreleased/development/iterator_testing.html deleted file mode 100644 index b6a8437..0000000 --- a/docs/unreleased/development/iterator_testing.html +++ /dev/null @@ -1,435 +0,0 @@ -<!DOCTYPE html> -<html lang="en"> -<head> -<!-- - Licensed to the Apache Software Foundation (ASF) under one or more - contributor license agreements. See the NOTICE file distributed with - this work for additional information regarding copyright ownership. - The ASF licenses this file to You under the Apache License, Version 2.0 - (the "License"); you may not use this file except in compliance with - the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. 
---> -<meta charset="utf-8"> -<meta http-equiv="X-UA-Compatible" content="IE=edge"> -<meta name="viewport" content="width=device-width, initial-scale=1"> -<link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/paper/bootstrap.min.css" rel="stylesheet" integrity="sha384-awusxf8AUojygHf2+joICySzB780jVvQaVCAt1clU3QsyAitLGul28Qxb2r1e5g+" crossorigin="anonymous"> -<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" rel="stylesheet"> -<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.css"> -<link href="/css/accumulo.css" rel="stylesheet" type="text/css"> - -<title>Accumulo Documentation - Iterator Testing</title> - -<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> -<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script> -<script type="text/javascript" src="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.js"></script> -<script> - // show location of canonical site if not currently on the canonical site - $(function() { - var host = window.location.host; - if (typeof host !== 'undefined' && host !== 'accumulo.apache.org') { - $('#non-canonical').show(); - } - }); - - $(function() { - // decorate section headers with anchors - return $("h2, h3, h4, h5, h6").each(function(i, el) { - var $el, icon, id; - $el = $(el); - id = $el.attr('id'); - icon = '<i class="fa fa-link"></i>'; - if (id) { - return $el.append($("<a />").addClass("header-link").attr("href", "#" + id).html(icon)); - } - }); - }); - - // configure Google Analytics - (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ - (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), - m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) - 
})(window,document,'script','//www.google-analytics.com/analytics.js','ga'); - - if (ga.hasOwnProperty('loaded') && ga.loaded === true) { - ga('create', 'UA-50934829-1', 'apache.org'); - ga('send', 'pageview'); - } -</script> - -</head> -<body style="padding-top: 100px"> - - <nav class="navbar navbar-default navbar-fixed-top"> - <div class="container"> - <div class="navbar-header"> - <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-items"> - <span class="sr-only">Toggle navigation</span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - </button> - <a href="/"><img id="nav-logo" alt="Apache Accumulo" class="img-responsive" src="/images/accumulo-logo.png" width="200" - /></a> - </div> - <div class="collapse navbar-collapse" id="navbar-items"> - <ul class="nav navbar-nav"> - <li class="nav-link"><a href="/downloads">Download</a></li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Releases<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/release/accumulo-1.8.1/">1.8.1 (Latest)</a></li> - <li><a href="/release/accumulo-1.7.3/">1.7.3</a></li> - <li><a href="/release/accumulo-1.6.6/">1.6.6</a></li> - <li><a href="/release/">Archive</a></li> - </ul> - </li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/1.8/accumulo_user_manual.html">User Manual (1.8)</a></li> - <li><a href="/1.8/apidocs">Javadocs (1.8)</a></li> - <li><a href="/1.8/examples">Examples (1.8)</a></li> - <li><a href="/features">Features</a></li> - <li><a href="/glossary">Glossary</a></li> - <li><a href="/external-docs">External Docs</a></li> - <li><a href="/docs-archive/">Archive</a></li> - </ul> - </li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Community<span class="caret"></span></a> - 
<ul class="dropdown-menu"> - <li><a href="/get_involved">Get Involved</a></li> - <li><a href="/mailing_list">Mailing Lists</a></li> - <li><a href="/people">People</a></li> - <li><a href="/related-projects">Related Projects</a></li> - <li><a href="/contributor/">Contributor Guide</a></li> - </ul> - </li> - </ul> - <ul class="nav navbar-nav navbar-right"> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#">Apache Software Foundation<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="https://www.apache.org">Apache Homepage <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/licenses/LICENSE-2.0">License <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/foundation/sponsorship">Sponsorship <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/security">Security <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/foundation/thanks">Thanks <i class="fa fa-external-link"></i></a></li> - <li><a href="https://www.apache.org/foundation/policies/conduct">Code of Conduct <i class="fa fa-external-link"></i></a></li> - </ul> - </li> - </ul> - </div> - </div> -</nav> - - <div class="container"> - <div class="row"> - <div class="col-md-12"> - - <div id="non-canonical" style="display: none; background-color: #F0E68C; padding-left: 1em;"> - Visit the official site at: <a href="https://accumulo.apache.org">https://accumulo.apache.org</a> - </div> - <div id="content"> - - <div class="alert alert-danger" role="alert">This documentation is for an unreleased version of Apache Accumulo that is currently under development! 
Check out the <a href="/1.8/accumulo_user_manual.html">documentation for the latest release</a>.</div> - -<div class="row"> - <div class="col-md-3"> - <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true"> - <div class="panel panel-default"> - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsegetting-started" aria-expanded="false" aria-controls="collapsegetting-started"> - Getting started - </a> - </h4> - </div> - <div id="collapsegetting-started" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/design">Accumulo Design</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/clients">Accumulo Clients</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/shell">Accumulo Shell</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_design">Table Design</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_configuration">Table Configuration</a></div> - - </div> - </div> - - - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsedevelopment" aria-expanded="true" aria-controls="collapsedevelopment"> - Development - </a> - </h4> - </div> - <div id="collapsedevelopment" class="panel-collapse collapse in" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/iterator_design">Iterator Design</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/iterator_testing">Iterator 
Testing</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/development_tools">Development Tools</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/sampling">Sampling</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/summaries">Summary Statistics</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/security">Security</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/high_speed_ingest">High-Speed Ingest</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/analytics">Analytics</a></div> - - </div> - </div> - - - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapseadministration" aria-expanded="false" aria-controls="collapseadministration"> - Administration - </a> - </h4> - </div> - <div id="collapseadministration" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/installation">Installation</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-management">Configuration Management</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-properties">Configuration Properties</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/monitoring-metrics">Monitoring & Metrics</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/tracing">Tracing</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/fate">FATE</a></div> - - <div class="row doc-sidebar-link"><a 
href="/docs/unreleased/administration/multivolume">Multi-Volume Installations</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/ssl">SSL</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/kerberos">Kerberos</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/replication">Replication</a></div> - - </div> - </div> - - - - - - - - - - - - - - - - - - - - - - <div class="panel-heading" role="tab" id="headingOne"> - <h4 class="panel-title"> - <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsetroubleshooting" aria-expanded="false" aria-controls="collapsetroubleshooting"> - Troubleshooting - </a> - </h4> - </div> - <div id="collapsetroubleshooting" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> - <div class="panel-body"> - - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/basic">Basic Troubleshooting</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/advanced">Advanced Troubleshooting</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/tools">Troubleshooting Tools</a></div> - - <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/system-metadata-tables">System Metadata Tables</a></div> - - </div> - </div> - - - - </div> - </div> - </div> - <div class="col-md-9"> - - <p><a href="/docs/unreleased/">Accumulo unreleased docs</a> >> Development >> Iterator Testing</p> - - - <h1>Iterator Testing</h1> - - <p>Iterators, while extremely powerful, are notoriously difficult to test. 
While the API defines -the methods an Iterator must implement and each method’s functionality, the actual invocation -of these methods by Accumulo TabletServers can be surprisingly difficult to mimic in unit tests.</p> - -<p>The Apache Accumulo “Iterator Test Harness” is designed to provide a generalized testing framework -for all Accumulo Iterators to leverage to identify common pitfalls in user-created Iterators.</p> - -<h2 id="framework-use">Framework Use</h2> - -<p>The harness provides an abstract class for use with JUnit4. Users must define the following for this -abstract class:</p> - -<ul> - <li>A <code class="highlighter-rouge">SortedMap</code> of input data (<code class="highlighter-rouge">Key</code>-<code class="highlighter-rouge">Value</code> pairs)</li> - <li>A <code class="highlighter-rouge">Range</code> to use in tests</li> - <li>A <code class="highlighter-rouge">Map</code> of options (<code class="highlighter-rouge">String</code> to <code class="highlighter-rouge">String</code> pairs)</li> - <li>A <code class="highlighter-rouge">SortedMap</code> of output data (<code class="highlighter-rouge">Key</code>-<code class="highlighter-rouge">Value</code> pairs)</li> - <li>A list of <code class="highlighter-rouge">IteratorTestCase</code>s (these can be automatically discovered)</li> -</ul> - -<p>The majority of effort a user must make is in creating the input dataset and the expected -output dataset for the iterator being tested.</p> - -<h2 id="normal-test-outline">Normal Test Outline</h2> - -<p>Most iterator tests will follow the given outline:</p> - -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">java.util.Map</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">java.util.SortedMap</span><span class="o">;</span> - -<span class="kn">import</span> <span class="nn">org.apache.accumulo.core.data.Key</span><span class="o">;</span> -<span 
class="kn">import</span> <span class="nn">org.apache.accumulo.core.data.Range</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">org.apache.accumulo.core.data.Value</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">org.apache.accumulo.iteratortest.IteratorTestCaseFinder</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">org.apache.accumulo.iteratortest.IteratorTestInput</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">org.apache.accumulo.iteratortest.IteratorTestOutput</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">org.apache.accumulo.iteratortest.junit4.BaseJUnit4IteratorTest</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">org.apache.accumulo.iteratortest.testcases.IteratorTestCase</span><span class="o">;</span> -<span class="kn">import</span> <span class="nn">org.junit.runners.Parameterized.Parameters</span><span class="o">;</span> - -<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyIteratorTest</span> <span class="kd">extends</span> <span class="n">BaseJUnit4IteratorTest</span> <span class="o">{</span> - - <span class="nd">@Parameters</span> - <span class="kd">public</span> <span class="kd">static</span> <span class="n">Object</span><span class="o">[][]</span> <span class="nf">parameters</span><span class="o">()</span> <span class="o">{</span> - <span class="kd">final</span> <span class="n">IteratorTestInput</span> <span class="n">input</span> <span class="o">=</span> <span class="n">createIteratorInput</span><span class="o">();</span> - <span class="kd">final</span> <span class="n">IteratorTestOutput</span> <span class="n">output</span> <span class="o">=</span> <span class="n">createIteratorOutput</span><span class="o">();</span> - <span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span 
class="n">IteratorTestCase</span><span class="o">></span> <span class="n">testCases</span> <span class="o">=</span> <span class="n">IteratorTestCaseFinder</span><span class="o">.</span><span class="na">findAllTestCases</span><span class="o">();</span> - <span class="k">return</span> <span class="n">BaseJUnit4IteratorTest</span><span class="o">.</span><span class="na">createParameters</span><span class="o">(</span><span class="n">input</span><span class="o">,</span> <span class="n">output</span><span class="o">,</span> <span class="n">testCases</span><span class="o">);</span> - <span class="o">}</span> - - <span class="kd">private</span> <span class="kd">static</span> <span class="n">SortedMap</span><span class="o"><</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">></span> <span class="n">INPUT_DATA</span> <span class="o">=</span> <span class="n">createInputData</span><span class="o">();</span> - <span class="kd">private</span> <span class="kd">static</span> <span class="n">SortedMap</span><span class="o"><</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">></span> <span class="n">OUTPUT_DATA</span> <span class="o">=</span> <span class="n">createOutputData</span><span class="o">();</span> - - <span class="kd">private</span> <span class="kd">static</span> <span class="n">SortedMap</span><span class="o"><</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">></span> <span class="nf">createInputData</span><span class="o">()</span> <span class="o">{</span> - <span class="c1">// TODO -- implement this method</span> - <span class="o">}</span> - - <span class="kd">private</span> <span class="kd">static</span> <span class="n">SortedMap</span><span class="o"><</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">></span> <span class="nf">createOutputData</span><span class="o">()</span> 
<span class="o">{</span> - <span class="c1">// TODO -- implement this method</span> - <span class="o">}</span> - - <span class="kd">private</span> <span class="kd">static</span> <span class="n">IteratorTestInput</span> <span class="nf">createIteratorInput</span><span class="o">()</span> <span class="o">{</span> - <span class="kd">final</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">></span> <span class="n">options</span> <span class="o">=</span> <span class="n">createIteratorOptions</span><span class="o">();</span> - <span class="kd">final</span> <span class="n">Range</span> <span class="n">range</span> <span class="o">=</span> <span class="n">createRange</span><span class="o">();</span> - <span class="k">return</span> <span class="k">new</span> <span class="nf">IteratorTestInput</span><span class="o">(</span><span class="n">MyIterator</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">options</span><span class="o">,</span> <span class="n">range</span><span class="o">,</span> <span class="n">INPUT_DATA</span><span class="o">);</span> - <span class="o">}</span> - - <span class="kd">private</span> <span class="kd">static</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">></span> <span class="nf">createIteratorOptions</span><span class="o">()</span> <span class="o">{</span> - <span class="c1">// TODO -- implement this method</span> - <span class="c1">// Tip: Use INPUT_DATA if helpful in generating output</span> - <span class="o">}</span> - - <span class="kd">private</span> <span class="kd">static</span> <span class="n">Range</span> <span class="nf">createRange</span><span class="o">()</span> <span class="o">{</span> - <span class="c1">// TODO -- implement this method</span> - <span class="o">}</span> - - <span 
class="kd">private</span> <span class="kd">static</span> <span class="n">IteratorTestOutput</span> <span class="nf">createIteratorOutput</span><span class="o">()</span> <span class="o">{</span> - <span class="k">return</span> <span class="k">new</span> <span class="nf">IteratorTestOutput</span><span class="o">(</span><span class="n">OUTPUT_DATA</span><span class="o">);</span> - <span class="o">}</span> -<span class="o">}</span> -</code></pre> -</div> - -<h2 id="limitations">Limitations</h2> - -<p>While the provided <code class="highlighter-rouge">IteratorTestCase</code>s should exercise common edge cases in user iterators, -the existing test harness still has many limitations. Some of them are:</p> - -<ul> - <li>Can only specify a single iterator, not many (a “stack”)</li> - <li>No control over the IteratorEnvironment provided to tests</li> - <li>No support for exercising delete keys (especially with major compactions that do not include all files)</li> -</ul> - -<p>These are left as future improvements to the harness.</p> - - </div> -</div> - - </div> - - -<footer> - - <p><a href="https://www.apache.org/foundation/contributing"><img src="https://www.apache.org/images/SupportApache-small.png" alt="Support the ASF" id="asf-logo" height="100" /></a></p> - - <p>Copyright © 2011-2017 The Apache Software Foundation. Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p> - -</footer> - - - </div> - </div> - </div> -</body> -</html>
