http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/eab65f94/docs/unreleased/development/mapreduce.html ---------------------------------------------------------------------- diff --git a/docs/unreleased/development/mapreduce.html b/docs/unreleased/development/mapreduce.html new file mode 100644 index 0000000..e327ccc --- /dev/null +++ b/docs/unreleased/development/mapreduce.html @@ -0,0 +1,532 @@ +<!DOCTYPE html> +<html lang="en"> +<head> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> +<meta charset="utf-8"> +<meta http-equiv="X-UA-Compatible" content="IE=edge"> +<meta name="viewport" content="width=device-width, initial-scale=1"> +<link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/paper/bootstrap.min.css" rel="stylesheet" integrity="sha384-awusxf8AUojygHf2+joICySzB780jVvQaVCAt1clU3QsyAitLGul28Qxb2r1e5g+" crossorigin="anonymous"> +<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" rel="stylesheet"> +<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.css"> +<link href="/css/accumulo.css" rel="stylesheet" type="text/css"> + +<title>Accumulo Documentation - MapReduce</title> + +<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> +<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script> +<script type="text/javascript" src="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.js"></script> +<script> + // show location of canonical site if not currently on the canonical site + $(function() { + var host = window.location.host; + if (typeof host !== 'undefined' && host !== 'accumulo.apache.org') { + $('#non-canonical').show(); + } + }); + + $(function() { + // decorate section headers with anchors + return $("h2, h3, h4, h5, h6").each(function(i, el) { + var $el, icon, id; + $el = $(el); + id = $el.attr('id'); + icon = '<i class="fa fa-link"></i>'; + if (id) { + return $el.append($("<a />").addClass("header-link").attr("href", "#" + id).html(icon)); + } + }); + }); + + // fix sidebar width in documentation + $(function() { + var $affixElement = $('div[data-spy="affix"]'); + $affixElement.width($affixElement.parent().width()); + }); + + // configure Google Analytics + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + 
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + + if (ga.hasOwnProperty('loaded') && ga.loaded === true) { + ga('create', 'UA-50934829-1', 'apache.org'); + ga('send', 'pageview'); + } +</script> + +</head> +<body style="padding-top: 100px"> + + <nav class="navbar navbar-default navbar-fixed-top"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-items"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a href="/"><img id="nav-logo" alt="Apache Accumulo" class="img-responsive" src="/images/accumulo-logo.png" width="200" + /></a> + </div> + <div class="collapse navbar-collapse" id="navbar-items"> + <ul class="nav navbar-nav"> + <li class="nav-link"><a href="/downloads">Download</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Releases<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/release/accumulo-1.8.1/">1.8.1 (Latest)</a></li> + <li><a href="/release/accumulo-1.7.3/">1.7.3</a></li> + <li><a href="/release/accumulo-1.6.6/">1.6.6</a></li> + <li><a href="/release/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/1.8/accumulo_user_manual.html">User Manual (1.8)</a></li> + <li><a href="/1.8/apidocs">Javadocs (1.8)</a></li> + <li><a href="/1.8/examples">Examples (1.8)</a></li> + <li><a href="/features">Features</a></li> + <li><a href="/glossary">Glossary</a></li> + <li><a href="/external-docs">External Docs</a></li> + <li><a 
href="/docs-archive/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Community<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/get_involved">Get Involved</a></li> + <li><a href="/mailing_list">Mailing Lists</a></li> + <li><a href="/people">People</a></li> + <li><a href="/related-projects">Related Projects</a></li> + <li><a href="/contributor/">Contributor Guide</a></li> + </ul> + </li> + </ul> + <ul class="nav navbar-nav navbar-right"> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Apache Software Foundation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://www.apache.org">Apache Homepage <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/licenses/LICENSE-2.0">License <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/sponsorship">Sponsorship <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/security">Security <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/thanks">Thanks <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/policies/conduct">Code of Conduct <i class="fa fa-external-link"></i></a></li> + </ul> + </li> + </ul> + </div> + </div> +</nav> + + <div class="container"> + <div class="row"> + <div class="col-md-12"> + + <div id="non-canonical" style="display: none; background-color: #F0E68C; padding-left: 1em;"> + Visit the official site at: <a href="https://accumulo.apache.org">https://accumulo.apache.org</a> + </div> + <div id="content"> + + <div class="row"> + <div class="col-md-3"> + <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true" data-spy="affix"> + <div class="panel panel-default"> + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + 
<h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsegetting-started" aria-expanded="false" aria-controls="collapsegetting-started"> + Getting started + </a> + </h4> + </div> + <div id="collapsegetting-started" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/design">Accumulo Design</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/quick-install">Quick Installation</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/clients">Accumulo Clients</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/shell">Accumulo Shell</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_design">Table Design</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_configuration">Table Configuration</a></div> + + </div> + </div> + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsedevelopment" aria-expanded="true" aria-controls="collapsedevelopment"> + Development + </a> + </h4> + </div> + <div id="collapsedevelopment" class="panel-collapse collapse in" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/iterators">Iterators</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/mapreduce">MapReduce</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/proxy">Proxy</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/development_tools">Development Tools</a></div> + + <div 
class="row doc-sidebar-link"><a href="/docs/unreleased/development/sampling">Sampling</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/summaries">Summary Statistics</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/security">Security</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/high_speed_ingest">High-Speed Ingest</a></div> + + </div> + </div> + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapseadministration" aria-expanded="false" aria-controls="collapseadministration"> + Administration + </a> + </h4> + </div> + <div id="collapseadministration" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/in-depth-install">In-depth Installation</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-management">Configuration Management</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-properties">Configuration Properties</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/monitoring-metrics">Monitoring & Metrics</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/tracing">Tracing</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/fate">FATE</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/multivolume">Multi-Volume Installations</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/ssl">SSL</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/kerberos">Kerberos</a></div> + + <div 
class="row doc-sidebar-link"><a href="/docs/unreleased/administration/replication">Replication</a></div> + + </div> + </div> + + + + + + + + + + + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsetroubleshooting" aria-expanded="false" aria-controls="collapsetroubleshooting"> + Troubleshooting + </a> + </h4> + </div> + <div id="collapsetroubleshooting" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/basic">Basic Troubleshooting</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/advanced">Advanced Troubleshooting</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/tools">Troubleshooting Tools</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/system-metadata-tables">System Metadata Tables</a></div> + + </div> + </div> + + + + </div> + </div> + </div> + <div class="col-md-9"> + + <p><a href="/docs/unreleased/">Accumulo unreleased docs</a> >> Development >> MapReduce</p> + + + + <div class="alert alert-danger" style="margin-bottom: 0px;" role="alert">This documentation is for a future release of Accumulo! <a href="/1.8/accumulo_user_manual.html">View documentation for the latest release</a>.</div> + + <div class="row"> + <div class="col-md-10"><h1>MapReduce</h1></div> + <div class="col-md-2"><a class="pull-right" style="margin-top: 25px;" href="https://github.com/apache/accumulo-website/edit/master/_docs-unreleased/development/mapreduce.md" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + + <p>Accumulo tables can be used as the source and destination of MapReduce jobs. 
To +use an Accumulo table with a MapReduce job, configure the job parameters to use +the <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html">AccumuloInputFormat</a> and <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloOutputFormat.html">AccumuloOutputFormat</a>. Accumulo-specific parameters +can be set via these two format classes to do the following:</p> + +<ul> + <li>Authenticate and provide user credentials for the input</li> + <li>Restrict the scan to a range of rows</li> + <li>Restrict the input to a subset of available columns</li> +</ul> + +<h2 id="mapper-and-reducer-classes">Mapper and Reducer classes</h2> + +<p>To read from an Accumulo table, create a Mapper with the following class +parameterization and be sure to configure the <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html">AccumuloInputFormat</a>.</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">MyMapper</span> <span class="kd">extends</span> <span class="n">Mapper</span><span class="o"><</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">,</span><span class="n">WritableComparable</span><span class="o">,</span><span class="n">Writable</span><span class="o">></span> <span class="o">{</span> + <span class="kd">public</span> <span class="kt">void</span> <span class="nf">map</span><span class="o">(</span><span class="n">Key</span> <span class="n">k</span><span class="o">,</span> <span class="n">Value</span> <span class="n">v</span><span class="o">,</span> <span class="n">Context</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> + <span class="c1">// transform key and value data here</span> + <span 
class="o">}</span> +<span class="o">}</span> +</code></pre> +</div> + +<p>To write to an Accumulo table, create a Reducer with the following class +parameterization and be sure to configure the <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloOutputFormat.html">AccumuloOutputFormat</a>. The key +emitted from the Reducer identifies the table to which the mutation is sent. This +allows a single Reducer to write to more than one table if desired. A default table +can be configured using the AccumuloOutputFormat, in which case the output table +name does not have to be passed to the Context object within the Reducer.</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">MyReducer</span> <span class="kd">extends</span> <span class="n">Reducer</span><span class="o"><</span><span class="n">WritableComparable</span><span class="o">,</span> <span class="n">Writable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Mutation</span><span class="o">></span> <span class="o">{</span> + <span class="kd">public</span> <span class="kt">void</span> <span class="nf">reduce</span><span class="o">(</span><span class="n">WritableComparable</span> <span class="n">key</span><span class="o">,</span> <span class="n">Iterable</span><span class="o"><</span><span class="n">Writable</span><span class="o">></span> <span class="n">values</span><span class="o">,</span> <span class="n">Context</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> + <span class="n">Mutation</span> <span class="n">m</span><span class="o">;</span> + <span class="c1">// create the mutation based on input key and value</span> + <span class="n">c</span><span class="o">.</span><span class="na">write</span><span class="o">(</span><span class="k">new</span> <span class="n">Text</span><span 
class="o">(</span><span class="s">"output-table"</span><span class="o">),</span> <span class="n">m</span><span class="o">);</span> + <span class="o">}</span> +<span class="o">}</span> +</code></pre> +</div> + +<p>The Text object passed as the output should contain the name of the table to which +this mutation should be applied. The Text can be null, in which case the mutation +will be applied to the default table specified in the <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloOutputFormat.html">AccumuloOutputFormat</a> +options.</p> + +<h2 id="accumuloinputformat-options">AccumuloInputFormat options</h2> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">Job</span> <span class="n">job</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Job</span><span class="o">(</span><span class="n">getConf</span><span class="o">());</span> +<span class="n">AccumuloInputFormat</span><span class="o">.</span><span class="na">setInputInfo</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> + <span class="s">"user"</span><span class="o">,</span> + <span class="s">"passwd"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">(),</span> + <span class="s">"table"</span><span class="o">,</span> + <span class="k">new</span> <span class="nf">Authorizations</span><span class="o">());</span> + +<span class="n">AccumuloInputFormat</span><span class="o">.</span><span class="na">setZooKeeperInstance</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="s">"myinstance"</span><span class="o">,</span> + <span class="s">"zooserver-one,zooserver-two"</span><span class="o">);</span> +</code></pre> +</div> + +<p><strong>Optional Settings:</strong></p> + +<p>To restrict the scan to a set of row ranges:</p> + +<div class="language-java highlighter-rouge"><pre 
class="highlight"><code><span class="n">ArrayList</span><span class="o"><</span><span class="n">Range</span><span class="o">></span> <span class="n">ranges</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">Range</span><span class="o">>();</span> +<span class="c1">// populate array list of row ranges ...</span> +<span class="n">AccumuloInputFormat</span><span class="o">.</span><span class="na">setRanges</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="n">ranges</span><span class="o">);</span> +</code></pre> +</div> + +<p>To restrict Accumulo to a list of columns:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">ArrayList</span><span class="o"><</span><span class="n">Pair</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span><span class="n">Text</span><span class="o">>></span> <span class="n">columns</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">Pair</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span><span class="n">Text</span><span class="o">>>();</span> +<span class="c1">// populate list of columns</span> +<span class="n">AccumuloInputFormat</span><span class="o">.</span><span class="na">fetchColumns</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="n">columns</span><span class="o">);</span> +</code></pre> +</div> + +<p>To use a regular expression to match row IDs:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">IteratorSetting</span> <span class="n">is</span> <span class="o">=</span> <span class="k">new</span> <span class="n">IteratorSetting</span><span class="o">(</span><span class="mi">30</span><span class="o">,</span> <span class="n">RegExFilter</span><span 
class="o">.</span><span class="na">class</span><span class="o">);</span> +<span class="n">RegExFilter</span><span class="o">.</span><span class="na">setRegexs</span><span class="o">(</span><span class="n">is</span><span class="o">,</span> <span class="s">".*suffix"</span><span class="o">,</span> <span class="kc">null</span><span class="o">,</span> <span class="kc">null</span><span class="o">,</span> <span class="kc">null</span><span class="o">,</span> <span class="kc">true</span><span class="o">);</span> +<span class="n">AccumuloInputFormat</span><span class="o">.</span><span class="na">addIterator</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="n">is</span><span class="o">);</span> +</code></pre> +</div> + +<h2 id="accumulomultitableinputformat-options">AccumuloMultiTableInputFormat options</h2> + +<p>The <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloMultiTableInputFormat.html">AccumuloMultiTableInputFormat</a> allows scanning over multiple tables +in a single MapReduce job. 
Separate ranges, columns, and iterators can be +used for each table.</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">InputTableConfig</span> <span class="n">tableOneConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">InputTableConfig</span><span class="o">();</span> +<span class="n">InputTableConfig</span> <span class="n">tableTwoConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">InputTableConfig</span><span class="o">();</span> +</code></pre> +</div> + +<p>To set the configuration objects on the job:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">InputTableConfig</span><span class="o">></span> <span class="n">configs</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o"><</span><span class="n">String</span><span class="o">,</span><span class="n">InputTableConfig</span><span class="o">>();</span> +<span class="n">configs</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"table1"</span><span class="o">,</span> <span class="n">tableOneConfig</span><span class="o">);</span> +<span class="n">configs</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"table2"</span><span class="o">,</span> <span class="n">tableTwoConfig</span><span class="o">);</span> +<span class="n">AccumuloMultiTableInputFormat</span><span class="o">.</span><span class="na">setInputTableConfigs</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="n">configs</span><span class="o">);</span> +</code></pre> +</div> + +<p><strong>Optional settings:</strong></p> + +<p>To restrict to a set of ranges:</p> + +<div class="language-java highlighter-rouge"><pre 
class="highlight"><code><span class="n">ArrayList</span><span class="o"><</span><span class="n">Range</span><span class="o">></span> <span class="n">tableOneRanges</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">Range</span><span class="o">>();</span> +<span class="n">ArrayList</span><span class="o"><</span><span class="n">Range</span><span class="o">></span> <span class="n">tableTwoRanges</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">Range</span><span class="o">>();</span> +<span class="c1">// populate array lists of row ranges for tables...</span> +<span class="n">tableOneConfig</span><span class="o">.</span><span class="na">setRanges</span><span class="o">(</span><span class="n">tableOneRanges</span><span class="o">);</span> +<span class="n">tableTwoConfig</span><span class="o">.</span><span class="na">setRanges</span><span class="o">(</span><span class="n">tableTwoRanges</span><span class="o">);</span> +</code></pre> +</div> + +<p>To restrict Accumulo to a list of columns:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">ArrayList</span><span class="o"><</span><span class="n">Pair</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span><span class="n">Text</span><span class="o">>></span> <span class="n">tableOneColumns</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">Pair</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span><span class="n">Text</span><span class="o">>>();</span> +<span class="n">ArrayList</span><span class="o"><</span><span class="n">Pair</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span><span class="n">Text</span><span class="o">>></span> <span 
class="n">tableTwoColumns</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">Pair</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span><span class="n">Text</span><span class="o">>>();</span> +<span class="c1">// populate lists of columns for each of the tables ...</span> +<span class="n">tableOneConfig</span><span class="o">.</span><span class="na">fetchColumns</span><span class="o">(</span><span class="n">tableOneColumns</span><span class="o">);</span> +<span class="n">tableTwoConfig</span><span class="o">.</span><span class="na">fetchColumns</span><span class="o">(</span><span class="n">tableTwoColumns</span><span class="o">);</span> +</code></pre> +</div> + +<p>To set scan iterators:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">List</span><span class="o"><</span><span class="n">IteratorSetting</span><span class="o">></span> <span class="n">tableOneIterators</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">IteratorSetting</span><span class="o">>();</span> +<span class="n">List</span><span class="o"><</span><span class="n">IteratorSetting</span><span class="o">></span> <span class="n">tableTwoIterators</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">IteratorSetting</span><span class="o">>();</span> +<span class="c1">// populate the lists of iterator settings for each of the tables ...</span> +<span class="n">tableOneConfig</span><span class="o">.</span><span class="na">setIterators</span><span class="o">(</span><span class="n">tableOneIterators</span><span class="o">);</span> +<span class="n">tableTwoConfig</span><span class="o">.</span><span class="na">setIterators</span><span class="o">(</span><span class="n">tableTwoIterators</span><span 
class="o">);</span> +</code></pre> +</div> + +<p>The name of the table can be retrieved from the input split:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">MyMapper</span> <span class="kd">extends</span> <span class="n">Mapper</span><span class="o"><</span><span class="n">Key</span><span class="o">,</span><span class="n">Value</span><span class="o">,</span><span class="n">WritableComparable</span><span class="o">,</span><span class="n">Writable</span><span class="o">></span> <span class="o">{</span> + <span class="kd">public</span> <span class="kt">void</span> <span class="nf">map</span><span class="o">(</span><span class="n">Key</span> <span class="n">k</span><span class="o">,</span> <span class="n">Value</span> <span class="n">v</span><span class="o">,</span> <span class="n">Context</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> + <span class="n">RangeInputSplit</span> <span class="n">split</span> <span class="o">=</span> <span class="o">(</span><span class="n">RangeInputSplit</span><span class="o">)</span><span class="n">c</span><span class="o">.</span><span class="na">getInputSplit</span><span class="o">();</span> + <span class="n">String</span> <span class="n">tableName</span> <span class="o">=</span> <span class="n">split</span><span class="o">.</span><span class="na">getTableName</span><span class="o">();</span> + <span class="c1">// do something with table name</span> + <span class="o">}</span> +<span class="o">}</span> +</code></pre> +</div> + +<h2 id="accumulooutputformat-options">AccumuloOutputFormat options</h2> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kt">boolean</span> <span class="n">createTables</span> <span class="o">=</span> <span class="kc">true</span><span class="o">;</span> +<span class="n">String</span> <span class="n">defaultTable</span> <span class="o">=</span> <span 
class="s">"mytable"</span><span class="o">;</span> + +<span class="n">AccumuloOutputFormat</span><span class="o">.</span><span class="na">setOutputInfo</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> + <span class="s">"user"</span><span class="o">,</span> + <span class="s">"passwd"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">(),</span> + <span class="n">createTables</span><span class="o">,</span> + <span class="n">defaultTable</span><span class="o">);</span> + +<span class="n">AccumuloOutputFormat</span><span class="o">.</span><span class="na">setZooKeeperInstance</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="s">"myinstance"</span><span class="o">,</span> + <span class="s">"zooserver-one,zooserver-two"</span><span class="o">);</span> +</code></pre> +</div> + +<p><strong>Optional Settings:</strong></p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">AccumuloOutputFormat</span><span class="o">.</span><span class="na">setMaxLatency</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="mi">300000</span><span class="o">);</span> <span class="c1">// milliseconds</span> +<span class="n">AccumuloOutputFormat</span><span class="o">.</span><span class="na">setMaxMutationBufferSize</span><span class="o">(</span><span class="n">job</span><span class="o">,</span> <span class="mi">50000000</span><span class="o">);</span> <span class="c1">// bytes</span> +</code></pre> +</div> + +<p>The <a href="https://github.com/apache/accumulo-examples/blob/master/docs/mapred.md">MapReduce example</a> contains a complete example of using MapReduce with Accumulo.</p> + + + + <div class="row" style="margin-top: 20px;"> + <div class="col-md-10"><strong>Find documentation for all releases in the <a href="/docs-archive">archive</a></strong></div> + <div class="col-md-2"><a class="pull-right" 
href="https://github.com/apache/accumulo-website/edit/master/_docs-unreleased/development/mapreduce.md" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + </div> +</div> + + </div> + + +<footer> + + <p><a href="https://www.apache.org/foundation/contributing"><img src="https://www.apache.org/images/SupportApache-small.png" alt="Support the ASF" id="asf-logo" height="100" /></a></p> + + <p>Copyright © 2011-2017 The Apache Software Foundation. Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p> + +</footer> + + + </div> + </div> + </div> +</body> +</html>
http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/eab65f94/docs/unreleased/development/proxy.html ---------------------------------------------------------------------- diff --git a/docs/unreleased/development/proxy.html b/docs/unreleased/development/proxy.html new file mode 100644 index 0000000..5b5229d --- /dev/null +++ b/docs/unreleased/development/proxy.html @@ -0,0 +1,474 @@ +<!DOCTYPE html> +<html lang="en"> +<head> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> +<meta charset="utf-8"> +<meta http-equiv="X-UA-Compatible" content="IE=edge"> +<meta name="viewport" content="width=device-width, initial-scale=1"> +<link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/paper/bootstrap.min.css" rel="stylesheet" integrity="sha384-awusxf8AUojygHf2+joICySzB780jVvQaVCAt1clU3QsyAitLGul28Qxb2r1e5g+" crossorigin="anonymous"> +<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" rel="stylesheet"> +<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.css"> +<link href="/css/accumulo.css" rel="stylesheet" type="text/css"> + +<title>Accumulo Documentation - Proxy</title> + +<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> +<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script> +<script type="text/javascript" src="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.js"></script> +<script> + // show location of canonical site if not currently on the canonical site + $(function() { + var host = window.location.host; + if (typeof host !== 'undefined' && host !== 'accumulo.apache.org') { + $('#non-canonical').show(); + } + }); + + $(function() { + // decorate section headers with anchors + return $("h2, h3, h4, h5, h6").each(function(i, el) { + var $el, icon, id; + $el = $(el); + id = $el.attr('id'); + icon = '<i class="fa fa-link"></i>'; + if (id) { + return $el.append($("<a />").addClass("header-link").attr("href", "#" + id).html(icon)); + } + }); + }); + + // fix sidebar width in documentation + $(function() { + var $affixElement = $('div[data-spy="affix"]'); + $affixElement.width($affixElement.parent().width()); + }); + + // configure Google Analytics + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + 
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + + if (ga.hasOwnProperty('loaded') && ga.loaded === true) { + ga('create', 'UA-50934829-1', 'apache.org'); + ga('send', 'pageview'); + } +</script> + +</head> +<body style="padding-top: 100px"> + + <nav class="navbar navbar-default navbar-fixed-top"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-items"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a href="/"><img id="nav-logo" alt="Apache Accumulo" class="img-responsive" src="/images/accumulo-logo.png" width="200" + /></a> + </div> + <div class="collapse navbar-collapse" id="navbar-items"> + <ul class="nav navbar-nav"> + <li class="nav-link"><a href="/downloads">Download</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Releases<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/release/accumulo-1.8.1/">1.8.1 (Latest)</a></li> + <li><a href="/release/accumulo-1.7.3/">1.7.3</a></li> + <li><a href="/release/accumulo-1.6.6/">1.6.6</a></li> + <li><a href="/release/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/1.8/accumulo_user_manual.html">User Manual (1.8)</a></li> + <li><a href="/1.8/apidocs">Javadocs (1.8)</a></li> + <li><a href="/1.8/examples">Examples (1.8)</a></li> + <li><a href="/features">Features</a></li> + <li><a href="/glossary">Glossary</a></li> + <li><a href="/external-docs">External Docs</a></li> + <li><a 
href="/docs-archive/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Community<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/get_involved">Get Involved</a></li> + <li><a href="/mailing_list">Mailing Lists</a></li> + <li><a href="/people">People</a></li> + <li><a href="/related-projects">Related Projects</a></li> + <li><a href="/contributor/">Contributor Guide</a></li> + </ul> + </li> + </ul> + <ul class="nav navbar-nav navbar-right"> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Apache Software Foundation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://www.apache.org">Apache Homepage <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/licenses/LICENSE-2.0">License <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/sponsorship">Sponsorship <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/security">Security <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/thanks">Thanks <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/policies/conduct">Code of Conduct <i class="fa fa-external-link"></i></a></li> + </ul> + </li> + </ul> + </div> + </div> +</nav> + + <div class="container"> + <div class="row"> + <div class="col-md-12"> + + <div id="non-canonical" style="display: none; background-color: #F0E68C; padding-left: 1em;"> + Visit the official site at: <a href="https://accumulo.apache.org">https://accumulo.apache.org</a> + </div> + <div id="content"> + + <div class="row"> + <div class="col-md-3"> + <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true" data-spy="affix"> + <div class="panel panel-default"> + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + 
<h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsegetting-started" aria-expanded="false" aria-controls="collapsegetting-started"> + Getting started + </a> + </h4> + </div> + <div id="collapsegetting-started" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/design">Accumulo Design</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/quick-install">Quick Installation</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/clients">Accumulo Clients</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/shell">Accumulo Shell</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_design">Table Design</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_configuration">Table Configuration</a></div> + + </div> + </div> + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsedevelopment" aria-expanded="true" aria-controls="collapsedevelopment"> + Development + </a> + </h4> + </div> + <div id="collapsedevelopment" class="panel-collapse collapse in" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/iterators">Iterators</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/mapreduce">MapReduce</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/proxy">Proxy</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/development_tools">Development Tools</a></div> + + <div 
class="row doc-sidebar-link"><a href="/docs/unreleased/development/sampling">Sampling</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/summaries">Summary Statistics</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/security">Security</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/high_speed_ingest">High-Speed Ingest</a></div> + + </div> + </div> + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapseadministration" aria-expanded="false" aria-controls="collapseadministration"> + Administration + </a> + </h4> + </div> + <div id="collapseadministration" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/in-depth-install">In-depth Installation</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-management">Configuration Management</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-properties">Configuration Properties</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/monitoring-metrics">Monitoring & Metrics</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/tracing">Tracing</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/fate">FATE</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/multivolume">Multi-Volume Installations</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/ssl">SSL</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/kerberos">Kerberos</a></div> + + <div 
class="row doc-sidebar-link"><a href="/docs/unreleased/administration/replication">Replication</a></div> + + </div> + </div> + + + + + + + + + + + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsetroubleshooting" aria-expanded="false" aria-controls="collapsetroubleshooting"> + Troubleshooting + </a> + </h4> + </div> + <div id="collapsetroubleshooting" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/basic">Basic Troubleshooting</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/advanced">Advanced Troubleshooting</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/tools">Troubleshooting Tools</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/system-metadata-tables">System Metadata Tables</a></div> + + </div> + </div> + + + + </div> + </div> + </div> + <div class="col-md-9"> + + <p><a href="/docs/unreleased/">Accumulo unreleased docs</a> >> Development >> Proxy</p> + + + + <div class="alert alert-danger" style="margin-bottom: 0px;" role="alert">This documentation is for a future release of Accumulo! <a href="/1.8/accumulo_user_manual.html">View documentation for the latest release</a>.</div> + + <div class="row"> + <div class="col-md-10"><h1>Proxy</h1></div> + <div class="col-md-2"><a class="pull-right" style="margin-top: 25px;" href="https://github.com/apache/accumulo-website/edit/master/_docs-unreleased/development/proxy.md" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + + <p>The proxy API allows the interaction with Accumulo with languages other than Java. 
+A proxy server is provided in the codebase, and clients can be generated for other languages. +The proxy API can also be used instead of the traditional <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/ZooKeeperInstance.html">ZooKeeperInstance</a> class to +provide a single TCP port through which clients can be securely routed across a firewall, +without requiring access to all tablet servers in the cluster.</p> + +<h2 id="prerequisites">Prerequisites</h2> + +<p>The proxy server can run on any node on which the basic client API would work. That +means it must be able to communicate with the Master, ZooKeepers, NameNode, and the +DataNodes. A proxy client only needs the ability to communicate with the proxy server.</p> + +<h2 id="configuration">Configuration</h2> + +<p>The configuration options for the proxy server live in a properties file. At +the very least, you need to supply the following properties:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory +tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken +port=42424 +instance=test +zookeepers=localhost:2181 +</code></pre> +</div> + +<p>You can find a sample configuration file in your distribution at <code class="highlighter-rouge">proxy/proxy.properties</code>.</p> + +<p>This sample configuration file also demonstrates how to back the proxy server +with MiniAccumuloCluster.</p> + +<h2 id="running-the-proxy-server">Running the Proxy Server</h2> + +<p>After the properties file holding the configuration is created, the proxy server +can be started using the following command in the Accumulo distribution (assuming +your properties file is named <code class="highlighter-rouge">config.properties</code>):</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>accumulo proxy -p config.properties +</code></pre> +</div> + +<h2 
id="creating-a-proxy-client">Creating a Proxy Client</h2> + +<p>Aside from installing the Thrift compiler, you will also need the language-specific library +for Thrift installed to generate client code in that language. Typically, your operating +system’s package manager will be able to automatically install these for you in an expected +location such as <code class="highlighter-rouge">/usr/lib/python/site-packages/thrift</code>.</p> + +<p>You can find the Thrift file for generating the client at <code class="highlighter-rouge">proxy/proxy.thrift</code>.</p> + +<p>After a client is generated, the port specified in the configuration properties above will be +used to connect to the server.</p> + +<h2 id="using-a-proxy-client">Using a Proxy Client</h2> + +<p>The following examples are written in Java; the method signatures may be +slightly different depending on the language specified when generating the client with +the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift’s +documentation for examples of connecting to a Thrift service), the methods on the +proxy client will be available. 
The first thing to do is log in:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">></span> <span class="n">password</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o"><</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">>();</span> +<span class="n">password</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"password"</span><span class="o">,</span> <span class="s">"secret"</span><span class="o">);</span> +<span class="n">ByteBuffer</span> <span class="n">token</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="na">login</span><span class="o">(</span><span class="s">"root"</span><span class="o">,</span> <span class="n">password</span><span class="o">);</span> +</code></pre> +</div> + +<p>Once logged in, pass the returned token to most subsequent calls on the client. 
+Let’s create a table, add some data, scan the table, and delete it.</p> + +<p>First, create a table:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">client</span><span class="o">.</span><span class="na">createTable</span><span class="o">(</span><span class="n">token</span><span class="o">,</span> <span class="s">"myTable"</span><span class="o">,</span> <span class="kc">true</span><span class="o">,</span> <span class="n">TimeType</span><span class="o">.</span><span class="na">MILLIS</span><span class="o">);</span> +</code></pre> +</div> + +<p>Next, add some data:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// first, create a writer on the server</span> +<span class="n">String</span> <span class="n">writer</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="na">createWriter</span><span class="o">(</span><span class="n">token</span><span class="o">,</span> <span class="s">"myTable"</span><span class="o">,</span> <span class="k">new</span> <span class="n">WriterOptions</span><span class="o">());</span> + +<span class="c1">// row ID</span> +<span class="n">ByteBuffer</span> <span class="n">rowid</span> <span class="o">=</span> <span class="n">ByteBuffer</span><span class="o">.</span><span class="na">wrap</span><span class="o">(</span><span class="s">"UUID"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">());</span> + +<span class="c1">// mutation-like class</span> +<span class="n">ColumnUpdate</span> <span class="n">cu</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ColumnUpdate</span><span class="o">();</span> +<span class="n">cu</span><span class="o">.</span><span class="na">setColFamily</span><span class="o">(</span><span class="s">"MyFamily"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">());</span> +<span class="n">cu</span><span 
class="o">.</span><span class="na">setColQualifier</span><span class="o">(</span><span class="s">"MyQualifier"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">());</span> +<span class="n">cu</span><span class="o">.</span><span class="na">setColVisibility</span><span class="o">(</span><span class="s">"VisLabel"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">());</span> +<span class="n">cu</span><span class="o">.</span><span class="na">setValue</span><span class="o">(</span><span class="s">"Some Value."</span><span class="o">.</span><span class="na">getBytes</span><span class="o">());</span> + +<span class="n">List</span><span class="o"><</span><span class="n">ColumnUpdate</span><span class="o">></span> <span class="n">updates</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><</span><span class="n">ColumnUpdate</span><span class="o">>();</span> +<span class="n">updates</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">cu</span><span class="o">);</span> + +<span class="c1">// build column updates</span> +<span class="n">Map</span><span class="o"><</span><span class="n">ByteBuffer</span><span class="o">,</span> <span class="n">List</span><span class="o"><</span><span class="n">ColumnUpdate</span><span class="o">>></span> <span class="n">cellsToUpdate</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o"><</span><span class="n">ByteBuffer</span><span class="o">,</span> <span class="n">List</span><span class="o"><</span><span class="n">ColumnUpdate</span><span class="o">>>();</span> +<span class="n">cellsToUpdate</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">rowid</span><span class="o">,</span> <span class="n">updates</span><span class="o">);</span> + +<span class="c1">// send updates to the server</span> 
+<span class="n">client</span><span class="o">.</span><span class="na">update</span><span class="o">(</span><span class="n">writer</span><span class="o">,</span> <span class="n">cellsToUpdate</span><span class="o">);</span> +<span class="n">client</span><span class="o">.</span><span class="na">flush</span><span class="o">(</span><span class="n">writer</span><span class="o">);</span> +<span class="n">client</span><span class="o">.</span><span class="na">closeWriter</span><span class="o">(</span><span class="n">writer</span><span class="o">);</span> +</code></pre> +</div> + +<p>Scan for the data, fetching the results from the server in batches:</p> + +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">String</span> <span class="n">scanner</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="na">createScanner</span><span class="o">(</span><span class="n">token</span><span class="o">,</span> <span class="s">"myTable"</span><span class="o">,</span> <span class="k">new</span> <span class="n">ScanOptions</span><span class="o">());</span> +<span class="n">ScanResult</span> <span class="n">results</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="na">nextK</span><span class="o">(</span><span class="n">scanner</span><span class="o">,</span> <span class="mi">100</span><span class="o">);</span> + +<span class="k">for</span><span class="o">(</span><span class="n">KeyValue</span> <span class="n">keyValue</span> <span class="o">:</span> <span class="n">results</span><span class="o">.</span><span class="na">getResults</span><span class="o">())</span> <span class="o">{</span> + <span class="c1">// do something with results</span> +<span class="o">}</span> + +<span class="n">client</span><span class="o">.</span><span class="na">closeScanner</span><span class="o">(</span><span class="n">scanner</span><span class="o">);</span> +</code></pre> +</div> + + + + <div 
class="row" style="margin-top: 20px;"> + <div class="col-md-10"><strong>Find 
documentation for all releases in the <a href="/docs-archive">archive</a></strong></div> + <div class="col-md-2"><a class="pull-right" href="https://github.com/apache/accumulo-website/edit/master/_docs-unreleased/development/proxy.md" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + </div> +</div> + + </div> + + +<footer> + + <p><a href="https://www.apache.org/foundation/contributing"><img src="https://www.apache.org/images/SupportApache-small.png" alt="Support the ASF" id="asf-logo" height="100" /></a></p> + + <p>Copyright © 2011-2017 The Apache Software Foundation. Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p> + +</footer> + + + </div> + </div> + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/eab65f94/docs/unreleased/development/sampling.html ---------------------------------------------------------------------- diff --git a/docs/unreleased/development/sampling.html b/docs/unreleased/development/sampling.html new file mode 100644 index 0000000..3b5277e --- /dev/null +++ b/docs/unreleased/development/sampling.html @@ -0,0 +1,425 @@ +<!DOCTYPE html> +<html lang="en"> +<head> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ See the License for the specific language governing permissions and + limitations under the License. +--> +<meta charset="utf-8"> +<meta http-equiv="X-UA-Compatible" content="IE=edge"> +<meta name="viewport" content="width=device-width, initial-scale=1"> +<link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/paper/bootstrap.min.css" rel="stylesheet" integrity="sha384-awusxf8AUojygHf2+joICySzB780jVvQaVCAt1clU3QsyAitLGul28Qxb2r1e5g+" crossorigin="anonymous"> +<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" rel="stylesheet"> +<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.css"> +<link href="/css/accumulo.css" rel="stylesheet" type="text/css"> + +<title>Accumulo Documentation - Sampling</title> + +<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> +<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script> +<script type="text/javascript" src="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.js"></script> +<script> + // show location of canonical site if not currently on the canonical site + $(function() { + var host = window.location.host; + if (typeof host !== 'undefined' && host !== 'accumulo.apache.org') { + $('#non-canonical').show(); + } + }); + + $(function() { + // decorate section headers with anchors + return $("h2, h3, h4, h5, h6").each(function(i, el) { + var $el, icon, id; + $el = $(el); + id = $el.attr('id'); + icon = '<i class="fa fa-link"></i>'; + if (id) { + return $el.append($("<a />").addClass("header-link").attr("href", "#" + id).html(icon)); + } + }); + }); + + // fix sidebar width in documentation + $(function() { + var $affixElement = $('div[data-spy="affix"]'); + $affixElement.width($affixElement.parent().width()); + }); + + // configure Google 
Analytics + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + + if (ga.hasOwnProperty('loaded') && ga.loaded === true) { + ga('create', 'UA-50934829-1', 'apache.org'); + ga('send', 'pageview'); + } +</script> + +</head> +<body style="padding-top: 100px"> + + <nav class="navbar navbar-default navbar-fixed-top"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-items"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a href="/"><img id="nav-logo" alt="Apache Accumulo" class="img-responsive" src="/images/accumulo-logo.png" width="200" + /></a> + </div> + <div class="collapse navbar-collapse" id="navbar-items"> + <ul class="nav navbar-nav"> + <li class="nav-link"><a href="/downloads">Download</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Releases<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/release/accumulo-1.8.1/">1.8.1 (Latest)</a></li> + <li><a href="/release/accumulo-1.7.3/">1.7.3</a></li> + <li><a href="/release/accumulo-1.6.6/">1.6.6</a></li> + <li><a href="/release/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/1.8/accumulo_user_manual.html">User Manual (1.8)</a></li> + <li><a href="/1.8/apidocs">Javadocs (1.8)</a></li> + <li><a href="/1.8/examples">Examples (1.8)</a></li> + <li><a href="/features">Features</a></li> + <li><a 
href="/glossary">Glossary</a></li> + <li><a href="/external-docs">External Docs</a></li> + <li><a href="/docs-archive/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Community<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/get_involved">Get Involved</a></li> + <li><a href="/mailing_list">Mailing Lists</a></li> + <li><a href="/people">People</a></li> + <li><a href="/related-projects">Related Projects</a></li> + <li><a href="/contributor/">Contributor Guide</a></li> + </ul> + </li> + </ul> + <ul class="nav navbar-nav navbar-right"> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Apache Software Foundation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://www.apache.org">Apache Homepage <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/licenses/LICENSE-2.0">License <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/sponsorship">Sponsorship <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/security">Security <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/thanks">Thanks <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/policies/conduct">Code of Conduct <i class="fa fa-external-link"></i></a></li> + </ul> + </li> + </ul> + </div> + </div> +</nav> + + <div class="container"> + <div class="row"> + <div class="col-md-12"> + + <div id="non-canonical" style="display: none; background-color: #F0E68C; padding-left: 1em;"> + Visit the official site at: <a href="https://accumulo.apache.org">https://accumulo.apache.org</a> + </div> + <div id="content"> + + <div class="row"> + <div class="col-md-3"> + <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true" data-spy="affix"> + <div 
class="panel panel-default"> + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsegetting-started" aria-expanded="false" aria-controls="collapsegetting-started"> + Getting started + </a> + </h4> + </div> + <div id="collapsegetting-started" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/design">Accumulo Design</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/quick-install">Quick Installation</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/clients">Accumulo Clients</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/shell">Accumulo Shell</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_design">Table Design</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/getting-started/table_configuration">Table Configuration</a></div> + + </div> + </div> + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsedevelopment" aria-expanded="true" aria-controls="collapsedevelopment"> + Development + </a> + </h4> + </div> + <div id="collapsedevelopment" class="panel-collapse collapse in" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/iterators">Iterators</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/mapreduce">MapReduce</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/proxy">Proxy</a></div> + + <div class="row 
doc-sidebar-link"><a href="/docs/unreleased/development/development_tools">Development Tools</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/sampling">Sampling</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/summaries">Summary Statistics</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/security">Security</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/development/high_speed_ingest">High-Speed Ingest</a></div> + + </div> + </div> + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapseadministration" aria-expanded="false" aria-controls="collapseadministration"> + Administration + </a> + </h4> + </div> + <div id="collapseadministration" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/in-depth-install">In-depth Installation</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-management">Configuration Management</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/configuration-properties">Configuration Properties</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/monitoring-metrics">Monitoring & Metrics</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/tracing">Tracing</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/fate">FATE</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/multivolume">Multi-Volume Installations</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/ssl">SSL</a></div> + + 
<div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/kerberos">Kerberos</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/administration/replication">Replication</a></div> + + </div> + </div> + + + + + + + + + + + + + + + + + + + + + + <div class="panel-heading" role="tab" id="headingOne"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapsetroubleshooting" aria-expanded="false" aria-controls="collapsetroubleshooting"> + Troubleshooting + </a> + </h4> + </div> + <div id="collapsetroubleshooting" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne"> + <div class="panel-body"> + + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/basic">Basic Troubleshooting</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/advanced">Advanced Troubleshooting</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/tools">Troubleshooting Tools</a></div> + + <div class="row doc-sidebar-link"><a href="/docs/unreleased/troubleshooting/system-metadata-tables">System Metadata Tables</a></div> + + </div> + </div> + + + + </div> + </div> + </div> + <div class="col-md-9"> + + <p><a href="/docs/unreleased/">Accumulo unreleased docs</a> >> Development >> Sampling</p> + + + + <div class="alert alert-danger" style="margin-bottom: 0px;" role="alert">This documentation is for a future release of Accumulo! 
<a href="/1.8/accumulo_user_manual.html">View documentation for the latest release</a>.</div> + + <div class="row"> + <div class="col-md-10"><h1>Sampling</h1></div> + <div class="col-md-2"><a class="pull-right" style="margin-top: 25px;" href="https://github.com/apache/accumulo-website/edit/master/_docs-unreleased/development/sampling.md" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + + <h2 id="overview">Overview</h2> + +<p>Accumulo can generate and scan a per-table set of sample data. +This sample data is kept up to date as a table is mutated. Which key-values are +placed in the sample data is configurable per table.</p> + +<p>This feature can be used for query estimation and optimization. As an example +of estimation, assume an Accumulo table is configured to generate a sample +containing one millionth of a table's data. If a query is executed against the +sample and returns one thousand results, then the same query against all the +data would return roughly a billion results. A nice property of having +Accumulo generate the sample is that it's always up to date, so estimates +remain accurate even when querying the most recently written data.</p> + +<p>An example of a query optimization is an iterator using sample data to get an +estimate, and then making decisions based on the estimate.</p> + +<h2 id="configuring">Configuring</h2> + +<p>In order to use sampling, an Accumulo table must be configured with a class that +implements <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/sample/Sampler.html">Sampler</a> along with options for that class. For guidance on +implementing a Sampler, see the <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/sample/Sampler.html">Sampler interface javadoc</a>. Accumulo provides a few +implementations of Sampler out of the box. 
For information on how to use the samplers that +ship with Accumulo, look in the package <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/sample/package-summary.html">org.apache.accumulo.core.client.sample</a> +and consult the javadoc of the classes there. See the <a href="https://github.com/apache/accumulo-examples/blob/master/docs/sample.md">sampling example</a> +for examples of how to configure a <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/sample/Sampler.html">Sampler</a> on a table.</p> + +<p>Once a table is configured with a <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/sample/Sampler.html">Sampler</a>, all writes after that point will +generate sample data. For data written before sampling was configured, sample +data will not be present. A compaction can be initiated that only compacts the +files in the table that do not have sample data. The <a href="https://github.com/apache/accumulo-examples/blob/master/docs/sample.md">sampling example</a> +shows how to do this.</p> + +<p>If the sampling configuration of a table is changed, then Accumulo will start +generating new sample data with the new configuration. However, old data will +still have sample data generated with the previous configuration. A selective +compaction can also be issued in this case to regenerate the sample data.</p> + +<h2 id="scanning-sample-data">Scanning sample data</h2> + +<p>In order to scan sample data, use the <code class="highlighter-rouge">setSamplerConfiguration(...)</code> method of +<a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/Scanner.html">Scanner</a> or <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/BatchScanner.html">BatchScanner</a>. 
Please consult the javadoc of this method for more +information.</p> + +<p>Sample data can also be scanned from within an Accumulo <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/iterators/SortedKeyValueIterator.html">SortedKeyValueIterator</a>. +To see how to do this, look at the example iterator referenced in the <a href="https://github.com/apache/accumulo-examples/blob/master/docs/sample.md">sampling example</a>. +Also, consult the javadoc on <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/iterators/IteratorEnvironment.html#cloneWithSamplingEnabled()">IteratorEnvironment.cloneWithSamplingEnabled()</a>.</p> + +<p>MapReduce jobs using the <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html">AccumuloInputFormat</a> can also read sample data. See +the javadoc for the <code class="highlighter-rouge">setSamplerConfiguration()</code> method of <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html">AccumuloInputFormat</a>.</p> + +<p>Scans over sample data will throw a <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/SampleNotPresentException.html">SampleNotPresentException</a> in the following cases:</p> + +<ol> + <li>sample data is not present,</li> + <li>sample data is present but was generated with multiple configurations,</li> + <li>sample data is only partially present.</li> +</ol> + +<p>A scan over sample data can therefore succeed only if all written data has sample data +generated with the same configuration.</p> + +<h2 id="bulk-import">Bulk import</h2> + +<p>When generating rfiles to bulk import into Accumulo, those rfiles can contain +sample data. 
To use this feature, look at the javadoc of the <code class="highlighter-rouge">setSampler(...)</code> +method of <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.8.1/org/apache/accumulo/core/client/mapred/AccumuloFileOutputFormat.html">AccumuloFileOutputFormat</a>.</p> + + + + <div class="row" style="margin-top: 20px;"> + <div class="col-md-10"><strong>Find documentation for all releases in the <a href="/docs-archive">archive</a></strong></div> + <div class="col-md-2"><a class="pull-right" href="https://github.com/apache/accumulo-website/edit/master/_docs-unreleased/development/sampling.md" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + </div> +</div> + + </div> + + +<footer> + + <p><a href="https://www.apache.org/foundation/contributing"><img src="https://www.apache.org/images/SupportApache-small.png" alt="Support the ASF" id="asf-logo" height="100" /></a></p> + + <p>Copyright © 2011-2017 The Apache Software Foundation. Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p> + +</footer> + + + </div> + </div> + </div> +</body> +</html>
