Merge branch '1.5.1-SNAPSHOT' into 1.6.0-SNAPSHOT

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/1bddc574
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/1bddc574
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/1bddc574

Branch: refs/heads/1.6.0-SNAPSHOT
Commit: 1bddc574086129aca3484a6070aee257c8622085
Parents: 0d49819 00fb08b
Author: Christopher Tubbs <ctubb...@apache.org>
Authored: Thu Dec 5 11:55:58 2013 -0500
Committer: Christopher Tubbs <ctubb...@apache.org>
Committed: Thu Dec 5 11:55:58 2013 -0500

----------------------------------------------------------------------
 .../apache/accumulo/examples/simple/filedata/FileDataIngest.java | 2 +-
 .../apache/accumulo/examples/simple/filedata/FileDataQuery.java  | 2 +-
 server/monitor/src/main/resources/docs/examples/README.filedata  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo/blob/1bddc574/examples/simple/src/main/java/org/apache/accumulo/examples/simple/filedata/FileDataQuery.java
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo/blob/1bddc574/server/monitor/src/main/resources/docs/examples/README.filedata
----------------------------------------------------------------------
diff --cc server/monitor/src/main/resources/docs/examples/README.filedata
index 946ca8c,0000000..9f0016e
mode 100644,000000..100644
--- a/server/monitor/src/main/resources/docs/examples/README.filedata
+++ b/server/monitor/src/main/resources/docs/examples/README.filedata
@@@ -1,47 -1,0 +1,47 @@@
 +Title: Apache Accumulo File System Archive Example (Data Only)
 +Notice:    Licensed to the Apache Software Foundation (ASF) under one
 +           or more contributor license agreements. See the NOTICE file
 +           distributed with this work for additional information
 +           regarding copyright ownership. The ASF licenses this file
 +           to you under the Apache License, Version 2.0 (the
 +           "License"); you may not use this file except in compliance
 +           with the License. You may obtain a copy of the License at
 +           .
 +             http://www.apache.org/licenses/LICENSE-2.0
 +           .
 +           Unless required by applicable law or agreed to in writing,
 +           software distributed under the License is distributed on an
 +           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 +           KIND, either express or implied. See the License for the
 +           specific language governing permissions and limitations
 +           under the License.
 +
 +This example archives file data into an Accumulo table. Files with duplicate data are only stored once.
 +The example has the following classes:
 +
 + * CharacterHistogram - A MapReduce that computes a histogram of byte frequency for each file and stores the histogram alongside the file data. An example use of the ChunkInputFormat.
 + * ChunkCombiner - An Iterator that dedupes file data and sets their visibilities to a combined visibility based on current references to the file data.
 + * ChunkInputFormat - An Accumulo InputFormat that provides keys containing file info (List<Entry<Key,Value>>) and values with an InputStream over the file (ChunkInputStream).
 + * ChunkInputStream - An input stream over file data stored in Accumulo.
-  * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on the SHA1 hashes of the files.
-  * FileDataQuery - Retrieves file data based on the SHA1 hash of the file. (Used by the dirlist.Viewer.)
++ * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on hashes of the files.
++ * FileDataQuery - Retrieves file data based on the hash of the file. (Used by the dirlist.Viewer.)
 + * KeyUtil - A utility for creating and parsing null-byte separated strings into/from Text objects.
 + * VisibilityCombiner - A utility for merging visibilities into the form (VIS1)|(VIS2)|...
 +
 +This example is coupled with the dirlist example. See README.dirlist for instructions.
 +
 +If you haven't already run the README.dirlist example, ingest a file with FileDataIngest.
 +
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.filedata.FileDataIngest -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --chunk 1000 $ACCUMULO_HOME/README
 +
 +Open the accumulo shell and look at the data. The row is the MD5 hash of the file, which you can verify by running a command such as 'md5sum' on the file.
 +
 +    > scan -t dataTable
 +
 +Run the CharacterHistogram MapReduce to add some information about the file.
 +
 +    $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.filedata.CharacterHistogram -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --vis exampleVis
 +
 +Scan again to see the histogram stored in the 'info' column family.
 +
 +    > scan -t dataTable
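
The README text in this diff says the row is the MD5 hash of the file and that it can be checked with a tool such as 'md5sum'. The same check could be sketched in Java roughly as follows. This is a standalone illustration, not the code in FileDataIngest: the class name RowHashSketch and the method md5Hex are hypothetical, and it assumes the row is the lowercase hex form of the digest, as md5sum prints it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of the Accumulo examples.
class RowHashSketch {

  // Returns the lowercase hex MD5 digest of everything read from the stream.
  static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] buf = new byte[8192];
    int n;
    // Feed the digest incrementally so large files need not fit in memory.
    while ((n = in.read(buf)) != -1) {
      md.update(buf, 0, n);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("usage: RowHashSketch <file>");
      return;
    }
    try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
      // Compare this value with the row shown by 'scan -t dataTable'.
      System.out.println(md5Hex(in));
    }
  }
}
```

Run against the ingested file (for example $ACCUMULO_HOME/README), the printed value should match the md5sum output for that file. Whether it also matches the row in dataTable depends on the assumption above about how the row is encoded.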