http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/shard.md ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/shard.md b/docs/src/main/resources/examples/shard.md new file mode 100644 index 0000000..5e5789b --- /dev/null +++ b/docs/src/main/resources/examples/shard.md @@ -0,0 +1,68 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +--- +title: Apache Accumulo Shard Example +--- + +Accumulo has an iterator, called the intersecting iterator, that supports querying a term index that is partitioned by +document, or "sharded". This example shows how to use the intersecting iterator through these four programs: + + * Index.java - Indexes a set of text files into an Accumulo table. + * Query.java - Finds documents containing a given set of terms. + * Reverse.java - Reads the index table and writes a map of documents to terms into another table. + * ContinuousQuery.java - Uses the table populated by Reverse.java to select N random terms per document. Then it continuously and randomly queries those terms. + +To run these example programs, create two tables as shown below. 
+ + username@instance> createtable shard + username@instance shard> createtable doc2term + +After creating the tables, index some files. The following command indexes all of the java files in the Accumulo source code. + + $ cd /local/username/workspace/accumulo/ + $ find core/src server/src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.simple.shard.Index -i instance -z zookeepers -t shard -u username -p password --partitions 30 + +The following command queries the index to find all files containing 'foo' and 'bar'. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query -i instance -z zookeepers -t shard -u username -p password foo bar + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java + /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/RowDeleteTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java + /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/DeleteEverythingTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java + +In order to run ContinuousQuery, we need to run Reverse.java to populate doc2term. 
+ + $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Reverse -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password + +Below, ContinuousQuery is run with 5 terms: it selects 5 random terms from each document, then repeatedly picks one of +those 5-term sets at random and queries for it. For each query it prints the terms, the number of matching documents, and the time in seconds. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.ContinuousQuery -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password --terms 5 + [public, core, class, binarycomparable, b] 2 0.081 + [wordtodelete, unindexdocument, doctablename, putdelete, insert] 1 0.041 + [import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1 0.049 + [getpackage, testversion, util, version, 55] 1 0.048 + [for, static, println, public, the] 55 0.211 + [sleeptime, wrappingiterator, options, long, utilwaitthread] 1 0.057 + [string, public, long, 0, wait] 12 0.132
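The document-partitioned ("sharded") index described at the top of this example can be modeled in a few lines of plain Java. The sketch below is only an illustration under stated assumptions: `ShardIndexSketch`, its methods, and the hash-based choice of partition are hypothetical names and choices made here, not the code in Index.java or Query.java, and nested in-memory maps stand in for the Accumulo table. It shows why the query can be answered one partition at a time: every term of a document lives in the same partition, so intersecting the per-term document sets inside each partition finds all matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative in-memory model only; none of these names come from Accumulo.
class ShardIndexSketch {
    private final int numPartitions;
    // partition -> term -> sorted document ids
    private final Map<Integer, Map<String, Set<String>>> shards = new TreeMap<>();

    ShardIndexSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // All terms of one document land in a single partition chosen by hashing the doc id
    // (the hashing scheme is an assumption made for this sketch).
    void index(String docId, String text) {
        int partition = Math.floorMod(docId.hashCode(), numPartitions);
        Map<String, Set<String>> terms = shards.computeIfAbsent(partition, p -> new TreeMap<>());
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Intersect the per-term document sets inside each partition, then collect the results.
    List<String> query(String... queryTerms) {
        List<String> matches = new ArrayList<>();
        for (Map<String, Set<String>> terms : shards.values()) {
            Set<String> candidates = null;
            for (String term : queryTerms) {
                Set<String> docs = terms.getOrDefault(term.toLowerCase(), Collections.<String>emptySet());
                if (candidates == null) {
                    candidates = new TreeSet<>(docs);
                } else {
                    candidates.retainAll(docs);
                }
            }
            if (candidates != null) {
                matches.addAll(candidates);
            }
        }
        return matches;
    }
}
```

Because each partition is searched independently, the work for a query can be spread across whichever servers hold the partitions, which is the property the `--partitions 30` option above relies on.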
http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/tabletofile.md ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/tabletofile.md b/docs/src/main/resources/examples/tabletofile.md new file mode 100644 index 0000000..5316b51 --- /dev/null +++ b/docs/src/main/resources/examples/tabletofile.md @@ -0,0 +1,61 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +--- +title: Apache Accumulo Table-to-File Example +--- + +This example uses mapreduce to extract specified columns from an existing table. + +To run this example you will need some data in a table. 
The following will +put a trivial amount of data into accumulo using the accumulo shell: + + $ ./bin/accumulo shell -u username -p password + Shell - Apache Accumulo Interactive Shell + - version: 1.5.0 + - instance name: instance + - instance id: 00000000-0000-0000-0000-000000000000 + - + - type 'help' for a list of available commands + - + username@instance> createtable input + username@instance> insert dogrow cf cq dogvalue + username@instance> insert catrow cf cq catvalue + username@instance> insert junk family qualifier junkvalue + username@instance> quit + +The TableToFile class configures a map-only job to read the specified columns and +write the key/value pairs to a file in HDFS. + +The following will extract the rows containing the column "cf:cq": + + $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TableToFile -u user -p passwd -i instance -t input --columns cf:cq --output /tmp/output + + $ hadoop fs -ls /tmp/output + -rw-r--r-- 1 username supergroup 0 2013-01-10 14:44 /tmp/output/_SUCCESS + drwxr-xr-x - username supergroup 0 2013-01-10 14:44 /tmp/output/_logs + drwxr-xr-x - username supergroup 0 2013-01-10 14:44 /tmp/output/_logs/history + -rw-r--r-- 1 username supergroup 9049 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_1357847072863_username_TableToFile%5F1357847071434 + -rw-r--r-- 1 username supergroup 26172 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_conf.xml + -rw-r--r-- 1 username supergroup 50 2013-01-10 14:44 /tmp/output/part-m-00000 + +We can see the output of our little map-only job: + + $ hadoop fs -text /tmp/output/part-m-00000 + catrow cf:cq [] catvalue + dogrow cf:cq [] dogvalue + $ + http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/terasort.md ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/terasort.md 
b/docs/src/main/resources/examples/terasort.md new file mode 100644 index 0000000..195bb4a --- /dev/null +++ b/docs/src/main/resources/examples/terasort.md @@ -0,0 +1,52 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +--- +title: Apache Accumulo Terasort Example +--- + +This example uses map/reduce to generate random input data that will +be sorted by storing it into accumulo. It uses data very similar to the +hadoop terasort benchmark. 
+ +To run this example, pass arguments describing the amount of data to generate: + + $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest \ + -i instance -z zookeepers -u user -p password \ + --count 10 \ + --minKeySize 10 \ + --maxKeySize 10 \ + --minValueSize 78 \ + --maxValueSize 78 \ + --table sort \ + --splits 10 + +After the map reduce job completes, scan the data: + + $ ./bin/accumulo shell -u username -p password + username@instance> scan -t sort + +l-$$OE/ZH c: 4 [] GGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOO + ,C)wDw//u= c: 10 [] CCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKK + 75@~?'WdUF c: 1 [] IIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQ + ;L+!2rT~hd c: 8 [] MMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUU + LsS8)|.ZLD c: 5 [] OOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWW + M^*dDE;6^< c: 9 [] UUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCC + ^Eu)<n#kdP c: 3 [] YYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGG + le5awB.$sm c: 6 [] WWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEE + q__[fwhKFg c: 7 [] EEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMM + w[o||:N&H, c: 2 [] QQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYY + +Of course, a real benchmark would ingest millions of entries. 
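The idea of sorting data simply by storing it in a table that keeps its keys in order can be illustrated without a cluster. The following is a minimal sketch under stated assumptions: `SortByInsertSketch` and `ingest` are hypothetical names, a `java.util.TreeMap` stands in for the Accumulo table, and the random-key generation only loosely resembles what TeraSortIngest produces. For printable ASCII keys, the string ordering used by `TreeMap` agrees with byte-wise ordering.

```java
import java.util.Random;
import java.util.TreeMap;

// Illustrative only: a TreeMap stands in for a table that keeps its rows in sorted order.
class SortByInsertSketch {
    // Generates `count` random printable-ASCII keys of length keySize,
    // each mapped to the number of the row that produced it.
    static TreeMap<String, Integer> ingest(int count, int keySize, long seed) {
        Random random = new Random(seed);
        TreeMap<String, Integer> table = new TreeMap<>();
        for (int rowId = 1; rowId <= count; rowId++) {
            StringBuilder key = new StringBuilder(keySize);
            for (int i = 0; i < keySize; i++) {
                key.append((char) (' ' + random.nextInt(95))); // printable ASCII 0x20..0x7E
            }
            table.put(key.toString(), rowId);
        }
        return table;
    }

    public static void main(String[] args) {
        // Iteration follows key order, not insertion order, so the row numbers appear shuffled.
        ingest(10, 10, 42L).forEach((key, rowId) -> System.out.println(key + " c: " + rowId));
    }
}
```

As in the scan above, reading the data back yields the keys in sorted order while the generated row numbers appear in no particular order.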
http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/visibility.md ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/visibility.md b/docs/src/main/resources/examples/visibility.md new file mode 100644 index 0000000..8345a9b --- /dev/null +++ b/docs/src/main/resources/examples/visibility.md @@ -0,0 +1,133 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +--- +title: Apache Accumulo Visibility, Authorizations, and Permissions Example +--- + +## Creating a new user + + root@instance> createuser username + Enter new password for 'username': ******** + Please confirm new password for 'username': ******** + root@instance> user username + Enter password for user username: ******** + username@instance> createtable vistest + 06 10:48:47,931 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have permission to perform this action + username@instance> userpermissions + System permissions: + + Table permissions (accumulo.metadata): Table.READ + username@instance> + +A user does not by default have permission to create a table. 
+ +## Granting permissions to a user + + username@instance> user root + Enter password for user root: ******** + root@instance> grant -s System.CREATE_TABLE -u username + root@instance> user username + Enter password for user username: ******** + username@instance> createtable vistest + username@instance> userpermissions + System permissions: System.CREATE_TABLE + + Table permissions (accumulo.metadata): Table.READ + Table permissions (vistest): Table.READ, Table.WRITE, Table.BULK_IMPORT, Table.ALTER_TABLE, Table.GRANT, Table.DROP_TABLE + username@instance vistest> + +## Inserting data with visibilities + +Visibilities are boolean AND (&) and OR (|) combinations of authorization +tokens. Authorization tokens are arbitrary strings taken from a restricted +ASCII character set. Parentheses are required to specify order of operations +in visibilities. + + username@instance vistest> insert row f1 q1 v1 -l A + username@instance vistest> insert row f2 q2 v2 -l A&B + username@instance vistest> insert row f3 q3 v3 -l apple&carrot|broccoli|spinach + 06 11:19:01,432 [shell.Shell] ERROR: org.apache.accumulo.core.util.BadArgumentException: cannot mix | and & near index 12 + apple&carrot|broccoli|spinach + ^ + username@instance vistest> insert row f3 q3 v3 -l (apple&carrot)|broccoli|spinach + username@instance vistest> + +## Scanning with authorizations + +Authorizations are sets of authorization tokens. Each Accumulo user has +authorizations and each Accumulo scan has authorizations. Scan authorizations +are only allowed to be a subset of the user's authorizations. By default, a +user's authorizations set is empty. 
+ + username@instance vistest> scan + username@instance vistest> scan -s A + 06 11:43:14,951 [shell.Shell] ERROR: java.lang.RuntimeException: org.apache.accumulo.core.client.AccumuloSecurityException: Error BAD_AUTHORIZATIONS - The user does not have the specified authorizations assigned + username@instance vistest> + +## Setting authorizations for a user + + username@instance vistest> setauths -s A + 06 11:53:42,056 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have permission to perform this action + username@instance vistest> + +A user cannot set authorizations unless the user has the System.ALTER_USER permission. +The root user has this permission. + + username@instance vistest> user root + Enter password for user root: ******** + root@instance vistest> setauths -s A -u username + root@instance vistest> user username + Enter password for user username: ******** + username@instance vistest> scan -s A + row f1:q1 [A] v1 + username@instance vistest> scan + row f1:q1 [A] v1 + username@instance vistest> + +The default authorizations for a scan are the user's entire set of authorizations. + + username@instance vistest> user root + Enter password for user root: ******** + root@instance vistest> setauths -s A,B,broccoli -u username + root@instance vistest> user username + Enter password for user username: ******** + username@instance vistest> scan + row f1:q1 [A] v1 + row f2:q2 [A&B] v2 + row f3:q3 [(apple&carrot)|broccoli|spinach] v3 + username@instance vistest> scan -s B + username@instance vistest> + +To allow users to insert only data that they themselves can read, set the following +constraint on the table. 
+ + username@instance vistest> user root + Enter password for user root: ****** + root@instance vistest> config -t vistest -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint + root@instance vistest> user username + Enter password for user username: ******** + username@instance vistest> insert row f4 q4 v4 -l spinach + Constraint Failures: + ConstraintViolationSummary(constrainClass:org.apache.accumulo.core.security.VisibilityConstraint, violationCode:2, violationDescription:User does not have authorization on column visibility, numberOfViolatingMutations:1) + username@instance vistest> insert row f4 q4 v4 -l spinach|broccoli + username@instance vistest> scan + row f1:q1 [A] v1 + row f2:q2 [A&B] v2 + row f3:q3 [(apple&carrot)|broccoli|spinach] v3 + row f4:q4 [spinach|broccoli] v4 + username@instance vistest> +
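The scan and constraint results in this example follow from evaluating each entry's visibility expression against a set of authorization tokens. The sketch below is a small self-contained evaluator written for illustration only: `VisibilitySketch` and `isVisible` are hypothetical names, the accepted token characters are an assumption, and this is not Accumulo's own parser or evaluator. It treats `&` as AND and `|` as OR, honors parentheses, and rejects expressions that mix the two operators at the same level.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; not Accumulo's own evaluator.
class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    // Returns true if an entry labeled with expr is readable with the given authorizations.
    static boolean isVisible(String expr, String... auths) {
        if (expr.isEmpty()) {
            return true; // an empty label is treated as visible to everyone
        }
        VisibilitySketch parser = new VisibilitySketch(expr, new HashSet<>(Arrays.asList(auths)));
        boolean result = parser.parseExpression();
        if (parser.pos != expr.length()) {
            throw new IllegalArgumentException("unexpected character at index " + parser.pos);
        }
        return result;
    }

    // expression := term ('&' term)* | term ('|' term)*   (one operator kind per level)
    private boolean parseExpression() {
        boolean result = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos);
            if (op != 0 && c != op) {
                throw new IllegalArgumentException("cannot mix | and & near index " + pos);
            }
            op = c;
            pos++;
            boolean rhs = parseTerm(); // always parse the right side, even if the result is already known
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
        return result;
    }

    // term := '(' expression ')' | token
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ) at index " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && isTokenChar(expr.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("expected token at index " + pos);
        }
        return auths.contains(expr.substring(start, pos));
    }

    // Assumed token alphabet for this sketch: ASCII letters, digits, and a few punctuation marks.
    private static boolean isTokenChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }
}
```

With the authorizations `A`, `B`, and `broccoli`, `isVisible("(apple&carrot)|broccoli|spinach", "A", "B", "broccoli")` is true while `isVisible("spinach", "A", "B", "broccoli")` is false, which mirrors why the insert labeled `spinach` violates the constraint above and the insert labeled `spinach|broccoli` does not.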