http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.isolation ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.isolation b/docs/src/main/resources/examples/README.isolation deleted file mode 100644 index 4739f59..0000000 --- a/docs/src/main/resources/examples/README.isolation +++ /dev/null @@ -1,50 +0,0 @@ -Title: Apache Accumulo Isolation Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - - -Accumulo has an isolated scanner that ensures partial changes to rows are not -seen. Isolation is documented in ../docs/isolation.html and the user manual. - -InterferenceTest is a simple example that shows the effects of scanning with -and without isolation. This program starts two threads. One thread -continually updates all of the values in a row to be the same thing, but -different from what it used to be. The other thread continually scans the -table and checks that all values in a row are the same. Without isolation the -scanning thread will sometimes see different values, which is the result of -reading the row at the same time a mutation is changing the row. - -Below, InterferenceTest is run without isolation enabled for 5000 iterations -and it reports problems. - - $ ./bin/accumulo org.apache.accumulo.examples.simple.isolation.InterferenceTest -i instance -z zookeepers -u username -p password -t isotest --iterations 5000 - ERROR Columns in row 053 had multiple values [53, 4553] - ERROR Columns in row 061 had multiple values [561, 61] - ERROR Columns in row 070 had multiple values [570, 1070] - ERROR Columns in row 079 had multiple values [1079, 1579] - ERROR Columns in row 088 had multiple values [2588, 1588] - ERROR Columns in row 106 had multiple values [2606, 3106] - ERROR Columns in row 115 had multiple values [4615, 3115] - finished - -Below, InterferenceTest is run with isolation enabled for 5000 iterations and -it reports no problems. - - $ ./bin/accumulo org.apache.accumulo.examples.simple.isolation.InterferenceTest -i instance -z zookeepers -u username -p password -t isotest --iterations 5000 --isolated - finished - -
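In client code, isolation is enabled by wrapping a Scanner in an IsolatedScanner. Below is a minimal sketch of that pattern; the instance, zookeeper, credential, and table names are placeholders, not part of the example source.

    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.IsolatedScanner;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class IsolatedScanSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder connection details; substitute your own.
        Connector conn = new ZooKeeperInstance("instance", "zookeepers")
            .getConnector("username", new PasswordToken("password"));

        // Wrapping the scanner buffers each row client-side, so a row is
        // never observed while a mutation is partially applied to it.
        Scanner scanner = new IsolatedScanner(conn.createScanner("isotest", Authorizations.EMPTY));
        for (Entry<Key,Value> entry : scanner)
          System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    }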
http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.mapred ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.mapred b/docs/src/main/resources/examples/README.mapred deleted file mode 100644 index ddd0dbf..0000000 --- a/docs/src/main/resources/examples/README.mapred +++ /dev/null @@ -1,154 +0,0 @@ -Title: Apache Accumulo MapReduce Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -This example uses mapreduce and accumulo to compute word counts for a set of -documents. This is accomplished using a map-only mapreduce job and an -accumulo table with combiners. - -To run this example, you will need a directory in HDFS containing text files. -The accumulo readme will be used to show how to run this example. - - $ hadoop fs -copyFromLocal /path/to/accumulo/README.md /user/username/wc/Accumulo.README - $ hadoop fs -ls /user/username/wc - Found 1 items - -rw-r--r-- 2 username supergroup 9359 2009-07-15 17:54 /user/username/wc/Accumulo.README - -The first part of running this example is to create a table with a combiner -for the column family count. - - $ ./bin/accumulo shell -u username -p password - Shell - Apache Accumulo Interactive Shell - - version: 1.5.0 - - instance name: instance - - instance id: 00000000-0000-0000-0000-000000000000 - - - - type 'help' for a list of available commands - - - username@instance> createtable wordCount - username@instance wordCount> setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 10 -t wordCount -majc -minc -scan - SummingCombiner interprets Values as Longs and adds them together. A variety of encodings (variable length, fixed length, or string) are available - ----------> set SummingCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.: false - ----------> set SummingCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non-alphanum chars using %<hex>.: count - ----------> set SummingCombiner parameter lossy, if true, failed decodes are ignored. Otherwise combiner will error on failed decodes (default false): <TRUE|FALSE>: false - ----------> set SummingCombiner parameter type, <VARLEN|FIXEDLEN|STRING|fullClassName>: STRING - username@instance wordCount> quit - -After creating the table, run the word count map reduce job.
- - $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z zookeepers --input /user/username/wc -t wordCount -u username -p password - - 11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1 - 11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003 - 11/02/07 18:20:13 INFO mapred.JobClient: map 0% reduce 0% - 11/02/07 18:20:20 INFO mapred.JobClient: map 100% reduce 0% - 11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003 - 11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6 - 11/02/07 18:20:22 INFO mapred.JobClient: Job Counters - 11/02/07 18:20:22 INFO mapred.JobClient: Launched map tasks=1 - 11/02/07 18:20:22 INFO mapred.JobClient: Data-local map tasks=1 - 11/02/07 18:20:22 INFO mapred.JobClient: FileSystemCounters - 11/02/07 18:20:22 INFO mapred.JobClient: HDFS_BYTES_READ=10487 - 11/02/07 18:20:22 INFO mapred.JobClient: Map-Reduce Framework - 11/02/07 18:20:22 INFO mapred.JobClient: Map input records=255 - 11/02/07 18:20:22 INFO mapred.JobClient: Spilled Records=0 - 11/02/07 18:20:22 INFO mapred.JobClient: Map output records=1452 - -After the map reduce job completes, query the accumulo table to see word -counts. - - $ ./bin/accumulo shell -u username -p password - username@instance> table wordCount - username@instance wordCount> scan -b the - the count:20080906 [] 75 - their count:20080906 [] 2 - them count:20080906 [] 1 - then count:20080906 [] 1 - there count:20080906 [] 1 - these count:20080906 [] 3 - this count:20080906 [] 6 - through count:20080906 [] 1 - time count:20080906 [] 3 - time. count:20080906 [] 1 - to count:20080906 [] 27 - total count:20080906 [] 1 - tserver, count:20080906 [] 1 - tserver.compaction.major.concurrent.max count:20080906 [] 1 - ... - -Another example to look at is -org.apache.accumulo.examples.simple.mapreduce.UniqueColumns. This example -computes the unique set of columns in a table and shows how a map reduce job -can directly read a table's files from HDFS. - -One more example available is -org.apache.accumulo.examples.simple.mapreduce.TokenFileWordCount. -The TokenFileWordCount example works exactly the same as the WordCount example -explained above except that it uses a token file rather than giving the -password directly to the map-reduce job (this avoids having the password -displayed in the job's configuration, which is world-readable). - -To create a token file, use the create-token utility: - - $ ./bin/accumulo create-token - -It defaults to creating a PasswordToken, but you can specify the token class -with -tc (requires the fully qualified class name). Based on the token class, -it will prompt you for each property required to create the token. - -The last value it prompts for is a local filename to save to. If this file -exists, it will append the new token to the end. Multiple tokens can exist in -a file, but only the first one for each user will be recognized. - -Rather than waiting for the prompts, you can specify some options when calling -create-token, for example: - - $ ./bin/accumulo create-token -u root -p secret -f root.pw - -would create a token file containing a PasswordToken for -user 'root' with password 'secret' and save it to 'root.pw'. - -This local file needs to be uploaded to hdfs to be used with the -map-reduce job. For example, if the file were 'root.pw' in the local directory: - - $ hadoop fs -put root.pw root.pw - -This would put 'root.pw' in the user's home directory in hdfs.
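Stepping back to the job itself for a moment: a map-only ingest job like WordCount essentially emits one Mutation per word and lets the table's SummingCombiner (configured above with the STRING encoding) do the aggregation. The sketch below illustrates that pattern; the class name and column names are illustrative, not the exact example source.

    import java.io.IOException;

    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // The output value type is Mutation so AccumuloOutputFormat can write it.
    public class WordCountMapperSketch extends Mapper<LongWritable,Text,Text,Mutation> {
      @Override
      public void map(LongWritable key, Text line, Context context)
          throws IOException, InterruptedException {
        for (String word : line.toString().split("\\s+")) {
          if (word.isEmpty())
            continue;
          Mutation m = new Mutation(new Text(word));
          // Write a string-encoded 1 per occurrence; the SummingCombiner adds them up.
          m.put(new Text("count"), new Text("20080906"), new Value("1".getBytes()));
          // A null table name sends the mutation to the job's default table.
          context.write(null, m);
        }
      }
    }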
- -Because the basic WordCount example uses Opts to parse its arguments -(which extends ClientOnRequiredTable), you can use a token file with -the basic WordCount example by calling the same command as explained above -except replacing the password with the token file (rather than -p, use -tf). - - $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z zookeepers --input /user/username/wc -t wordCount -u username -tf tokenfile - -In the above examples, username was 'root' and tokenfile was 'root.pw'. - -However, if you don't want to use the Opts class to parse arguments, -TokenFileWordCount is an example of using the token file manually. - - $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TokenFileWordCount instance zookeepers username tokenfile /user/username/wc wordCount - -The results should be the same as the WordCount example except that the -authentication token was not stored in the configuration. It was instead -stored in a file that the map-reduce job pulled into the distributed cache. -(If you ran either of these on the same table right after the -WordCount example, then the resulting counts should just double.) - - - - http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.maxmutation ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.maxmutation b/docs/src/main/resources/examples/README.maxmutation deleted file mode 100644 index 45b80d4..0000000 --- a/docs/src/main/resources/examples/README.maxmutation +++ /dev/null @@ -1,49 +0,0 @@ -Title: Apache Accumulo MaxMutation Constraints Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -This is an example of how to limit the size of mutations that will be accepted into -a table. Under the default configuration, accumulo does not provide a limitation -on the size of mutations that can be ingested. Poorly behaved writers might -inadvertently create mutations so large that they cause the tablet servers to -run out of memory. A simple constraint can be added to a table to reject very -large mutations.
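Conceptually, such a constraint implements Accumulo's Constraint interface, inspects each mutation's estimated size, and returns a violation code when the mutation is too big. Here is a simplified sketch of the idea; it is not the actual MaxMutationSize source, and the fixed limit is a made-up constant, whereas the real example derives its limit from tserver memory. The shell session below then attaches the real example constraint to a table.

    import java.util.Collections;
    import java.util.List;

    import org.apache.accumulo.core.constraints.Constraint;
    import org.apache.accumulo.core.data.Mutation;

    public class MaxSizeConstraintSketch implements Constraint {
      // Illustrative fixed limit of 1 MB.
      private static final long MAX_SIZE = 1L << 20;

      @Override
      public String getViolationDescription(short violationCode) {
        return "mutation exceeded maximum size of " + MAX_SIZE;
      }

      @Override
      public List<Short> check(Environment env, Mutation mutation) {
        if (mutation.estimatedMemoryUsed() > MAX_SIZE)
          return Collections.singletonList((short) 0);
        return null; // null (or an empty list) means no violations
      }
    }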
- - $ ./bin/accumulo shell -u username -p password - - Shell - Apache Accumulo Interactive Shell - - - - version: 1.5.0 - - instance name: instance - - instance id: 00000000-0000-0000-0000-000000000000 - - - - type 'help' for a list of available commands - - - username@instance> createtable test_ingest - username@instance test_ingest> config -t test_ingest -s table.constraint.1=org.apache.accumulo.examples.simple.constraints.MaxMutationSize - username@instance test_ingest> - - -Now the table will reject any mutation that is larger than 1/256th of the -working memory of the tablet server. The following command attempts to ingest -a single row with 10000 columns, which exceeds the memory limit. Depending on the -amount of Java heap your tserver(s) are given, you may have to increase the number -of columns provided to see the failure. - - $ ./bin/accumulo org.apache.accumulo.test.TestIngest -i instance -z zookeepers -u username -p password --rows 1 --cols 10000 - ERROR : Constraint violates : ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.MaxMutationSize, violationCode:0, violationDescription:mutation exceeded maximum size of 188160, numberOfViolatingMutations:1) - http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.regex ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.regex b/docs/src/main/resources/examples/README.regex deleted file mode 100644 index 05ea4de..0000000 --- a/docs/src/main/resources/examples/README.regex +++ /dev/null @@ -1,57 +0,0 @@ -Title: Apache Accumulo Regex Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -This example uses mapreduce and accumulo to find items using regular expressions. -This is accomplished using a map-only mapreduce job and a scan-time iterator. - -To run this example you will need some data in a table. The following will -put a trivial amount of data into accumulo using the accumulo shell: - - $ ./bin/accumulo shell -u username -p password - Shell - Apache Accumulo Interactive Shell - - version: 1.5.0 - - instance name: instance - - instance id: 00000000-0000-0000-0000-000000000000 - - - - type 'help' for a list of available commands - - - username@instance> createtable input - username@instance> insert dogrow dogcf dogcq dogvalue - username@instance> insert catrow catcf catcq catvalue - username@instance> quit - -The RegexExample class sets an iterator on the scanner. This does pattern matching -against each key/value in accumulo, and only returns matching items. It will do this -in parallel and will store the results in files in hdfs. 
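Outside of MapReduce, the same server-side filtering can be applied to an ordinary scanner using the built-in RegExFilter iterator; below is a small sketch of that, with placeholder connection details.

    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.IteratorSetting;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.iterators.user.RegExFilter;
    import org.apache.accumulo.core.security.Authorizations;

    public class RegexScanSketch {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("instance", "zookeepers")
            .getConnector("username", new PasswordToken("password"));

        Scanner scanner = conn.createScanner("input", Authorizations.EMPTY);
        IteratorSetting setting = new IteratorSetting(20, "rowRegex", RegExFilter.class);
        // Match rows starting with "dog"; null means "do not filter on this field".
        RegExFilter.setRegexs(setting, "dog.*", null, null, null, false);
        scanner.addScanIterator(setting);

        for (Entry<Key,Value> entry : scanner)
          System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    }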
- -The following will search for any rows in the input table that start with "dog": - - $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.RegexExample -u user -p passwd -i instance -t input --rowRegex 'dog.*' --output /tmp/output - - $ hadoop fs -ls /tmp/output - Found 3 items - -rw-r--r-- 1 username supergroup 0 2013-01-10 14:11 /tmp/output/_SUCCESS - drwxr-xr-x - username supergroup 0 2013-01-10 14:10 /tmp/output/_logs - -rw-r--r-- 1 username supergroup 51 2013-01-10 14:10 /tmp/output/part-m-00000 - -We can see the output of our little map-reduce job: - - $ hadoop fs -text /tmp/output/part-m-00000 - dogrow dogcf:dogcq [] 1357844987994 false dogvalue - - http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.reservations ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.reservations b/docs/src/main/resources/examples/README.reservations deleted file mode 100644 index ff111b4..0000000 --- a/docs/src/main/resources/examples/README.reservations +++ /dev/null @@ -1,66 +0,0 @@ -Title: Apache Accumulo Reservations Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -This example shows running a simple reservation system implemented using -conditional mutations. This system guarantees that only one concurrent user can -reserve a resource. The example's reserve command allows multiple users to be -specified. When this is done, it creates a separate reservation thread for each -user. In the example below, threads are spun up for alice, bob, eve, mallory, -and trent to reserve room06 on 20140101. Bob ends up getting the reservation -and everyone else is put on a wait list. The example code will take any string -for what, when and who.
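Under the hood, each reservation attempt is a conditional mutation on the row's tx:seq column: the update only commits if the sequence number is unchanged since it was read. The sketch below shows that pattern in isolation (connection details are placeholders and the column values are simplified relative to the example source); the interactive session that follows runs the actual example.

    import org.apache.accumulo.core.client.ConditionalWriter;
    import org.apache.accumulo.core.client.ConditionalWriter.Status;
    import org.apache.accumulo.core.client.ConditionalWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Condition;
    import org.apache.accumulo.core.data.ConditionalMutation;

    public class ReserveSketch {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("test16", "localhost")
            .getConnector("root", new PasswordToken("secret"));

        ConditionalWriter writer =
            conn.createConditionalWriter("ars", new ConditionalWriterConfig());

        // The mutation only commits if tx:seq still holds the value read earlier.
        ConditionalMutation cm = new ConditionalMutation("room06:20140101",
            new Condition("tx", "seq").setValue("5"));
        cm.put("tx", "seq", "6");        // bump the sequence number
        cm.put("res", "0005", "alice");  // append to the wait list
        Status status = writer.write(cm).getStatus();
        // REJECTED means the row changed concurrently; re-read and retry.
        System.out.println(status);
        writer.close();
      }
    }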
- - $ ./bin/accumulo org.apache.accumulo.examples.simple.reservations.ARS - >connect test16 localhost root secret ars - connected - > - Commands : - reserve <what> <when> <who> {who} - cancel <what> <when> <who> - list <what> <when> - >reserve room06 20140101 alice bob eve mallory trent - bob : RESERVED - mallory : WAIT_LISTED - alice : WAIT_LISTED - trent : WAIT_LISTED - eve : WAIT_LISTED - >list room06 20140101 - Reservation holder : bob - Wait list : [mallory, alice, trent, eve] - >cancel room06 20140101 alice - >cancel room06 20140101 bob - >list room06 20140101 - Reservation holder : mallory - Wait list : [trent, eve] - >quit - -Scanning the table in the Accumulo shell after running the example shows the -following: - - root@test16> table ars - root@test16 ars> scan - room06:20140101 res:0001 [] mallory - room06:20140101 res:0003 [] trent - room06:20140101 res:0004 [] eve - room06:20140101 tx:seq [] 6 - -The tx:seq column is incremented for each update to the row, allowing for -detection of concurrent changes. For an update to go through, the sequence -number must not have changed since the data was read. If it does change, -the conditional mutation will fail and the example code will retry. - http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.rgbalancer ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.rgbalancer b/docs/src/main/resources/examples/README.rgbalancer deleted file mode 100644 index f192a93..0000000 --- a/docs/src/main/resources/examples/README.rgbalancer +++ /dev/null @@ -1,159 +0,0 @@ -Title: Apache Accumulo Regex Group Balancer Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -For some data access patterns, it's important to spread groups of tablets within -a table out evenly. Accumulo has a balancer that can do this using a regular -expression to group tablets. This example shows how this balancer spreads 4 -groups of tablets within a table evenly across 17 tablet servers. - -Below, a table is created and splits are added. For this example we would like -all of the tablets where the split point has the same two digits to be on -different tservers. This gives us four groups of tablets: 01, 02, 03, and 04. - - root@accumulo> createtable testRGB - root@accumulo testRGB> addsplits -t testRGB 01b 01m 01r 01z 02b 02m 02r 02z 03b 03m 03r 03z 04a 04b 04c 04d 04e 04f 04g 04h 04i 04j 04k 04l 04m 04n 04o 04p - root@accumulo testRGB> tables -l - accumulo.metadata => !0 - accumulo.replication => +rep - accumulo.root => +r - testRGB => 2 - trace => 1 - -After adding the splits, we look at the locations in the metadata table.
- - root@accumulo testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc - 2;01b loc:34a5f6e086b000c [] ip-10-1-2-25:9997 - 2;01m loc:34a5f6e086b000c [] ip-10-1-2-25:9997 - 2;01r loc:14a5f6e079d0011 [] ip-10-1-2-15:9997 - 2;01z loc:14a5f6e079d000f [] ip-10-1-2-13:9997 - 2;02b loc:34a5f6e086b000b [] ip-10-1-2-26:9997 - 2;02m loc:14a5f6e079d000c [] ip-10-1-2-28:9997 - 2;02r loc:14a5f6e079d0012 [] ip-10-1-2-27:9997 - 2;02z loc:14a5f6e079d0012 [] ip-10-1-2-27:9997 - 2;03b loc:14a5f6e079d000d [] ip-10-1-2-21:9997 - 2;03m loc:14a5f6e079d000e [] ip-10-1-2-20:9997 - 2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997 - 2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997 - 2;04a loc:34a5f6e086b000b [] ip-10-1-2-26:9997 - 2;04b loc:14a5f6e079d0010 [] ip-10-1-2-17:9997 - 2;04c loc:14a5f6e079d0010 [] ip-10-1-2-17:9997 - 2;04d loc:24a5f6e07d3000c [] ip-10-1-2-16:9997 - 2;04e loc:24a5f6e07d3000d [] ip-10-1-2-29:9997 - 2;04f loc:24a5f6e07d3000c [] ip-10-1-2-16:9997 - 2;04g loc:24a5f6e07d3000a [] ip-10-1-2-14:9997 - 2;04h loc:14a5f6e079d000c [] ip-10-1-2-28:9997 - 2;04i loc:34a5f6e086b000d [] ip-10-1-2-19:9997 - 2;04j loc:34a5f6e086b000d [] ip-10-1-2-19:9997 - 2;04k loc:24a5f6e07d30009 [] ip-10-1-2-23:9997 - 2;04l loc:24a5f6e07d3000b [] ip-10-1-2-22:9997 - 2;04m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997 - 2;04n loc:24a5f6e07d3000b [] ip-10-1-2-22:9997 - 2;04o loc:34a5f6e086b000a [] ip-10-1-2-18:9997 - 2;04p loc:24a5f6e07d30008 [] ip-10-1-2-24:9997 - 2< loc:24a5f6e07d30008 [] ip-10-1-2-24:9997 - -Below, the information above has been rearranged to show which tablet groups are on -each tserver. The four tablets in group 03 are on two tservers; ideally those -tablets would be spread across 4 tservers. Note the default tablet (2<) was -categorized as group 04 below. - - ip-10-1-2-13:9997 01 - ip-10-1-2-14:9997 04 - ip-10-1-2-15:9997 01 - ip-10-1-2-16:9997 04 04 - ip-10-1-2-17:9997 04 04 - ip-10-1-2-18:9997 04 - ip-10-1-2-19:9997 04 04 - ip-10-1-2-20:9997 03 03 - ip-10-1-2-21:9997 03 03 - ip-10-1-2-22:9997 04 04 - ip-10-1-2-23:9997 04 04 - ip-10-1-2-24:9997 04 04 - ip-10-1-2-25:9997 01 01 - ip-10-1-2-26:9997 02 04 - ip-10-1-2-27:9997 02 02 - ip-10-1-2-28:9997 02 04 - ip-10-1-2-29:9997 04 - -To remedy this situation, the RegexGroupBalancer is configured with the -commands below. The configured regular expression selects the first two digits -from a tablet's end row as the group id. Tablets that don't match and the -default tablet are configured to be in group 04. - - root@accumulo testRGB> config -t testRGB -s table.custom.balancer.group.regex.pattern=(\\d\\d).* - root@accumulo testRGB> config -t testRGB -s table.custom.balancer.group.regex.default=04 - root@accumulo testRGB> config -t testRGB -s table.balancer=org.apache.accumulo.server.master.balancer.RegexGroupBalancer - -After waiting a little bit, look at the tablet locations again and all is good.
- - root@accumulo testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc - 2;01b loc:34a5f6e086b000a [] ip-10-1-2-18:9997 - 2;01m loc:34a5f6e086b000c [] ip-10-1-2-25:9997 - 2;01r loc:14a5f6e079d0011 [] ip-10-1-2-15:9997 - 2;01z loc:14a5f6e079d000f [] ip-10-1-2-13:9997 - 2;02b loc:34a5f6e086b000b [] ip-10-1-2-26:9997 - 2;02m loc:14a5f6e079d000c [] ip-10-1-2-28:9997 - 2;02r loc:34a5f6e086b000d [] ip-10-1-2-19:9997 - 2;02z loc:14a5f6e079d0012 [] ip-10-1-2-27:9997 - 2;03b loc:24a5f6e07d3000d [] ip-10-1-2-29:9997 - 2;03m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997 - 2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997 - 2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997 - 2;04a loc:34a5f6e086b000b [] ip-10-1-2-26:9997 - 2;04b loc:34a5f6e086b000c [] ip-10-1-2-25:9997 - 2;04c loc:14a5f6e079d0010 [] ip-10-1-2-17:9997 - 2;04d loc:14a5f6e079d000e [] ip-10-1-2-20:9997 - 2;04e loc:24a5f6e07d3000d [] ip-10-1-2-29:9997 - 2;04f loc:24a5f6e07d3000c [] ip-10-1-2-16:9997 - 2;04g loc:24a5f6e07d3000a [] ip-10-1-2-14:9997 - 2;04h loc:14a5f6e079d000c [] ip-10-1-2-28:9997 - 2;04i loc:14a5f6e079d0011 [] ip-10-1-2-15:9997 - 2;04j loc:34a5f6e086b000d [] ip-10-1-2-19:9997 - 2;04k loc:14a5f6e079d0012 [] ip-10-1-2-27:9997 - 2;04l loc:14a5f6e079d000f [] ip-10-1-2-13:9997 - 2;04m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997 - 2;04n loc:24a5f6e07d3000b [] ip-10-1-2-22:9997 - 2;04o loc:34a5f6e086b000a [] ip-10-1-2-18:9997 - 2;04p loc:14a5f6e079d000d [] ip-10-1-2-21:9997 - 2< loc:24a5f6e07d30008 [] ip-10-1-2-24:9997 - -Once again, the data above is transformed to make it easier to see which groups -are on tservers. The transformed data below shows that all groups are now -evenly spread. - - ip-10-1-2-13:9997 01 04 - ip-10-1-2-14:9997 04 - ip-10-1-2-15:9997 01 04 - ip-10-1-2-16:9997 04 - ip-10-1-2-17:9997 04 - ip-10-1-2-18:9997 01 04 - ip-10-1-2-19:9997 02 04 - ip-10-1-2-20:9997 03 04 - ip-10-1-2-21:9997 03 04 - ip-10-1-2-22:9997 04 - ip-10-1-2-23:9997 03 04 - ip-10-1-2-24:9997 04 - ip-10-1-2-25:9997 01 04 - ip-10-1-2-26:9997 02 04 - ip-10-1-2-27:9997 02 04 - ip-10-1-2-28:9997 02 04 - ip-10-1-2-29:9997 03 04 - -If you need this functionality, but a regular expression does not meet your -needs then extend GroupBalancer. This allows you to specify a partitioning -function in Java. Use the RegexGroupBalancer source as an example. http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.rowhash ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.rowhash b/docs/src/main/resources/examples/README.rowhash deleted file mode 100644 index 897a92c..0000000 --- a/docs/src/main/resources/examples/README.rowhash +++ /dev/null @@ -1,59 +0,0 @@ -Title: Apache Accumulo RowHash Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. 
See the License for the - specific language governing permissions and limitations - under the License. - -This example shows a simple map/reduce job that reads from an accumulo table and -writes back into that table. - -To run this example, you will need some data in a table. The following will -put a trivial amount of data into accumulo using the accumulo shell: - - $ ./bin/accumulo shell -u username -p password - Shell - Apache Accumulo Interactive Shell - - version: 1.5.0 - - instance name: instance - - instance id: 00000000-0000-0000-0000-000000000000 - - - - type 'help' for a list of available commands - - - username@instance> createtable input - username@instance> insert a-row cf cq value - username@instance> insert b-row cf cq value - username@instance> quit - -The RowHash class will insert a hash for each row in the database if it contains a -specified column. Here's how you run the map/reduce job: - - $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.RowHash -u user -p passwd -i instance -t input --column cf:cq - -Now we can scan the table and see the hashes: - - $ ./bin/accumulo shell -u username -p password - Shell - Apache Accumulo Interactive Shell - - version: 1.5.0 - - instance name: instance - - instance id: 00000000-0000-0000-0000-000000000000 - - - - type 'help' for a list of available commands - - - username@instance> scan -t input - a-row cf:cq [] value - a-row cf-HASHTYPE:cq-MD5BASE64 [] IGPBYI1uC6+AJJxC4r5YBA== - b-row cf:cq [] value - b-row cf-HASHTYPE:cq-MD5BASE64 [] IGPBYI1uC6+AJJxC4r5YBA== - username@instance> - http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.sample ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.sample b/docs/src/main/resources/examples/README.sample deleted file mode 100644 index 3642cc6..0000000 --- a/docs/src/main/resources/examples/README.sample +++ /dev/null @@ -1,192 +0,0 @@ -Title: Apache Accumulo Batch Writing and Scanning Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - - -Basic Sampling Example ----------------------- - -Accumulo supports building a set of sample data that can be efficiently -accessed by scanners. What data is included in the sample set is configurable. -Below, some data representing documents is inserted.
- - root@instance sampex> createtable sampex - root@instance sampex> insert 9255 doc content 'abcde' - root@instance sampex> insert 9255 doc url file://foo.txt - root@instance sampex> insert 8934 doc content 'accumulo scales' - root@instance sampex> insert 8934 doc url file://accumulo_notes.txt - root@instance sampex> insert 2317 doc content 'milk, eggs, bread, parmigiano-reggiano' - root@instance sampex> insert 2317 doc url file://groceries/9.txt - root@instance sampex> insert 3900 doc content 'EC2 ate my homework' - root@instance sampex> insert 3900 doc uril file://final_project.txt - -Below, the table sampex is configured to build a sample set. The configuration -causes Accumulo to include any row where `murmur3_32(row) % 3 == 0` in the -table's sample data. - - root@instance sampex> config -t sampex -s table.sampler.opt.hasher=murmur3_32 - root@instance sampex> config -t sampex -s table.sampler.opt.modulus=3 - root@instance sampex> config -t sampex -s table.sampler=org.apache.accumulo.core.client.sample.RowSampler - -Below, attempting to scan the sample returns an error. This is because data -was inserted before the sample set was configured. - - root@instance sampex> scan --sample - 2015-09-09 12:21:50,643 [shell.Shell] ERROR: org.apache.accumulo.core.client.SampleNotPresentException: Table sampex(ID:2) does not have sampling configured or built - -To remedy this problem, the following command will flush in-memory data and -compact any files that do not contain the correct sample data. - - root@instance sampex> compact -t sampex --sf-no-sample - -After the compaction, the sample scan works. - - root@instance sampex> scan --sample - 2317 doc:content [] milk, eggs, bread, parmigiano-reggiano - 2317 doc:url [] file://groceries/9.txt - -The commands below show that updates to data in the sample are seen when -scanning the sample. - - root@instance sampex> insert 2317 doc content 'milk, eggs, bread, parmigiano-reggiano, butter' - root@instance sampex> scan --sample - 2317 doc:content [] milk, eggs, bread, parmigiano-reggiano, butter - 2317 doc:url [] file://groceries/9.txt - -In order to make scanning the sample fast, sample data is partitioned as data is -written to Accumulo. This means that if the sample configuration is changed, -data written previously was partitioned using different criteria. Accumulo -will detect this situation and fail sample scans. The commands below show this -failure and fixing the problem with a compaction. - - root@instance sampex> config -t sampex -s table.sampler.opt.modulus=2 - root@instance sampex> scan --sample - 2015-09-09 12:22:51,058 [shell.Shell] ERROR: org.apache.accumulo.core.client.SampleNotPresentException: Table sampex(ID:2) does not have sampling configured or built - root@instance sampex> compact -t sampex --sf-no-sample - 2015-09-09 12:23:07,242 [shell.Shell] INFO : Compaction of table sampex started for given range - root@instance sampex> scan --sample - 2317 doc:content [] milk, eggs, bread, parmigiano-reggiano - 2317 doc:url [] file://groceries/9.txt - 3900 doc:content [] EC2 ate my homework - 3900 doc:uril [] file://final_project.txt - 9255 doc:content [] abcde - 9255 doc:url [] file://foo.txt - -The example above is replicated in a Java program using the Accumulo API. -Below is the program name and the command to run it. - - ./bin/accumulo org.apache.accumulo.examples.simple.sample.SampleExample -i instance -z localhost -u root -p secret - -The commands below look under the hood to give some insight into how this -feature works.
The commands determine what files the sampex table is using. - - root@instance sampex> tables -l - accumulo.metadata => !0 - accumulo.replication => +rep - accumulo.root => +r - sampex => 2 - trace => 1 - root@instance sampex> scan -t accumulo.metadata -c file -b 2 -e 2< - 2< file:hdfs://localhost:10000/accumulo/tables/2/default_tablet/A000000s.rf [] 702,8 - -Below is the output of running `accumulo rfile-info` on the file above. This shows the -rfile has a normal default locality group and a sample default locality group. -The output also shows the configuration used to create the sample locality -group. The sample configuration within an rfile must match the table's sample -configuration for sample scans to work. - - $ ./bin/accumulo rfile-info hdfs://localhost:10000/accumulo/tables/2/default_tablet/A000000s.rf - Reading file: hdfs://localhost:10000/accumulo/tables/2/default_tablet/A000000s.rf - RFile Version : 8 - - Locality group : <DEFAULT> - Start block : 0 - Num blocks : 1 - Index level 0 : 35 bytes 1 blocks - First key : 2317 doc:content [] 1437672014986 false - Last key : 9255 doc:url [] 1437672014875 false - Num entries : 8 - Column families : [doc] - - Sample Configuration : - Sampler class : org.apache.accumulo.core.client.sample.RowSampler - Sampler options : {hasher=murmur3_32, modulus=2} - - Sample Locality group : <DEFAULT> - Start block : 0 - Num blocks : 1 - Index level 0 : 36 bytes 1 blocks - First key : 2317 doc:content [] 1437672014986 false - Last key : 9255 doc:url [] 1437672014875 false - Num entries : 6 - Column families : [doc] - - Meta block : BCFile.index - Raw size : 4 bytes - Compressed size : 12 bytes - Compression type : gz - - Meta block : RFile.index - Raw size : 309 bytes - Compressed size : 176 bytes - Compression type : gz - - -Shard Sampling Example -------------------------- - -`README.shard` shows how to index and search files using Accumulo. That -example indexes documents into a table named `shard`. The indexing scheme used -in that example places the document name in the column qualifier. A useful -sample of this indexing scheme should contain all data for any document in the -sample. To accomplish this, the following commands build a sample for the -shard table based on the column qualifier. - - root@instance shard> config -t shard -s table.sampler.opt.hasher=murmur3_32 - root@instance shard> config -t shard -s table.sampler.opt.modulus=101 - root@instance shard> config -t shard -s table.sampler.opt.qualifier=true - root@instance shard> config -t shard -s table.sampler=org.apache.accumulo.core.client.sample.RowColumnSampler - root@instance shard> compact -t shard --sf-no-sample -w - 2015-07-23 15:00:09,280 [shell.Shell] INFO : Compacting table ... - 2015-07-23 15:00:10,134 [shell.Shell] INFO : Compaction of table shard completed for given range - -After enabling sampling, the command below counts the number of documents in -the sample containing the words `import` and `int`. - - $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query --sample -i instance16 -z localhost -t shard -u root -p secret import int | fgrep '.java' | wc - 11 11 1246 - -The command below counts the total number of documents containing the words -`import` and `int`. - - $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query -i instance16 -z localhost -t shard -u root -p secret import int | fgrep '.java' | wc - 1085 1085 118175 - -The counts 11 out of 1085 total are around what would be expected for a modulus -of 101.
Querying the sample first provides a quick way to estimate how much data -the real query will bring back. - -Another way sample data could be used with the shard example is with a -specialized iterator. In the examples source code there is an iterator named -CutoffIntersectingIterator. This iterator first checks how many documents are -found in the sample data. If too many documents are found in the sample data, -then it returns nothing. Otherwise it proceeds to query the full data set. -To experiment with this iterator, use the following command. The -`--sampleCutoff` option below will cause the query to return nothing if, based -on the sample, it appears a query would return more than 1000 documents. - - $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query --sampleCutoff 1000 -i instance16 -z localhost -t shard -u root -p secret import int | fgrep '.java' | wc http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.shard ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.shard b/docs/src/main/resources/examples/README.shard deleted file mode 100644 index b656927..0000000 --- a/docs/src/main/resources/examples/README.shard +++ /dev/null @@ -1,66 +0,0 @@ -Title: Apache Accumulo Shard Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -Accumulo has an iterator called the intersecting iterator which supports querying a term index that is partitioned by -document, or "sharded". This example shows how to use the intersecting iterator through these four programs: - - * Index.java - Indexes a set of text files into an Accumulo table. - * Query.java - Finds documents containing a given set of terms. - * Reverse.java - Reads the index table and writes a map of documents to terms into another table. - * ContinuousQuery.java - Uses the table populated by Reverse.java to select N random terms per document. Then it continuously and randomly queries those terms. - -To run these example programs, create two tables like below. - - username@instance> createtable shard - username@instance shard> createtable doc2term - -After creating the tables, index some files. The following command indexes all of the java files in the Accumulo source code. - - $ cd /local/username/workspace/accumulo/ - $ find core/src server/src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.simple.shard.Index -i instance -z zookeepers -t shard -u username -p password --partitions 30 - -The following command queries the index to find all files containing 'foo' and 'bar'.
- - $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query -i instance -z zookeepers -t shard -u username -p password foo bar - /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java - /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java - /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java - /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/RowDeleteTest.java - /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java - /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/DeleteEverythingTest.java - /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java - /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java - /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java - /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java - /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java - -In order to run ContinuousQuery, we need to run Reverse.java to populate doc2term. - - $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Reverse -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password - -Below ContinuousQuery is run using 5 terms. So it selects 5 random terms from each document, then it continually -randomly selects one set of 5 terms and queries. It prints the number of matching documents and the time in seconds. - - $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.ContinuousQuery -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password --terms 5 - [public, core, class, binarycomparable, b] 2 0.081 - [wordtodelete, unindexdocument, doctablename, putdelete, insert] 1 0.041 - [import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1 0.049 - [getpackage, testversion, util, version, 55] 1 0.048 - [for, static, println, public, the] 55 0.211 - [sleeptime, wrappingiterator, options, long, utilwaitthread] 1 0.057 - [string, public, long, 0, wait] 12 0.132 http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.tabletofile ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.tabletofile b/docs/src/main/resources/examples/README.tabletofile deleted file mode 100644 index c07c60b..0000000 --- a/docs/src/main/resources/examples/README.tabletofile +++ /dev/null @@ -1,59 +0,0 @@ -Title: Apache Accumulo Table-to-File Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . 
- Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -This example uses mapreduce to extract specified columns from an existing table. - -To run this example, you will need some data in a table. The following will -put a trivial amount of data into accumulo using the accumulo shell: - - $ ./bin/accumulo shell -u username -p password - Shell - Apache Accumulo Interactive Shell - - version: 1.5.0 - - instance name: instance - - instance id: 00000000-0000-0000-0000-000000000000 - - - - type 'help' for a list of available commands - - - username@instance> createtable input - username@instance> insert dog cf cq dogvalue - username@instance> insert cat cf cq catvalue - username@instance> insert junk family qualifier junkvalue - username@instance> quit - -The TableToFile class configures a map-only job to read the specified columns and -write the key/value pairs to a file in HDFS. - -The following will extract the rows containing the column "cf:cq": - - $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TableToFile -u user -p passwd -i instance -t input --columns cf:cq --output /tmp/output - - $ hadoop fs -ls /tmp/output - -rw-r--r-- 1 username supergroup 0 2013-01-10 14:44 /tmp/output/_SUCCESS - drwxr-xr-x - username supergroup 0 2013-01-10 14:44 /tmp/output/_logs - drwxr-xr-x - username supergroup 0 2013-01-10 14:44 /tmp/output/_logs/history - -rw-r--r-- 1 username supergroup 9049 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_1357847072863_username_TableToFile%5F1357847071434 - -rw-r--r-- 1 username supergroup 26172 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_conf.xml - -rw-r--r-- 1 username supergroup 50 2013-01-10 14:44 /tmp/output/part-m-00000 - -We can see the output of our little map-reduce job: - - $ hadoop fs -text /tmp/output/part-m-00000 - cat cf:cq [] catvalue - dog cf:cq [] dogvalue - $ - http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.terasort ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.terasort b/docs/src/main/resources/examples/README.terasort deleted file mode 100644 index 5401b91..0000000 --- a/docs/src/main/resources/examples/README.terasort +++ /dev/null @@ -1,50 +0,0 @@ -Title: Apache Accumulo Terasort Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License.
- -This example uses map/reduce to generate random input data that will -be sorted by storing it into accumulo. It uses data very similar to the -hadoop terasort benchmark. - -To run this example you run it with arguments describing the amount of data: - - $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest \ - -i instance -z zookeepers -u user -p password \ - --count 10 \ - --minKeySize 10 \ - --maxKeySize 10 \ - --minValueSize 78 \ - --maxValueSize 78 \ - --table sort \ - --splits 10 \ - -After the map reduce job completes, scan the data: - - $ ./bin/accumulo shell -u username -p password - username@instance> scan -t sort - +l-$$OE/ZH c: 4 [] GGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOO - ,C)wDw//u= c: 10 [] CCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKK - 75@~?'WdUF c: 1 [] IIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQ - ;L+!2rT~hd c: 8 [] MMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUU - LsS8)|.ZLD c: 5 [] OOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWW - M^*dDE;6^< c: 9 [] UUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCC - ^Eu)<n#kdP c: 3 [] YYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGG - le5awB.$sm c: 6 [] WWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEE - q__[fwhKFg c: 7 [] EEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMM - w[o||:N&H, c: 2 [] QQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYY - -Of course, a real benchmark would ingest millions of entries. http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/README.visibility ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/README.visibility b/docs/src/main/resources/examples/README.visibility deleted file mode 100644 index b766dba..0000000 --- a/docs/src/main/resources/examples/README.visibility +++ /dev/null @@ -1,131 +0,0 @@ -Title: Apache Accumulo Visibility, Authorizations, and Permissions Example -Notice: Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - . - http://www.apache.org/licenses/LICENSE-2.0 - . - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. 
- -## Creating a new user - - root@instance> createuser username - Enter new password for 'username': ******** - Please confirm new password for 'username': ******** - root@instance> user username - Enter password for user username: ******** - username@instance> createtable vistest - 06 10:48:47,931 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have permission to perform this action - username@instance> userpermissions - System permissions: - - Table permissions (accumulo.metadata): Table.READ - username@instance> - -A user does not by default have permission to create a table. - -## Granting permissions to a user - - username@instance> user root - Enter password for user root: ******** - root@instance> grant -s System.CREATE_TABLE -u username - root@instance> user username - Enter password for user username: ******** - username@instance> createtable vistest - username@instance> userpermissions - System permissions: System.CREATE_TABLE - - Table permissions (accumulo.metadata): Table.READ - Table permissions (vistest): Table.READ, Table.WRITE, Table.BULK_IMPORT, Table.ALTER_TABLE, Table.GRANT, Table.DROP_TABLE - username@instance vistest> - -## Inserting data with visibilities - -Visibilities are boolean AND (&) and OR (|) combinations of authorization -tokens. Authorization tokens are arbitrary strings taken from a restricted -ASCII character set. Parentheses are required to specify order of operations -in visibilities. - - username@instance vistest> insert row f1 q1 v1 -l A - username@instance vistest> insert row f2 q2 v2 -l A&B - username@instance vistest> insert row f3 q3 v3 -l apple&carrot|broccoli|spinach - 06 11:19:01,432 [shell.Shell] ERROR: org.apache.accumulo.core.util.BadArgumentException: cannot mix | and & near index 12 - apple&carrot|broccoli|spinach - ^ - username@instance vistest> insert row f3 q3 v3 -l (apple&carrot)|broccoli|spinach - username@instance vistest> - -## Scanning with authorizations - -Authorizations are sets of authorization tokens. Each Accumulo user has -authorizations and each Accumulo scan has authorizations. Scan authorizations -are only allowed to be a subset of the user's authorizations. By default, a -user's authorizations set is empty. - - username@instance vistest> scan - username@instance vistest> scan -s A - 06 11:43:14,951 [shell.Shell] ERROR: java.lang.RuntimeException: org.apache.accumulo.core.client.AccumuloSecurityException: Error BAD_AUTHORIZATIONS - The user does not have the specified authorizations assigned - username@instance vistest> - -## Setting authorizations for a user - - username@instance vistest> setauths -s A - 06 11:53:42,056 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have permission to perform this action - username@instance vistest> - -A user cannot set authorizations unless the user has the System.ALTER_USER permission. -The root user has this permission. - - username@instance vistest> user root - Enter password for user root: ******** - root@instance vistest> setauths -s A -u username - root@instance vistest> user username - Enter password for user username: ******** - username@instance vistest> scan -s A - row f1:q1 [A] v1 - username@instance vistest> scan - row f1:q1 [A] v1 - username@instance vistest> - -The default authorizations for a scan are the user's entire set of authorizations. 
- - username@instance vistest> user root - Enter password for user root: ******** - root@instance vistest> setauths -s A,B,broccoli -u username - root@instance vistest> user username - Enter password for user username: ******** - username@instance vistest> scan - row f1:q1 [A] v1 - row f2:q2 [A&B] v2 - row f3:q3 [(apple&carrot)|broccoli|spinach] v3 - username@instance vistest> scan -s B - username@instance vistest> - -If you want, you can limit a user to only be able to insert data which they can read themselves. -It can be set with the following constraint. - - username@instance vistest> user root - Enter password for user root: ****** - root@instance vistest> config -t vistest -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint - root@instance vistest> user username - Enter password for user username: ******** - username@instance vistest> insert row f4 q4 v4 -l spinach - Constraint Failures: - ConstraintViolationSummary(constrainClass:org.apache.accumulo.core.security.VisibilityConstraint, violationCode:2, violationDescription:User does not have authorization on column visibility, numberOfViolatingMutations:1) - username@instance vistest> insert row f4 q4 v4 -l spinach|broccoli - username@instance vistest> scan - row f1:q1 [A] v1 - row f2:q2 [A&B] v2 - row f3:q3 [(apple&carrot)|broccoli|spinach] v3 - row f4:q4 [spinach|broccoli] v4 - username@instance vistest> - http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/batch.md ---------------------------------------------------------------------- diff --git a/docs/src/main/resources/examples/batch.md b/docs/src/main/resources/examples/batch.md new file mode 100644 index 0000000..d3ff5cf --- /dev/null +++ b/docs/src/main/resources/examples/batch.md @@ -0,0 +1,57 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +--- +title: Apache Accumulo Batch Writing and Scanning Example +--- + +This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.client in the examples-simple module: + + * SequentialBatchWriter.java - writes mutations with sequential rows and random values + * RandomBatchWriter.java - used by SequentialBatchWriter to generate random values + * RandomBatchScanner.java - reads random rows and verifies their values + +This is an example of how to use the batch writer and batch scanner. To compile +the example, run maven and copy the produced jar into the accumulo lib dir. +This is already done in the tar distribution. + +Below are commands that add 10000 entries to accumulo and then do 100 random +queries. The write command generates random 50 byte values. + +Be sure to use the name of your instance (given as instance here) and the appropriate +list of zookeeper nodes (given as zookeepers here). 
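For reference, the client-side pattern these two programs wrap looks roughly like the sketch below; the connection values and row layout are placeholders, not the example source itself. The remaining setup steps continue after the sketch.

    import java.util.Collections;
    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.BatchScanner;
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.accumulo.core.security.ColumnVisibility;
    import org.apache.hadoop.io.Text;

    public class BatchSketch {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("instance", "zookeepers")
            .getConnector("username", new PasswordToken("password"));

        // Write a few sequential rows with the exampleVis visibility.
        BatchWriter writer = conn.createBatchWriter("batchtest1", new BatchWriterConfig());
        for (int i = 0; i < 10; i++) {
          Mutation m = new Mutation(String.format("row_%08d", i));
          m.put(new Text("foo"), new Text("1"), new ColumnVisibility("exampleVis"),
              new Value("value".getBytes()));
          writer.addMutation(m);
        }
        writer.close();

        // Read them back with a BatchScanner using 20 query threads.
        BatchScanner scanner = conn.createBatchScanner("batchtest1",
            new Authorizations("exampleVis"), 20);
        scanner.setRanges(Collections.singleton(new Range("row_00000000", "row_00000009")));
        for (Entry<Key,Value> entry : scanner)
          System.out.println(entry.getKey() + " -> " + entry.getValue());
        scanner.close();
      }
    }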
+
+Before you run this, you must ensure that the user you are running as has the
+"exampleVis" authorization. (You can set this in the shell with "setauths -u username -s exampleVis".)
+
+ $ ./bin/accumulo shell -u root -e "setauths -u username -s exampleVis"
+
+You must also create the table, batchtest1, ahead of time. (In the shell, use "createtable batchtest1".)
+
+ $ ./bin/accumulo shell -u username -e "createtable batchtest1"
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance -z zookeepers -u username -p password -t batchtest1 --start 0 --num 10000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20 --vis exampleVis
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner -i instance -z zookeepers -u username -p password -t batchtest1 --num 100 --min 0 --max 10000 --size 50 --scanThreads 20 --auths exampleVis
+ 07 11:33:11,103 [client.CountingVerifyingReceiver] INFO : Generating 100 random queries...
+ 07 11:33:11,112 [client.CountingVerifyingReceiver] INFO : finished
+ 07 11:33:11,260 [client.CountingVerifyingReceiver] INFO : 694.44 lookups/sec 0.14 secs
+
+ 07 11:33:11,260 [client.CountingVerifyingReceiver] INFO : num results : 100
+
+ 07 11:33:11,364 [client.CountingVerifyingReceiver] INFO : Generating 100 random queries...
+ 07 11:33:11,370 [client.CountingVerifyingReceiver] INFO : finished
+ 07 11:33:11,416 [client.CountingVerifyingReceiver] INFO : 2173.91 lookups/sec 0.05 secs
+
+ 07 11:33:11,416 [client.CountingVerifyingReceiver] INFO : num results : 100
http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/bloom.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/bloom.md b/docs/src/main/resources/examples/bloom.md
new file mode 100644
index 0000000..7aa8e86
--- /dev/null
+++ b/docs/src/main/resources/examples/bloom.md
@@ -0,0 +1,221 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Bloom Filter Example
+---
+
+This example shows how to create a table with bloom filters enabled. It also
+shows how bloom filters increase query performance when looking for values that
+do not exist in a table.
+
+Below, a table named bloom_test is created and bloom filters are enabled.
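+
+Before the shell session, here is a rough Java equivalent of the same setup
+(a hedged sketch assuming the 1.x client API; the class name BloomSetup is
+illustrative and error handling is omitted):
+
+    import org.apache.accumulo.core.client.Connector;
+    import org.apache.accumulo.core.client.ZooKeeperInstance;
+    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
+    import org.apache.accumulo.core.security.Authorizations;
+
+    public class BloomSetup {
+      public static void main(String[] args) throws Exception {
+        Connector conn = new ZooKeeperInstance("instance", "zookeepers")
+            .getConnector("username", new PasswordToken("password"));
+        // Equivalent of: setauths -u username -s exampleVis
+        conn.securityOperations().changeUserAuthorizations("username",
+            new Authorizations("exampleVis"));
+        // Equivalent of: createtable bloom_test, then setting table.bloom.enabled
+        conn.tableOperations().create("bloom_test");
+        conn.tableOperations().setProperty("bloom_test", "table.bloom.enabled", "true");
+      }
+    }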
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Apache Accumulo Interactive Shell
+ - version: 1.5.0
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> setauths -u username -s exampleVis
+ username@instance> createtable bloom_test
+ username@instance bloom_test> config -t bloom_test -s table.bloom.enabled=true
+ username@instance bloom_test> exit
+
+Below, 1 million random values are inserted into accumulo. The randomly
+generated rows range between 0 and 1 billion. The random number generator is
+initialized with the seed 7.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis
+
+Below, the table is flushed:
+
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test -w'
+ 05 10:40:06,069 [shell.Shell] INFO : Flush of table bloom_test completed.
+
+After the flush completes, 500 random queries are done against the table. The
+same seed is used to generate the queries, therefore everything is found in the
+table.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
+ Generating 500 random queries...finished
+ 96.19 lookups/sec 5.20 secs
+ num results : 500
+ Generating 500 random queries...finished
+ 102.35 lookups/sec 4.89 secs
+ num results : 500
+
+Below, another 500 queries are performed using a different seed, which results
+in nothing being found. In this case the lookups are much faster because of
+the bloom filters.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 8 -i instance -z zookeepers -u username -p password -t bloom_test --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
+ Generating 500 random queries...finished
+ 2212.39 lookups/sec 0.23 secs
+ num results : 0
+ Did not find 500 rows
+ Generating 500 random queries...finished
+ 4464.29 lookups/sec 0.11 secs
+ num results : 0
+ Did not find 500 rows
+
+********************************************************************************
+
+Bloom filters can also speed up lookups for entries that exist. In accumulo,
+data is divided into tablets and each tablet has multiple map files. Every
+lookup in accumulo goes to a specific tablet, where a lookup is done on each
+map file in the tablet. So if a tablet has three map files, lookup performance
+can be three times slower than with a tablet that has one map file. However, if
+the map files contain unique sets of data, then bloom filters can help eliminate
+map files that do not contain the row being looked up. To illustrate this, two
+otherwise identical tables were created using the following process; one table
+had bloom filters, the other did not. Also, the major compaction ratio was
+increased to prevent the files from being compacted into one file.
+
+ * Insert 1 million entries using RandomBatchWriter with a seed of 7
+ * Flush the table using the shell
+ * Insert 1 million entries using RandomBatchWriter with a seed of 8
+ * Flush the table using the shell
+ * Insert 1 million entries using RandomBatchWriter with a seed of 9
+ * Flush the table using the shell
+
+After following the above steps, each table will have a tablet with three map
+files. Flushing the table after each batch of inserts will create a map file.
+Each map file will contain 1 million entries generated with a different seed.
+This is assuming that Accumulo is configured with enough memory to hold 1
+million inserts. If not, then more map files will be created.
+
+The commands for creating the first table without bloom filters are below.
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Apache Accumulo Interactive Shell
+ - version: 1.5.0
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> setauths -u username -s exampleVis
+ username@instance> createtable bloom_test1
+ username@instance bloom_test1> config -t bloom_test1 -s table.compaction.major.ratio=7
+ username@instance bloom_test1> exit
+
+ $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test1 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis"
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 $ARGS
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 8 $ARGS
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 9 $ARGS
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
+
+The commands for creating the second table with bloom filters are below.
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Apache Accumulo Interactive Shell
+ - version: 1.5.0
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> setauths -u username -s exampleVis
+ username@instance> createtable bloom_test2
+ username@instance bloom_test2> config -t bloom_test2 -s table.compaction.major.ratio=7
+ username@instance bloom_test2> config -t bloom_test2 -s table.bloom.enabled=true
+ username@instance bloom_test2> exit
+
+ $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test2 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis"
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 $ARGS
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 8 $ARGS
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 9 $ARGS
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
+
+Below, 500 lookups are done against the table without bloom filters using
+random number generator seed 7. Even though only one map file will likely
+contain entries for this seed, all map files will be interrogated.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test1 --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
+ Generating 500 random queries...finished
+ 35.09 lookups/sec 14.25 secs
+ num results : 500
+ Generating 500 random queries...finished
+ 35.33 lookups/sec 14.15 secs
+ num results : 500
+
+Below, the same lookups are done against the table with bloom filters. The
+lookups were 2.86 times faster because only one map file was used, even though
+three map files existed.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test2 --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
+ Generating 500 random queries...finished
+ 99.03 lookups/sec 5.05 secs
+ num results : 500
+ Generating 500 random queries...finished
+ 101.15 lookups/sec 4.94 secs
+ num results : 500
+
+You can verify the table has three files by looking in HDFS. To look in HDFS
+you will need the table ID, because the ID is used in HDFS instead of the table
+name. The following command will show table IDs.
+
+ $ ./bin/accumulo shell -u username -p password -e 'tables -l'
+ accumulo.metadata => !0
+ accumulo.root => +r
+ bloom_test1 => o7
+ bloom_test2 => o8
+ trace => 1
+
+So the table ID for bloom_test2 is o8. The command below shows what files this
+table has in HDFS. This assumes Accumulo is at the default location in HDFS.
+
+ $ hadoop fs -lsr /accumulo/tables/o8
+ drwxr-xr-x - username supergroup 0 2012-01-10 14:02 /accumulo/tables/o8/default_tablet
+ -rw-r--r-- 3 username supergroup 52672650 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dj.rf
+ -rw-r--r-- 3 username supergroup 52436176 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dk.rf
+ -rw-r--r-- 3 username supergroup 52850173 2012-01-10 14:02 /accumulo/tables/o8/default_tablet/F00000dl.rf
+
+Running the rfile-info command shows that one of the files has a bloom filter
+and that it is 1.5MB.
+
+ $ ./bin/accumulo rfile-info /accumulo/tables/o8/default_tablet/F00000dj.rf
+ Locality group : <DEFAULT>
+ Start block : 0
+ Num blocks : 752
+ Index level 0 : 43,598 bytes 1 blocks
+ First key : row_0000001169 foo:1 [exampleVis] 1326222052539 false
+ Last key : row_0999999421 foo:1 [exampleVis] 1326222052058 false
+ Num entries : 999,536
+ Column families : [foo]
+
+ Meta block : BCFile.index
+ Raw size : 4 bytes
+ Compressed size : 12 bytes
+ Compression type : gz
+
+ Meta block : RFile.index
+ Raw size : 43,696 bytes
+ Compressed size : 15,592 bytes
+ Compression type : gz
+
+ Meta block : acu_bloom
+ Raw size : 1,540,292 bytes
+ Compressed size : 1,433,115 bytes
+ Compression type : gz
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/bulkIngest.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/bulkIngest.md b/docs/src/main/resources/examples/bulkIngest.md
new file mode 100644
index 0000000..468c903
--- /dev/null
+++ b/docs/src/main/resources/examples/bulkIngest.md
@@ -0,0 +1,35 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Bulk Ingest Example
+---
+
+This is an example of how to bulk ingest data into accumulo using MapReduce.
+
+The following commands show how to run this example. This example creates a
+table called test_bulk which has two initial split points. Then 1000 rows of
+test data are created in HDFS. After that, the 1000 rows are ingested into
+accumulo, and finally the 1000 rows are verified to be in accumulo.
+
+ $ PKG=org.apache.accumulo.examples.simple.mapreduce.bulk
+ $ ARGS="-i instance -z zookeepers -u username -p password"
+ $ ./bin/accumulo $PKG.SetupTable $ARGS -t test_bulk row_00000333 row_00000666
+ $ ./bin/accumulo $PKG.GenerateTestData --start-row 0 --count 1000 --output bulk/test_1.txt
+ $ ./contrib/tool.sh lib/accumulo-examples-simple.jar $PKG.BulkIngestExample $ARGS -t test_bulk --inputDir bulk --workDir tmp/bulkWork
+ $ ./bin/accumulo $PKG.VerifyIngest $ARGS -t test_bulk --start-row 0 --count 1000
+
+For a high-level discussion of bulk ingest, see the docs dir.
http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/classpath.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/classpath.md b/docs/src/main/resources/examples/classpath.md
new file mode 100644
index 0000000..7ed7381
--- /dev/null
+++ b/docs/src/main/resources/examples/classpath.md
@@ -0,0 +1,69 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Classpath Example
+---
+
+This example shows how to use per-table classpaths. The example leverages a
+test jar which contains a Filter that suppresses rows containing "foo". The
+example shows copying the FooFilter.jar into HDFS and then making an Accumulo
+table reference that jar.
+
+Execute the following command in the shell.
+
+ $ hadoop fs -copyFromLocal /path/to/accumulo/test/src/test/resources/FooFilter.jar /user1/lib
+
+Execute the following in the Accumulo shell to set up the classpath context:
+
+ root@test15> config -s general.vfs.context.classpath.cx1=hdfs://<namenode host>:<namenode port>/user1/lib/[^.].*.jar
+
+Create a table:
+
+ root@test15> createtable nofoo
+
+The following command makes this table use the configured classpath context:
+
+ root@test15 nofoo> config -t nofoo -s table.classpath.context=cx1
+
+The following command configures an iterator that is in FooFilter.jar:
+
+ root@test15 nofoo> setiter -n foofilter -p 10 -scan -minc -majc -class org.apache.accumulo.test.FooFilter
+ Filter accepts or rejects each Key/Value pair
+ ----------> set FooFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: false
+
+The commands below show the filter is working.
+
+ root@test15 nofoo> insert foo1 f1 q1 v1
+ root@test15 nofoo> insert noo1 f1 q1 v2
+ root@test15 nofoo> scan
+ noo1 f1:q1 [] v2
+ root@test15 nofoo>
+
+Below, an attempt is made to add the FooFilter to a table that is not configured
+to use the classpath context cx1. This fails until the table is configured to
+use cx1.
+
+ root@test15 nofoo> createtable nofootwo
+ root@test15 nofootwo> setiter -n foofilter -p 10 -scan -minc -majc -class org.apache.accumulo.test.FooFilter
+ 2013-05-03 12:49:35,943 [shell.Shell] ERROR: java.lang.IllegalArgumentException: org.apache.accumulo.test.FooFilter
+ root@test15 nofootwo> config -t nofootwo -s table.classpath.context=cx1
+ root@test15 nofootwo> setiter -n foofilter -p 10 -scan -minc -majc -class org.apache.accumulo.test.FooFilter
+ Filter accepts or rejects each Key/Value pair
+ ----------> set FooFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: false
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/client.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/client.md b/docs/src/main/resources/examples/client.md
new file mode 100644
index 0000000..b07ae8e
--- /dev/null
+++ b/docs/src/main/resources/examples/client.md
@@ -0,0 +1,81 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Client Examples
+---
+
+This documents how to run the simplest Java examples.
+ +This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.client in the examples-simple module: + + * Flush.java - flushes a table + * RowOperations.java - reads and writes rows + * ReadWriteExample.java - creates a table, writes to it, and reads from it + +Using the accumulo command, you can run the simple client examples by providing their +class name, and enough arguments to find your accumulo instance. For example, +the Flush class will flush a table: + + $ PACKAGE=org.apache.accumulo.examples.simple.client + $ bin/accumulo $PACKAGE.Flush -u root -p mypassword -i instance -z zookeeper -t trace + +The very simple RowOperations class demonstrates how to read and write rows using the BatchWriter +and Scanner: + + $ bin/accumulo $PACKAGE.RowOperations -u root -p mypassword -i instance -z zookeeper + 2013-01-14 14:45:24,738 [client.RowOperations] INFO : This is everything + 2013-01-14 14:45:24,744 [client.RowOperations] INFO : Key: row1 column:1 [] 1358192724640 false Value: This is the value for this key + 2013-01-14 14:45:24,744 [client.RowOperations] INFO : Key: row1 column:2 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,744 [client.RowOperations] INFO : Key: row1 column:3 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,744 [client.RowOperations] INFO : Key: row1 column:4 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,746 [client.RowOperations] INFO : Key: row2 column:1 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,746 [client.RowOperations] INFO : Key: row2 column:2 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,746 [client.RowOperations] INFO : Key: row2 column:3 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,746 [client.RowOperations] INFO : Key: row2 column:4 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,747 [client.RowOperations] INFO : Key: row3 column:1 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,747 [client.RowOperations] INFO : Key: row3 column:2 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,747 [client.RowOperations] INFO : Key: row3 column:3 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,747 [client.RowOperations] INFO : Key: row3 column:4 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,756 [client.RowOperations] INFO : This is row1 and row3 + 2013-01-14 14:45:24,757 [client.RowOperations] INFO : Key: row1 column:1 [] 1358192724640 false Value: This is the value for this key + 2013-01-14 14:45:24,757 [client.RowOperations] INFO : Key: row1 column:2 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,757 [client.RowOperations] INFO : Key: row1 column:3 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,757 [client.RowOperations] INFO : Key: row1 column:4 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,761 [client.RowOperations] INFO : Key: row3 column:1 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,761 [client.RowOperations] INFO : Key: row3 column:2 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,761 [client.RowOperations] INFO : Key: row3 column:3 [] 1358192724642 
false Value: This is the value for this key + 2013-01-14 14:45:24,761 [client.RowOperations] INFO : Key: row3 column:4 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,765 [client.RowOperations] INFO : This is just row3 + 2013-01-14 14:45:24,769 [client.RowOperations] INFO : Key: row3 column:1 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,770 [client.RowOperations] INFO : Key: row3 column:2 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,770 [client.RowOperations] INFO : Key: row3 column:3 [] 1358192724642 false Value: This is the value for this key + 2013-01-14 14:45:24,770 [client.RowOperations] INFO : Key: row3 column:4 [] 1358192724642 false Value: This is the value for this key + +To create a table, write to it and read from it: + + $ bin/accumulo $PACKAGE.ReadWriteExample -u root -p mypassword -i instance -z zookeeper --createtable --create --read + hello%00; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%01; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%02; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%03; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%04; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%05; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%06; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%07; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%08; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world + hello%09; datatypes:xml [LEVEL1|GROUP1] 1358192329450 false -> world +
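+
+To give a feel for what these client classes do under the hood, here is a
+minimal, hedged sketch of scanning a single row, similar to the "This is just
+row3" step of RowOperations above (assuming the 1.x Java client API; the class
+name SingleRowScan and the table name rowops are illustrative):
+
+    import java.util.Map.Entry;
+
+    import org.apache.accumulo.core.client.Connector;
+    import org.apache.accumulo.core.client.Scanner;
+    import org.apache.accumulo.core.client.ZooKeeperInstance;
+    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
+    import org.apache.accumulo.core.data.Key;
+    import org.apache.accumulo.core.data.Range;
+    import org.apache.accumulo.core.data.Value;
+    import org.apache.accumulo.core.security.Authorizations;
+    import org.apache.hadoop.io.Text;
+
+    public class SingleRowScan {
+      public static void main(String[] args) throws Exception {
+        Connector conn = new ZooKeeperInstance("instance", "zookeeper")
+            .getConnector("root", new PasswordToken("mypassword"));
+        Scanner scanner = conn.createScanner("rowops", Authorizations.EMPTY);
+        // Restrict the scan to the single row "row3".
+        scanner.setRange(new Range(new Text("row3")));
+        for (Entry<Key,Value> entry : scanner)
+          System.out.println(entry.getKey() + " Value: " + entry.getValue());
+      }
+    }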