This is an automated email from the ASF dual-hosted git repository.

domgarguilo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/accumulo-examples.git
The following commit(s) were added to refs/heads/main by this push:
     new ac2ec84  Fix and improve several examples (#94)
ac2ec84 is described below

commit ac2ec84b87910fbb656751fb927995396798029c
Author: Dom G <domgargu...@apache.org>
AuthorDate: Mon Apr 4 09:16:43 2022 -0400

    Fix and improve several examples (#94)
---
 docs/bloom.md              |  2 +-
 docs/classpath.md          |  2 +-
 docs/compactionStrategy.md | 10 +++++-----
 docs/shard.md              |  2 +-
 docs/tabletofile.md        |  6 ++----
 docs/terasort.md           |  4 ++--
 docs/wordcount.md          |  6 ++++--
 7 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/docs/bloom.md b/docs/bloom.md
index 8a38df5..da3a974 100644
--- a/docs/bloom.md
+++ b/docs/bloom.md
@@ -24,7 +24,7 @@ do not exist in a table.
 
 Accumulo data is divided into tablets and each tablet has multiple r-files.
 Lookup performance of a tablet with 3 r-files can be 3 times slower than
-a tablet with one r-file. However if the files contain unique sets of data,
+a tablet with one r-file. However, if the files contain unique sets of data,
 then bloom filters can help with performance.
 
 Run the example below to create two identical tables. One table has bloom
diff --git a/docs/classpath.md b/docs/classpath.md
index efd37bc..e12df09 100644
--- a/docs/classpath.md
+++ b/docs/classpath.md
@@ -66,7 +66,7 @@ use cx1.
 
     root@uno examples.nofootwo> setiter -n foofilter -p 10 -scan -minc -majc -class org.apache.accumulo.test.FooFilter
     2013-05-03 12:49:35,943 [shell.Shell] ERROR: org.apache.accumulo.shell.ShellCommandException: Command could not be initialized (Unable to load org.apache.accumulo.test.FooFilter; class not found.)
-    root@uno examples.nofootwo> config -t nofootwo -s table.class.loader.context=cx1
+    root@uno examples.nofootwo> config -t examples.nofootwo -s table.class.loader.context=cx1
     root@uno examples.nofootwo> setiter -n foofilter -p 10 -scan -minc -majc -class org.apache.accumulo.test.FooFilter
     Filter accepts or rejects each Key/Value pair
     ----------> set FooFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: false
diff --git a/docs/compactionStrategy.md b/docs/compactionStrategy.md
index 8ae0908..b0be2fa 100644
--- a/docs/compactionStrategy.md
+++ b/docs/compactionStrategy.md
@@ -45,10 +45,10 @@ The commands below will configure the BasicCompactionStrategy to:
 
 ```bash
 $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.file.compress.type=snappy"
-$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy=org.apache.accumulo.tserver.compaction.strategies.BasicCompactionStrategy"
-$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy.opts.filter.size=250M"
-$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy.opts.large.compress.threshold=100M"
-$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy.opts.large.compress.type=gz"
+$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy=org.apache.accumulo.tserver.compaction.strategies.BasicCompactionStrategy"
+$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy.opts.filter.size=250M"
+$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy.opts.large.compress.threshold=100M"
+$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy.opts.large.compress.type=gz"
 ```
 
 Generate some data and files in order to test the strategy:
@@ -64,7 +64,7 @@ $ ./bin/runex client.SequentialBatchWriter -t examples.test1 --start 0 --num 130
 $ accumulo shell -u <username> -p <password> -e "flush -t examples.test1"
 ```
 
-View the tserver log in <accumulo_home>/logs for the compaction and find the name of the <rfile> that was compacted for your table. Print info about this file using the PrintInfo tool:
+View the tserver log in <accumulo_home>/logs for the compaction and find the name of the `rfile` that was compacted for your table. Print info about this file using the PrintInfo tool:
 
 ```bash
 $ accumulo rfile-info <rfile>
diff --git a/docs/shard.md b/docs/shard.md
index f6f6848..97a9d40 100644
--- a/docs/shard.md
+++ b/docs/shard.md
@@ -43,7 +43,7 @@ The following command queries the index to find all files containing 'foo' and '
     /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
     /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
 
-In order to run ContinuousQuery, we need to run Reverse.java to populate doc2term.
+In order to run ContinuousQuery, we need to run Reverse.java to populate the `examples.doc2term` table.
 
     $ ./bin/runex shard.Reverse --shardTable examples.shard --doc2Term examples.doc2term
 
diff --git a/docs/tabletofile.md b/docs/tabletofile.md
index c72d5b8..5968e29 100644
--- a/docs/tabletofile.md
+++ b/docs/tabletofile.md
@@ -30,7 +30,7 @@ put a trivial amount of data into accumulo using the accumulo shell:
     root@instance examples.input> quit
 
 The TableToFile class configures a map-only job to read the specified columns and
-write the key/value pairs to a file in HDFS.
+writes the key/value pairs to a file in HDFS.
 
 The following will extract the rows containing the column "cf:cq":
@@ -45,6 +45,4 @@ We can see the output of our little map-reduce job:
     $ hadoop fs -text /tmp/output/part-m-00000
     catrow cf:cq [] catvalue
-    dogrow cf:cq [] dogvalue
-    $
-
+    dogrow cf:cq [] dogvalue
\ No newline at end of file
diff --git a/docs/terasort.md b/docs/terasort.md
index 16f2ea1..5539883 100644
--- a/docs/terasort.md
+++ b/docs/terasort.md
@@ -25,10 +25,10 @@ ignored.
 
     $ accumulo shell -u root -p secret -e 'createnamespace examples'
 
-To run this example you run it with arguments describing the amount of data:
+This example is run with arguments describing the amount of data:
 
     $ ./bin/runmr mapreduce.TeraSortIngest --count 10 --minKeySize 10 --maxKeySize 10 \
-      --minValueSize 78 --maxValueSize 78 --table examples.sort --splits 10 \
+      --minValueSize 78 --maxValueSize 78 --table examples.sort --splits 10
 
 After the map reduce job completes, scan the data:
diff --git a/docs/wordcount.md b/docs/wordcount.md
index 4c5a27f..fca4af0 100644
--- a/docs/wordcount.md
+++ b/docs/wordcount.md
@@ -55,10 +55,12 @@ information like passwords. A more secure option is store accumulo-client.proper
 in HDFS and run the job with the `-D` options. This will configure the MapReduce
 job to obtain the client properties from HDFS:
 
-    $ hdfs dfs -copyFromLocal ./conf/accumulo-client.properties /user/myuser/
+    $ hdfs dfs -mkdir /user
+    $ hdfs dfs -mkdir /user/myuser
+    $ hdfs dfs -copyFromLocal /path/to/accumulo/conf/accumulo-client.properties /user/myuser/
     $ ./bin/runmr mapreduce.WordCount -i /wc -t examples.wordcount2 -d /user/myuser/accumulo-client.properties
 
-After the MapReduce job completes, query the `wordcount2` table. The results should
+After the MapReduce job completes, query the `examples.wordcount2` table. The results should
 be the same as before:
 
     $ accumulo shell
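
A side note on the wordcount change: `hdfs dfs -mkdir` accepts the same `-p` flag as POSIX `mkdir`, so the two directory-creation steps could be collapsed into one. A minimal sketch of the `-p` semantics, run against the local filesystem since an HDFS cluster may not be available (`$tmp` stands in for the HDFS root):

```shell
# Create a scratch directory standing in for the HDFS root
tmp=$(mktemp -d)

# -p creates /user and /user/myuser in a single call,
# and does not error if the directories already exist
mkdir -p "$tmp/user/myuser"

# Confirm both levels were created
test -d "$tmp/user/myuser" && echo "created"

rm -rf "$tmp"
```

With a running cluster, the equivalent single step would be `hdfs dfs -mkdir -p /user/myuser`.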