wolfgang hoschek created SOLR-5786:
--------------------------------------
Summary: MapReduceIndexerTool --help text is missing large parts of the help text
Key: SOLR-5786
URL: https://issues.apache.org/jira/browse/SOLR-5786
Project: Solr
Issue Type: Bug
Components: contrib - MapReduce
Affects Versions: 4.7
Reporter: wolfgang hoschek
Assignee: Mark Miller
Fix For: 4.8
As already mentioned repeatedly and at length, this is a regression introduced
by the fix in https://issues.apache.org/jira/browse/SOLR-5605
Here is the diff of --help output before SOLR-5605 vs after SOLR-5605:
{code}
130,235c130
< lucene segments left in this index. Merging
< segments involves reading and rewriting all data
< in all these segment files, potentially multiple
< times, which is very I/O intensive and time
< consuming. However, an index with fewer segments
< can later be merged faster, and it can later be
< queried faster once deployed to a live Solr
< serving shard. Set maxSegments to 1 to optimize
< the index for low query latency. In a nutshell, a
< small maxSegments value trades indexing latency
< for subsequently improved query latency. This can
< be a reasonable trade-off for batch indexing
< systems. (default: 1)
< --fair-scheduler-pool STRING
< Optional tuning knob that indicates the name of
< the fair scheduler pool to submit jobs to. The
< Fair Scheduler is a pluggable MapReduce scheduler
< that provides a way to share large clusters. Fair
< scheduling is a method of assigning resources to
< jobs such that all jobs get, on average, an equal
< share of resources over time. When there is a
< single job running, that job uses the entire
< cluster. When other jobs are submitted, tasks
< slots that free up are assigned to the new jobs,
< so that each job gets roughly the same amount of
< CPU time. Unlike the default Hadoop scheduler,
< which forms a queue of jobs, this lets short jobs
< finish in reasonable time while not starving long
< jobs. It is also an easy way to share a cluster
< between multiple of users. Fair sharing can also
< work with job priorities - the priorities are
< used as weights to determine the fraction of
< total compute time that each job gets.
< --dry-run Run in local mode and print documents to stdout
< instead of loading them into Solr. This executes
< the morphline in the client process (without
< submitting a job to MR) for quicker turnaround
< during early trial & debug sessions. (default:
< false)
< --log4j FILE Relative or absolute path to a log4j.properties
< config file on the local file system. This file
< will be uploaded to each MR task. Example:
< /path/to/log4j.properties
< --verbose, -v Turn on verbose output. (default: false)
< --show-non-solr-cloud Also show options for Non-SolrCloud mode as part
< of --help. (default: false)
<
< Required arguments:
< --output-dir HDFS_URI HDFS directory to write Solr indexes to. Inside
< there one output directory per shard will be
< generated. Example: hdfs://c2202.mycompany.
< com/user/$USER/test
< --morphline-file FILE Relative or absolute path to a local config file
< that contains one or more morphlines. The file
< must be UTF-8 encoded. Example:
< /path/to/morphline.conf
<
< Cluster arguments:
< Arguments that provide information about your Solr cluster.
<
< --zk-host STRING The address of a ZooKeeper ensemble being used by
< a SolrCloud cluster. This ZooKeeper ensemble will
< be examined to determine the number of output
< shards to create as well as the Solr URLs to
< merge the output shards into when using the --go-
< live option. Requires that you also pass the --
< collection to merge the shards into.
<
< The --zk-host option implements the same
< partitioning semantics as the standard SolrCloud
< Near-Real-Time (NRT) API. This enables to mix
< batch updates from MapReduce ingestion with
< updates from standard Solr NRT ingestion on the
< same SolrCloud cluster, using identical unique
< document keys.
<
< Format is: a list of comma separated host:port
< pairs, each corresponding to a zk server.
< Example: '127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:
< 2183' If the optional chroot suffix is used the
< example would look like: '127.0.0.1:2181/solr,
< 127.0.0.1:2182/solr,127.0.0.1:2183/solr' where
< the client would be rooted at '/solr' and all
< paths would be relative to this root - i.e.
< getting/setting/etc... '/foo/bar' would result in
< operations being run on '/solr/foo/bar' (from the
< server perspective).
<
<
< Go live arguments:
< Arguments for merging the shards that are built into a live Solr
< cluster. Also see the Cluster arguments.
<
< --go-live Allows you to optionally merge the final index
< shards into a live Solr cluster after they are
< built. You can pass the ZooKeeper address with --
< zk-host and the relevant cluster information will
< be auto detected. (default: false)
< --collection STRING The SolrCloud collection to merge shards into
< when using --go-live and --zk-host. Example:
< collection1
< --go-live-threads INTEGER
< Tuning knob that indicates the maximum number of
< live merges to run in parallel at one time.
< (default: 1000)
<
---
>
{code}
As already mentioned repeatedly and at length, the fix is to apply CDH-16434
to MapReduceIndexerTool.java, because argparse4j >= 0.4.2 includes a change
related to buffer flushing:
{code}
- parser.printHelp(new PrintWriter(System.out));
+ parser.printHelp();
{code}
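The symptom in the diff above (the tail of the help text, old lines 130-235, is missing) is consistent with output that sits in a writer's buffer and is never flushed. Below is a minimal, self-contained sketch that uses only the JDK, not argparse4j. `printHelpNoFlush`, `buildHelpText` and `capture` are hypothetical names. The idea that the newer library writes to the supplied `PrintWriter` without flushing it is an assumption, not a confirmed reading of its source. A `PrintWriter` created from an `OutputStream` has no automatic flushing, so it only pushes text to the underlying stream when its internal buffers fill up or when `flush()`/`close()` is called.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class FlushDemo {

    // Hypothetical stand-in for a help printer that writes to the writer it is
    // given but never flushes it (assumed behavior, for illustration only).
    static void printHelpNoFlush(String helpText, PrintWriter writer) {
        writer.print(helpText);
    }

    // Builds a help text long enough to exceed the writer's internal buffers.
    static String buildHelpText(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= lines; i++) {
            sb.append("help line ").append(i).append('\n');
        }
        return sb.toString();
    }

    // Mirrors `new PrintWriter(System.out)`, but targets an in-memory stream so
    // we can inspect which bytes actually reached the underlying stream.
    static String capture(String helpText, boolean flush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(sink); // buffered, no auto-flush
        printHelpNoFlush(helpText, writer);
        if (flush) {
            writer.flush(); // push any buffered characters through to the stream
        }
        return sink.toString(); // decoded with the platform default charset
    }

    public static void main(String[] args) {
        String help = buildHelpText(2000);
        System.out.println("expected chars:  " + help.length());
        System.out.println("without flush(): " + capture(help, false).length());
        System.out.println("with flush():    " + capture(help, true).length());
    }
}
{code}

Without the flush, only a prefix of the text reaches the stream, and the tail is lost, which matches the shape of the regression. Passing `true` corresponds to flushing the writer after printing. The patch above avoids the problem differently: it lets `parser.printHelp()` write to standard output itself instead of handing it an unflushed `PrintWriter`.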
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)