[jira] [Updated] (SOLR-5786) MapReduceIndexerTool --help output is missing large parts of the help text

wolfgang hoschek (JIRA) Thu, 27 Feb 2014 06:22:32 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


wolfgang hoschek updated SOLR-5786:
-----------------------------------

    Description: 
As already mentioned repeatedly and at length, this is a regression introduced 
by the fix in https://issues.apache.org/jira/browse/SOLR-5605

Here is the diff of --help output before SOLR-5605 vs after SOLR-5605:

{code}
130,235c130
<                          lucene  segments  left  in   this  index.  Merging
<                          segments involves reading  and  rewriting all data
<                          in all these  segment  files, potentially multiple
<                          times,  which  is  very  I/O  intensive  and  time
<                          consuming. However, an  index  with fewer segments
<                          can later be merged  faster,  and  it can later be
<                          queried  faster  once  deployed  to  a  live  Solr
<                          serving shard. Set  maxSegments  to  1 to optimize
<                          the index for low query  latency. In a nutshell, a
<                          small maxSegments  value  trades  indexing latency
<                          for subsequently improved query  latency. This can
<                          be  a  reasonable  trade-off  for  batch  indexing
<                          systems. (default: 1)
<   --fair-scheduler-pool STRING
<                          Optional tuning knob  that  indicates  the name of
<                          the fair scheduler  pool  to  submit  jobs to. The
<                          Fair Scheduler is a  pluggable MapReduce scheduler
<                          that provides a way to  share large clusters. Fair
<                          scheduling is a method  of  assigning resources to
<                          jobs such that all jobs  get, on average, an equal
<                          share of resources  over  time.  When  there  is a
<                          single job  running,  that  job  uses  the  entire
<                          cluster. When  other  jobs  are  submitted,  tasks
<                          slots that free up are  assigned  to the new jobs,
<                          so that each job gets  roughly  the same amount of
<                          CPU time.  Unlike  the  default  Hadoop scheduler,
<                          which forms a queue of  jobs, this lets short jobs
<                          finish in reasonable time  while not starving long
<                          jobs. It is also an  easy  way  to share a cluster
<                          between multiple of users.  Fair  sharing can also
<                          work with  job  priorities  -  the  priorities are
<                          used as  weights  to  determine  the  fraction  of
<                          total compute time that each job gets.
<   --dry-run              Run in local mode  and  print  documents to stdout
<                          instead of loading them  into  Solr. This executes
<                          the  morphline  in  the  client  process  (without
<                          submitting a job  to  MR)  for  quicker turnaround
<                          during early  trial  &  debug  sessions. (default:
<                          false)
<   --log4j FILE           Relative or absolute  path  to  a log4j.properties
<                          config file on the  local  file  system. This file
<                          will  be  uploaded  to   each  MR  task.  Example:
<                          /path/to/log4j.properties
<   --verbose, -v          Turn on verbose output. (default: false)
<   --show-non-solr-cloud  Also show options for  Non-SolrCloud  mode as part
<                          of --help. (default: false)
< 
< Required arguments:
<   --output-dir HDFS_URI  HDFS directory to  write  Solr  indexes to. Inside
<                          there one  output  directory  per  shard  will  be
<                          generated.    Example:     hdfs://c2202.mycompany.
<                          com/user/$USER/test
<   --morphline-file FILE  Relative or absolute path  to  a local config file
<                          that contains one  or  more  morphlines.  The file
<                          must     be      UTF-8      encoded.      Example:
<                          /path/to/morphline.conf
< 
< Cluster arguments:
<   Arguments that provide information about your Solr cluster. 
< 
<   --zk-host STRING       The address of a ZooKeeper  ensemble being used by
<                          a SolrCloud cluster. This  ZooKeeper ensemble will
<                          be examined  to  determine  the  number  of output
<                          shards to create  as  well  as  the  Solr  URLs to
<                          merge the output shards into  when using the --go-
<                          live option. Requires that  you  also  pass the --
<                          collection to merge the shards into.
<                          
<                          The   --zk-host   option   implements   the   same
<                          partitioning semantics as  the  standard SolrCloud
<                          Near-Real-Time (NRT)  API.  This  enables  to  mix
<                          batch  updates  from   MapReduce   ingestion  with
<                          updates from standard  Solr  NRT  ingestion on the
<                          same SolrCloud  cluster,  using  identical  unique
<                          document keys.
<                          
<                          Format is: a  list  of  comma  separated host:port
<                          pairs,  each  corresponding   to   a   zk  server.
<                          Example: '127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:
<                          2183' If the optional  chroot  suffix  is used the
<                          example  would  look  like:  '127.0.0.1:2181/solr,
<                          127.0.0.1:2182/solr,127.0.0.1:2183/solr'     where
<                          the client would  be  rooted  at  '/solr'  and all
<                          paths would  be  relative  to  this  root  -  i.e.
<                          getting/setting/etc... '/foo/bar' would  result in
<                          operations being run on  '/solr/foo/bar' (from the
<                          server perspective).
<                          
< 
< Go live arguments:
<   Arguments for  merging  the  shards  that  are  built  into  a  live Solr
<   cluster. Also see the Cluster arguments.
< 
<   --go-live              Allows you to  optionally  merge  the  final index
<                          shards into a  live  Solr  cluster  after they are
<                          built. You can pass the  ZooKeeper address with --
<                          zk-host and the relevant  cluster information will
<                          be auto detected.  (default: false)
<   --collection STRING    The SolrCloud  collection  to  merge  shards  into
<                          when  using  --go-live   and  --zk-host.  Example:
<                          collection1
<   --go-live-threads INTEGER
<                          Tuning knob that indicates  the  maximum number of
<                          live merges  to  run  in  parallel  at  one  time.
<                          (default: 1000)
< 
---
>       
{code}

As already mentioned repeatedly and at length, this bug is caused by a change 
related to buffer flushing in argparse4j >= 0.4.2. 

The fix is to apply CDH-16434 to MapReduceIndexerTool.java as follows:

{code}
-            parser.printHelp(new PrintWriter(System.out));  
+            parser.printHelp();
{code}
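
For readers unfamiliar with the flushing behaviour, here is a minimal sketch of the underlying effect in plain Java, without argparse4j. The class name FlushDemo, the method writeVia and the in-memory sink are hypothetical and exist only for illustration. The sketch shows that a PrintWriter created without auto-flush keeps text in its buffer until flush() is called. It assumes that the old call relied on the library to flush the caller's writer, and that the no-argument printHelp() presumably handles flushing itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;

// Hypothetical demo class; not part of Solr or argparse4j.
class FlushDemo {

    // Prints a short text through a PrintWriter that wraps a PrintStream
    // (a stand-in for System.out) and returns what actually reached the sink.
    static String writeVia(boolean flush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true);
        // Same construction as in the old call: no auto-flush on the writer.
        PrintWriter writer = new PrintWriter(out);
        writer.print("--help text");
        if (flush) {
            writer.flush(); // pushes the buffered characters down to the sink
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println("without flush: [" + writeVia(false) + "]");
        System.out.println("with flush:    [" + writeVia(true) + "]");
    }
}
```

With a help text much longer than the writer's buffer, part of the output is written when the buffer fills and the remainder is lost if nobody flushes, which would match the truncated --help output shown in the diff above.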


  was:
As already mentioned repeatedly and at length, this is a regression introduced 
by the fix in https://issues.apache.org/jira/browse/SOLR-5605

Here is the diff of --help output before SOLR-5605 vs after SOLR-5605:

{code}
130,235c130
<                          lucene  segments  left  in   this  index.  Merging
<                          segments involves reading  and  rewriting all data
<                          in all these  segment  files, potentially multiple
<                          times,  which  is  very  I/O  intensive  and  time
<                          consuming. However, an  index  with fewer segments
<                          can later be merged  faster,  and  it can later be
<                          queried  faster  once  deployed  to  a  live  Solr
<                          serving shard. Set  maxSegments  to  1 to optimize
<                          the index for low query  latency. In a nutshell, a
<                          small maxSegments  value  trades  indexing latency
<                          for subsequently improved query  latency. This can
<                          be  a  reasonable  trade-off  for  batch  indexing
<                          systems. (default: 1)
<   --fair-scheduler-pool STRING
<                          Optional tuning knob  that  indicates  the name of
<                          the fair scheduler  pool  to  submit  jobs to. The
<                          Fair Scheduler is a  pluggable MapReduce scheduler
<                          that provides a way to  share large clusters. Fair
<                          scheduling is a method  of  assigning resources to
<                          jobs such that all jobs  get, on average, an equal
<                          share of resources  over  time.  When  there  is a
<                          single job  running,  that  job  uses  the  entire
<                          cluster. When  other  jobs  are  submitted,  tasks
<                          slots that free up are  assigned  to the new jobs,
<                          so that each job gets  roughly  the same amount of
<                          CPU time.  Unlike  the  default  Hadoop scheduler,
<                          which forms a queue of  jobs, this lets short jobs
<                          finish in reasonable time  while not starving long
<                          jobs. It is also an  easy  way  to share a cluster
<                          between multiple of users.  Fair  sharing can also
<                          work with  job  priorities  -  the  priorities are
<                          used as  weights  to  determine  the  fraction  of
<                          total compute time that each job gets.
<   --dry-run              Run in local mode  and  print  documents to stdout
<                          instead of loading them  into  Solr. This executes
<                          the  morphline  in  the  client  process  (without
<                          submitting a job  to  MR)  for  quicker turnaround
<                          during early  trial  &  debug  sessions. (default:
<                          false)
<   --log4j FILE           Relative or absolute  path  to  a log4j.properties
<                          config file on the  local  file  system. This file
<                          will  be  uploaded  to   each  MR  task.  Example:
<                          /path/to/log4j.properties
<   --verbose, -v          Turn on verbose output. (default: false)
<   --show-non-solr-cloud  Also show options for  Non-SolrCloud  mode as part
<                          of --help. (default: false)
< 
< Required arguments:
<   --output-dir HDFS_URI  HDFS directory to  write  Solr  indexes to. Inside
<                          there one  output  directory  per  shard  will  be
<                          generated.    Example:     hdfs://c2202.mycompany.
<                          com/user/$USER/test
<   --morphline-file FILE  Relative or absolute path  to  a local config file
<                          that contains one  or  more  morphlines.  The file
<                          must     be      UTF-8      encoded.      Example:
<                          /path/to/morphline.conf
< 
< Cluster arguments:
<   Arguments that provide information about your Solr cluster. 
< 
<   --zk-host STRING       The address of a ZooKeeper  ensemble being used by
<                          a SolrCloud cluster. This  ZooKeeper ensemble will
<                          be examined  to  determine  the  number  of output
<                          shards to create  as  well  as  the  Solr  URLs to
<                          merge the output shards into  when using the --go-
<                          live option. Requires that  you  also  pass the --
<                          collection to merge the shards into.
<                          
<                          The   --zk-host   option   implements   the   same
<                          partitioning semantics as  the  standard SolrCloud
<                          Near-Real-Time (NRT)  API.  This  enables  to  mix
<                          batch  updates  from   MapReduce   ingestion  with
<                          updates from standard  Solr  NRT  ingestion on the
<                          same SolrCloud  cluster,  using  identical  unique
<                          document keys.
<                          
<                          Format is: a  list  of  comma  separated host:port
<                          pairs,  each  corresponding   to   a   zk  server.
<                          Example: '127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:
<                          2183' If the optional  chroot  suffix  is used the
<                          example  would  look  like:  '127.0.0.1:2181/solr,
<                          127.0.0.1:2182/solr,127.0.0.1:2183/solr'     where
<                          the client would  be  rooted  at  '/solr'  and all
<                          paths would  be  relative  to  this  root  -  i.e.
<                          getting/setting/etc... '/foo/bar' would  result in
<                          operations being run on  '/solr/foo/bar' (from the
<                          server perspective).
<                          
< 
< Go live arguments:
<   Arguments for  merging  the  shards  that  are  built  into  a  live Solr
<   cluster. Also see the Cluster arguments.
< 
<   --go-live              Allows you to  optionally  merge  the  final index
<                          shards into a  live  Solr  cluster  after they are
<                          built. You can pass the  ZooKeeper address with --
<                          zk-host and the relevant  cluster information will
<                          be auto detected.  (default: false)
<   --collection STRING    The SolrCloud  collection  to  merge  shards  into
<                          when  using  --go-live   and  --zk-host.  Example:
<                          collection1
<   --go-live-threads INTEGER
<                          Tuning knob that indicates  the  maximum number of
<                          live merges  to  run  in  parallel  at  one  time.
<                          (default: 1000)
< 
---
>       
{code}

As already mentioned repeatedly and at length, the fix is to apply CDH-16434 
to MapReduceIndexerTool.java because there's a change related to buffer 
flushing in argparse4j >= 0.4.2:

{code}
-            parser.printHelp(new PrintWriter(System.out));  
+            parser.printHelp();
{code}



> MapReduceIndexerTool --help output is missing large parts of the help text
> --------------------------------------------------------------------------
>
>                 Key: SOLR-5786
>                 URL: https://issues.apache.org/jira/browse/SOLR-5786
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - MapReduce
>    Affects Versions: 4.7
>            Reporter: wolfgang hoschek
>            Assignee: Mark Miller
>             Fix For: 4.8
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

