tags 615900 + patch thanks Hi,
I attach a set of three patches that fix this bug, to be applied in the order they are numbered (...once_<NUM>-...). [Please read notes at the end of this message on how to apply] The core idea behind the patch-set is to defer the indexing until the installation procedure is finished, and then perform it just once in the background (by default incrementally, unless the index is missing, in which case the indexer switches to full mode). This of course required quite a number of changes that are briefly presented below: * Changes to the registering process workflow: the registered documents are appended to a file, which will be read later by the indexer, instead of being fed immediately to the indexer. * The above required a number of changes to the underlying Ruby classes (DHelpDocumentPool and Indexer), as well as to the dhelp_parse front-end. A test script was also touched, to comply with the changed code. * Insertion of a Dpkg::Post-Invoke trigger to do the indexing at the end of each installation run. This also required using ucf in the maintainer scripts to handle its creation/removal (in order to avoid errors interrupting the install process during purge of the package). * Insertion of some shell machinery as a wrapper to the indexing commands, namely: * an 'index-deferred' script that calls the real indexing commands, * use of nohup to ensure swish++ will keep running even if the controlling terminal disappears, * logging nohup-ed activities' output to /var/lib/dhelp/tmp , * a monthly cron job to cleanup log files inside that directory. * Changes to build and packaging files (Makefile, maintainer scripts, etc.) As a side effect, dhelp_parse -r now rebuilds the pool, without starting the indexer, and creates a pending list with all documents. Also the Indexer class will switch to full mode from incremental if the index is not found. Thus the sequence 'dhelp_parse -r ; dhelp_parse -i' can recreate a missing index. The patches have been tested regarding the following aspects: * Test suite: all tests pass. * Installation: install/upgrade/downgrade/remove/purge procedures and combinations of the above, with versions 0.6.19 and the local ones from the application of the patches. * Registration of doc-base documents: addition/removal of various *-doc packages after installation. ## NOTES: ## * Application of the patches leaves two .orig files due to succeeding with fuzz. This is OK; patches were crafted so that they can be independently applied with regard to patches sent to other bug reports (#561883 mainly, which also touches those two files). Those files (Makefile.orig , debian/dirs.orig) should be manually removed. * The executable permissions of index-deferred and debian/cron.monthly should be manually added after the patch. * Patch number 3 incorporates the patch sent in #642273 bug report. This was made for simplicity, to avoid sending and explaining pairs of separate patches. If #642273 is not accepted, I will send a modified version of the number 3 patch. regards George Zarkadas
--- a/dhelp_parse.rb +++ b/dhelp_parse.rb @@ -61,9 +61,9 @@ version "0.2.0" author "Esteban Manchado Velázquez" copyright "Copyright (c) 2005-2007, Esteban Manchado Velázquez" - synopsis "[-v] [-h] -a doc-base_file1 d-b_f2 ... | -d doc-base_file1 d-b_f2 ... | -r" + synopsis "[-v] [-h] -a doc-base_file1 d-b_f2 ... | -d doc-base_file1 d-b_f2 ... | -i | -r" short_description "Debian online help system parser" - long_description "Dhelp parser to add/remove/reindex dhelp files" + long_description "Dhelp parser to add/remove/index-incrementally/fully-reindex dhelp files" option :help option :names => %w(-a), :arity => [0,-1], @@ -77,9 +77,12 @@ option :names => %w(-v), :arity => [0,0], :opt_found => lambda { @verbose = true }, :opt_description => "verbose" + option :names => %w(-i), :arity => [0,0], + :opt_found => lambda { @action = :index }, + :opt_description => "perform deferred incremental indexing of pending registered docs" option :names => %w(-r), :arity => [0,0], :opt_found => lambda { @action = :reindex }, - :opt_description => "ignored, for compatibility" + :opt_description => "perform full re-indexing of all registered docs" expected_args [0,0] @@ -87,11 +90,52 @@ @verbose = false end - def packaged_configured? File.exists? '/var/lib/dhelp/configured' end + # Adds the documents supplied in command-line to the pool. + def add_documents(pool) + @doc_base_files.each do |doc_base_file| + if File.readable?(doc_base_file) + if @verbose + puts "Parsing document #{doc_base_file}" + end + doc_base_doc = Dhelp::DocBaseDocument.new(doc_base_file) + if @verbose + puts "Registering document #{doc_base_file}" + end + pool.register(doc_base_doc) + else + # Don't stop on single file errors; allow others with no error + # to be successfully registered. + $stderr.puts "Can't read doc-base file '#{doc_base_file}'" + end + end + end + + # Rebuilds the HTML indices to be in sync with pool's state. + def rebuild_html_index(pool) + if @verbose + puts "Rebuilding documentation index at #{DEFAULT_INDEX_ROOT}" + end + exporter = Dhelp::Exporter::Html.new(pool) + exporter.export(:dir => DEFAULT_INDEX_ROOT) + end + + # Starts the indexer to index the pending-documents-for-indexing list + def do_deferred_indexing(user_opts = {}) + opts = {} + if user_opts.has_key? :incremental + opts[:incremental] = user_opts[:incremental] + end + if @verbose + puts "Indexing documents contained in pending list" + end + indexer = Dhelp::Indexer.new(opts) + indexer.index + end + def main begin if packaged_configured? @@ -111,21 +155,7 @@ case @action when :add - @doc_base_files.each do |doc_base_file| - if File.readable? doc_base_file - if @verbose - puts "Parsing document #{doc_base_file}" - end - doc_base_doc = Dhelp::DocBaseDocument.new(doc_base_file) - if @verbose - puts "Registering/indexing document #{doc_base_file}" - end - pool.register(doc_base_doc) - else - $stderr.puts "Can't read doc-base file '#{doc_base_file}'" - return 1 - end - end + add_documents(pool) when :delete @doc_base_files.each do |doc_base_file| if @verbose @@ -133,19 +163,23 @@ end pool.deregister(doc_base_file) end + when :index + # Index incrementally, to update with registered so-far documents. + # This is the normal mode of operation, called by the dpkg trigger + # after the end of each installation run. + do_deferred_indexing when :reindex - # Simply ignore, the documentation directory will be up-to-date anyway. + # Recreate the pool, without doing a full indexing. + pool.rebuild(false) else $stderr.puts usage return 1 end # Always executed - if @verbose - puts "Rebuilding documentation index at #{DEFAULT_INDEX_ROOT}" - end - exporter = Dhelp::Exporter::Html.new(pool) - exporter.export(:dir => DEFAULT_INDEX_ROOT) + # We cannot defer this, unless a persistence mechanism between + # subsequent invocations of this binary is setup. + rebuild_html_index(pool) rescue => e puts "#{e.class}: #{e} (#{e.backtrace.join("\n")})" end --- a/lib/dhelp.rb +++ b/lib/dhelp.rb @@ -26,9 +26,13 @@ module Dhelp SUPPORTED_FORMATS = %w(html text pdf postscript dvi) DOC_DIR_DATABASE = '/var/lib/dhelp/doc-base_dirs' + DOC_INDEX_FILE = '/var/lib/dhelp/documents.index' + DOC_INDEX_CONFIG = '/usr/share/dhelp/swish++.conf' + DOC_PENDING_FILE = '/var/lib/dhelp/pending.list' # Exception for indexer errors (when calling index++) class Exception < RuntimeError; end # Base class Dhelp::Exception + class RegisterDocsError < Exception; end class IndexerError < Exception; end class KeyNotFoundError < Exception; end class InvalidOptionError < Exception; end @@ -336,9 +340,13 @@ # know everything about them. class DhelpDocumentPool def initialize(user_opts={}) - @opts = {:doc_dir_database => DOC_DIR_DATABASE}.merge(user_opts) + @opts = {:doc_dir_database => DOC_DIR_DATABASE, + :index_file => DOC_INDEX_FILE, + :indexer_config_file => DOC_INDEX_CONFIG, + :pending_file => DOC_PENDING_FILE}.merge(user_opts) Dhelp.check_options(@opts, [:index_file, :indexer_config_file, + :pending_file, :doc_base_dir, :doc_dir_database]) doc_base_pool_options = {} @@ -362,10 +370,25 @@ end # Rebuilds the whole pool, making sure everything is consistent - def rebuild + # If doindex = true (the default) full indexing is also performed, + # to achieve compatibility with the cron jobs shipped by versions + # 0.6.14 and 0.6.19 of the package, + # If doindex = false, the pending file is not cleaned and indexing + # is not performed, to make dhelp_parse/doc-base calls efficient. + def rebuild(doindex=true) + if doindex + _clean_pending_file + end @skip_list = [] # DocBaseDocumentPool works as an iterable document list _register_docs(@doc_base_document_pool, :regenerate_index => true) + if doindex + indexer = Indexer.new({:incremental => false, + :index_file => @opts[:index_file], + :config_file => @opts[:indexer_config_file], + :pending_file => @opts[:pending_file]}) + indexer.index + end end # Returns the DocBaseDocument objects that haven't been deregistered @@ -403,6 +426,22 @@ private + # Deletes the currently pending documents list to ensure that stale + # contents do not pollute the index with extraneous files when rebuild + # is invoked. + def _clean_pending_file + begin + File.delete(@opts[:pending_file]) + rescue Errno::ENOENT + return + rescue SystemCallError => e + # On any other system errors, report but allow to continue + # (else we may risk not-indexing current pending documents) + puts "Warning: system returned error code #{e.errno} " + + "while deleting #{pending_file}" + end + end + # Registers a list of doc-base documents as part of Dhelp def _register_docs(doc_list, user_opts={}) register_opts = {:regenerate_index => false}.merge(user_opts) @@ -438,17 +477,25 @@ end doc_dir_db.close + # Now append index_paths to index-pending registered docs file unless index_paths.empty? - indexer_opts = {} - if @opts.has_key? :index_file - indexer_opts[:index_file] = @opts[:index_file] + err_msg = "" + begin + File.open(@opts[:pending_file], "a+") do |f| + index_paths.each do |ip| + f.puts ip + end + end + rescue IOError => e + err_msg = "I/O error (#{e})" + rescue SystemCallError => e + err_msg = "System call error #{e.errno}" end - if @opts.has_key? :indexer_config_file - indexer_opts[:config_file] = @opts[:indexer_config_file] + unless err_msg == "" + raise RegisterDocsError, + "#{err_msg} while appending #{index_paths.join(', ')}," + + " to pending list #{register_opts[:pending_file]}" end - indexer = Indexer.new(indexer_opts) - indexer.index(index_paths, - :incremental => !register_opts[:regenerate_index]) end end end @@ -458,43 +505,104 @@ # Indexer class. So far it only takes care of doc-base documents. class Indexer def initialize(user_opts={}) - @opts = {:index_file => "/var/lib/dhelp/documents.index", - :config_file => "/usr/share/dhelp/swish++.conf", - :indexpp_cmd => "/usr/bin/index++"}.merge(user_opts) + @opts = {:incremental => true, + :index_file => DOC_INDEX_FILE, + :config_file => DOC_INDEX_CONFIG, + :indexpp_cmd => "/usr/bin/index++", + :pending_file => DOC_PENDING_FILE}.merge(user_opts) + Dhelp.check_options(@opts, [:incremental, + :index_file, + :config_file, + :indexpp_cmd, + :pending_file]) end + # Returns the value of incremental option + def incremental; @opts[:incremental]; end + # Returns the index file def index_file; @opts[:index_file]; end + # Returns the config file + def config_file; @opts[:config_file]; end + # Returns the index++ binary path def indexpp_cmd; @opts[:indexpp_cmd]; end # Returns the index++ command-line options - def indexpp_options(user_opts) - opts = {:incremental => true}.merge(user_opts) - "--config-file #{@opts[:config_file]} --index-file #{index_file}" + + def indexpp_options + "--config-file #{config_file} --index-file #{index_file}" + " --follow-links" + - (opts[:incremental] ? " --incremental" : "") + (incremental ? " --incremental" : "") + end + + # Returns the pending-documents-list-for-indexing file + def pending_file; @opts[:pending_file]; end + + # Returns the pending-documents-list-for-indexing contents as an array + def read_pending_file + # Don't use File.exists? to avoid races; use exceptions instead + begin + IO.readlines(pending_file) + rescue Errno::ENOENT + [] + rescue IOError, SystemCallError + raise IndexerError, "Error during reading index-pending list: " + + "#{pending_file}" + end + end + + # Deletes the currently pending documents list + def clean_pending_file + begin + File.delete(pending_file) + rescue Errno::ENOENT + return + rescue SystemCallError => e + # On any other system errors, report but allow to continue + # (else we may risk not-indexing current pending documents) + puts "Warning: system returned error code #{e.errno} " + + "while deleting #{pending_file}" + end + end + + # Returns true if already indexing + def is_indexing? + i_file = (incremental ? "#{index_file}.new" : index_file) + File.exists?(i_file) and File.size(i_file) == 0 end # Index the list of given dirs/files with index++. Directories are indexed # recursively. There is only one valid key for the user_opts hash: # :incremental, which adds the contents of the given paths to the index, # instead of replacing it with the indexed contents of paths. - def index(paths, user_opts={}) - opts = {:incremental => true}.merge(user_opts) - cmd = "#{indexpp_cmd} #{indexpp_options(opts)} -" - - # If the index doesn't exist yet and we're doing incremental, just exit - if opts[:incremental] and not File.exists?(index_file) - return + def index(paths=[], user_opts={}) + if user_opts.has_key? :incremental + @opts[:incremental] = user_opts[:incremental] + end + cmd = "#{indexpp_cmd} #{indexpp_options} -" + + # If the index doesn't exist yet and we're doing incremental, + # switch to non-incremental mode and create it. That way we will + # have at least a partial index until the next rebuild cron job. + if incremental and not File.exists?(index_file) + @opts[:incremental] = false + cmd = "#{indexpp_cmd} #{indexpp_options} -" end # If it's already indexing, just exit - if File.exists?(index_file) and File.size(index_file) == 0 + if is_indexing? return end + # Add to paths the list of currently pending documents + paths[paths.length, 0] = read_pending_file + # Delete the pending file here. Else any filenames added during the + # time we are indexing (by a subsequent invocation of dpkg) may be + # lost if we keep the file around and delete it later. + clean_pending_file + + # Finally, invoke the indexer to create the index. begin File.popen(cmd, "w") do |f| paths.each do |dir| @@ -511,7 +619,7 @@ # When using incremental indexing (default), a new index is created (with # the extension ".new") - if opts[:incremental] + if incremental FileUtils.mv "#{index_file}.new", index_file end end @@ -519,7 +627,12 @@ # DEPRECATED METHOD, ONLY FOR COMPATIBILITY WITH NON-UP-TO-DATE cron FILES def reindex_all - DhelpDocumentPool.new.rebuild + # Made a no-op, to avoid reverse-depends on DhelpDocumentPool + # The cron files in question are either pre-sarge or in intermediate + # dhelp versions which now only live on CDs; it's time to upgrade... + + puts "This dhelp method is not supported anymore." + + " Please upgrade your cron file to one of the newer versions." end end end --- a/test/tc_dhelpdocumentpool.rb +++ b/test/tc_dhelpdocumentpool.rb @@ -5,6 +5,7 @@ class TC_DhelpDocumentPool < Test::Unit::TestCase TEST_DOC_DIR_DATABASE = 'test/dddb' TEST_INDEX_FILE = 'test/index.swish++' + TEST_PENDING_FILE = 'test/pending.list' def doc_base_document(path) Dhelp::DocBaseDocument.new("test/doc-base-pool/#{path}") @@ -22,7 +23,8 @@ @pool = Dhelp::DhelpDocumentPool.new(:doc_base_dir => ['test/doc-base-pool'], :doc_dir_database => TEST_DOC_DIR_DATABASE, :index_file => TEST_INDEX_FILE, - :indexer_config_file => 'swish++.conf') + :indexer_config_file => 'swish++.conf', + :pending_file => TEST_PENDING_FILE) @doc_base_id_set = Set.new(['docbook-xsl-doc-html', 'pica-manual', 'pica-manual-2']) @@ -156,5 +158,6 @@ @doc_base_id_set = nil FileUtils.rm_f TEST_DOC_DIR_DATABASE FileUtils.rm_f TEST_INDEX_FILE + FileUtils.rm_f TEST_PENDING_FILE end end
--- a/Makefile +++ b/Makefile @@ -42,5 +42,7 @@ cp -r lib/* $(DESTDIR_)/lib/ruby/1.8 # Misc files/dirs cp *.rhtml *.tmpl swish++.conf $(DESTDIR_)/share/dhelp/ + cp index-deferred $(DESTDIR_)/share/dhelp/ + chmod 755 $(DESTDIR_)/share/dhelp/index-deferred mkdir -p $(DESTDIR)/etc cp dhelp.conf-sample $(DESTDIR)/etc/dhelp.conf new file mode 100755 --- /dev/null +++ b/index-deferred @@ -0,0 +1,47 @@ +#!/bin/sh +# Perform deferred indexing of documents registered during the last dpkg-run. +# This script is meant to be called as a dpkg post-invoke trigger. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public +# License along with this program; if not, write to the Free +# Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, +# MA 02111-1307 USA. + +TODAY=`date '+%F'` +LOGDIR=/var/lib/dhelp/tmp + +if [ "X${1}" = "X-f" ] || [ "X${1}" = "X--full" ]; then + INDEX_CMD='/etc/cron.weekly/dhelp' +else + INDEX_CMD='/usr/sbin/dhelp_parse -i' +fi +BINCMD=`echo "${INDEX_CMD}" | awk '{print $1}'` + +if [ ! -x ${BINCMD} ]; then + echo "Error: ${BINCMD} is missing or is not executable" + exit 1 +fi + +DHLOG=`mktemp --tmpdir=${LOGDIR} ${TODAY}.index-log.XXXXXXXX` + +if [ ! -f ${DHLOG} ]; then + echo "Error: unable to create temporary log file in ${LOGDIR}" + exit 2 +fi + +# Indexing may last a large amount of time, which may be (depending +# on how dpkg was called) larger than the timespan of the controlling +# process/terminal. Use nohup to account for this possibility. + +</dev/null >${DHLOG} 2>${DHLOG} nohup nice -n10 ${INDEX_CMD} & + --- a/man/dhelp_parse.8 +++ b/man/dhelp_parse.8 @@ -3,7 +3,7 @@ dhelp_parse \- Debian online help parser .SH SYNOPSIS .B dhelp_parse -.IR "[ -r | -a doc-base_file | -d doc-base_file ]" +.IR "[ -r | -i | -a doc-base_file | -d doc-base_file ]" .SH DESCRIPTION This program is used by package developers to register the documents included in a package. @@ -34,6 +34,11 @@ .I doc-base files from the dhelp database. .TP +.B \-i +Does an incremental update of the documents index for all +documentation added with the -a switch after the last index +update. +.TP .B \-r Ignored, kept for compatibility.
--- a/debian/control +++ b/debian/control @@ -7,7 +7,7 @@ Standards-Version: 3.8.4 Build-Depends: debhelper (>= 5), cdbs (>= 0.4.23-1.1), libgettext-ruby-util Package: dhelp -Depends: perl-modules, libtemplate-perl, libhtml-parser-perl, liburi-perl, ruby1.8, libdb-ruby1.8, libcommandline-ruby1.8, libgettext-ruby1.8, doc-base, swish++, liblocale-gettext-perl, libdata-page-perl, pstotext, poppler-utils, ${misc:Depends} +Depends: perl-modules, libtemplate-perl, libhtml-parser-perl, liburi-perl, ruby1.8, libdb-ruby1.8, libcommandline-ruby1.8, libgettext-ruby1.8, doc-base, swish++, liblocale-gettext-perl, libdata-page-perl, pstotext, poppler-utils, ucf (>= 0.8), ${misc:Depends} Recommends: iceweasel | firefox | www-browser Suggests: httpd, info2www, man2html, lynx | links | w3m | html2text, catdvi Architecture: all new file mode 100755 --- /dev/null +++ b/debian/cron.monthly @@ -0,0 +1,28 @@ +#!/bin/sh +# Remove all but the newest five *.index-log.* files, to keep them +# from accumulating inside /var/lib/dhelp/tmp. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public +# License along with this program; if not, write to the Free +# Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, +# MA 02111-1307 USA. + +if [ -d /var/lib/dhelp/tmp ]; then + ( + cd /var/lib/dhelp/tmp || exit + ls *.index-log.* \ + | sort --numeric-sort --field-separator='-' --key=1,4 --key=6,7 --key=9,10 \ + | head --lines=-5 \ + | xargs --no-run-if-empty --max-args=1 --max-lines=1 rm -f + ) +fi --- a/debian/dirs +++ b/debian/dirs @@ -2,3 +2,4 @@ var/lib/dhelp var/lib/dhelp/tmp etc/apache2/conf.d +etc/apt/apt.conf.d --- a/debian/postinst +++ b/debian/postinst @@ -34,13 +34,43 @@ # that call dhelp don't barf (from scrollkeeper package) touch /var/lib/dhelp/configured +DHELP_INDEXER=/usr/share/dhelp/index-deferred +DHELP_APTHOOK=/etc/apt/apt.conf.d/35dhelp + +# create and register with ucf our dpkg post-invoke hook + +DHELP_POSTCMD="\"if test -x ${DHELP_INDEXER}; then ${DHELP_INDEXER}; fi\"" +DHELP_TMP_APTHOOK=`mktemp 2>/dev/null` +if [ $? -ne 0 ]; then + >&2 echo "Error: unable to create temporary file" + exit 1 +fi +if ! chmod 644 ${DHELP_TMP_APTHOOK}; then + >&2 echo "Error: unable to change mode of temporary file" + exit 2 +fi +echo "Dpkg::Post-Invoke { ${DHELP_POSTCMD}; };" > ${DHELP_TMP_APTHOOK} + +ucf --debconf-ok --three-way ${DHELP_TMP_APTHOOK} ${DHELP_APTHOOK} +rm -f ${DHELP_TMP_APTHOOK} +ucfr dhelp ${DHELP_APTHOOK} + +# 'dhelp_parse -r', among other actions, produces the pending list. echo -n "Building HTML tree..." dhelp_parse -r echo " done." +# Thus, if index file does not exist, it suffices to do incremental indexing. if [ ! -f /var/lib/dhelp/documents.index ]; then echo "Reindexing documentation in the background" - /etc/cron.weekly/dhelp 2>/dev/null >/dev/null & + if [ -x ${DHELP_INDEXER} ]; then + ${DHELP_INDEXER} + else + >&2 echo "Error: ${DHELP_INDEXER} does not exist or is not executable" + exit 3 + fi fi +unset DHELP_INDEXER DHELP_POSTCMD DHELP_APTHOOK DHELP_TMP_APTHOOK || true + #DEBHELPER# --- a/debian/postrm +++ b/debian/postrm @@ -17,9 +17,20 @@ # Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, # MA 02111-1307 USA. -if [ "$1" = purge ]; then - # remove files generated by postinst - rm -rf /etc/dhelp +# remove and unregister with ucf our dpkg post-invoke hook + +DHELP_APTHOOK=/etc/apt/apt.conf.d/35dhelp + +for ext in '' '~' '%' .bak .ucf-new .ucf-old .ucf-dist; do + rm -f ${DHELP_APTHOOK}${ext} +done +if which ucf >/dev/null; then + ucf --purge ${DHELP_APTHOOK} fi +if which ucfr >/dev/null; then + ucfr --purge dhelp ${DHELP_APTHOOK} +fi + +unset DHELP_APTHOOK || true #DEBHELPER#
signature.asc
Description: This is a digitally signed message part