Send inn-committers mailing list submissions to inn-committers@lists.isc.org
To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.isc.org/mailman/listinfo/inn-committers
or, via email, send a message with subject or body 'help' to
    inn-committers-requ...@lists.isc.org

You can reach the person managing the list at
    inn-committers-ow...@lists.isc.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of inn-committers digest..."

Today's Topics:

   1. INN commit: trunk (4 files) (INN Commit)

----------------------------------------------------------------------

Message: 1
Date: Mon, 25 Aug 2014 10:13:12 -0700 (PDT)
From: INN Commit <r...@isc.org>
To: inn-committ...@isc.org
Subject: INN commit: trunk (4 files)
Message-ID: <20140825171312.bb8de67...@hope.eyrie.org>

    Date: Monday, August 25, 2014 @ 10:13:12
  Author: iulius
Revision: 9657

pullnews: new -a flag (hashfeed ability)

Add a new feature to pullnews: hashfeed to split feeds.
It uses MD5 and is Diablo-compatible.

Thanks to Geraint Edwards for the patch.

Modified:
  trunk/doc/pod/news.pod
  trunk/doc/pod/newsfeeds.pod
  trunk/doc/pod/pullnews.pod
  trunk/frontends/pullnews.in

-----------------------+
 doc/pod/news.pod      |    3 +-
 doc/pod/newsfeeds.pod |    3 +-
 doc/pod/pullnews.pod  |   40 +++++++++++++++++++++++++++++++++++-
 frontends/pullnews.in |   53 ++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 90 insertions(+), 9 deletions(-)

Modified: doc/pod/news.pod
===================================================================
--- doc/pod/news.pod 2014-08-24 13:25:28 UTC (rev 9656)
+++ doc/pod/news.pod 2014-08-25 17:13:12 UTC (rev 9657)
@@ -186,7 +186,8 @@
 =item *
 
 Several improvements have been contributed to B<pullnews> by Geraint
-Edwards: the new B<-B> flag triggers header-only feeding, the B<-m>
+Edwards: the new B<-a> flag adds the Diablo-compatible hashfeed
+ability, the new B<-B> flag triggers header-only feeding, the B<-m>
 flag now permits to remove headers matching (or not) a given regexp,
 and B<rnews> reporting is improved.
 
Modified: doc/pod/newsfeeds.pod
===================================================================
--- doc/pod/newsfeeds.pod 2014-08-24 13:25:28 UTC (rev 9656)
+++ doc/pod/newsfeeds.pod 2014-08-25 17:13:12 UTC (rev 9657)
@@ -440,7 +440,8 @@
 
 Therefore, it allows to a generate a second level of deterministic
 distribution. Indeed, if a news server is fed C<Q1/2>, it can go on
-splitting thanks to C<Q1-3/9_4> for instance.
+splitting thanks to C<Q1-3/9_4> for instance. Up to four levels of
+deterministic distribution can be used.
 
 The algorithm is compatible with the one used by S<Diablo 5.1> and up.
 If you want to use the legacy quickhashing method used by Diablo

Modified: doc/pod/pullnews.pod
===================================================================
--- doc/pod/pullnews.pod 2014-08-24 13:25:28 UTC (rev 9656)
+++ doc/pod/pullnews.pod 2014-08-25 17:13:12 UTC (rev 9657)
@@ -4,7 +4,8 @@
 
 =head1 SYNOPSIS
 
-B<pullnews> [B<-BhnOqRx>] [B<-b> I<fraction>] [B<-c> I<config>] [B<-C> I<width>]
+B<pullnews> [B<-BhnOqRx>] [B<-a> I<hashfeed>] [B<-b> I<fraction>]
+[B<-c> I<config>] [B<-C> I<width>]
 [B<-d> I<level>] [B<-f> I<fraction>] [B<-F> I<fakehop>] [B<-g> I<groups>]
 [B<-G> I<newsgroups>] [B<-H> I<headers>] [B<-k> I<checkpt>] [B<-l> I<logfile>]
 [B<-m> I<header_pats>] [B<-M> I<num>] [B<-N> I<timeout>] [B<-p> I<port>]
@@ -41,6 +42,43 @@
 
 =over 4
 
+=item B<-a> I<hashfeed>
+
+This option is a deterministic way to control the flow of articles and to
+split a feed. The I<hashfeed> parameter must be in the form C<value/mod>
+or C<start-end/mod>. The Message-ID of each article is hashed using MD5,
+which results in a 128-bit hash. The lowest S<32 bits> are then taken
+by default as the hashfeed value (which is an integer). If the hashfeed
+value modulus C<mod> plus one equals C<value> or is between C<start>
+and C<end>, B<pullnews> will feed the article. All these numbers must
+be integers.
+
+For instance:
+
+    pullnews -a 1/2         Feeds about 50% of all articles.
+    pullnews -a 2/2         Feeds the other 50% of all articles.
+
+Another example:
+
+    pullnews -a 1-3/10      Feeds about 30% of all articles.
+    pullnews -a 4-5/10      Feeds about 20% of all articles.
+    pullnews -a 6-10/10     Feeds about 50% of all articles.
+
+You can use an extended syntax of the form C<value/mod:offset> or
+C<start-end/mod:offset> (using an underscore C<_> instead of a colon
+C<:> is also recognized). As MD5 generates a 128-bit return value,
+it is possible to specify from which byte-offset the 32-bit integer
+used by hashfeed starts. The default value for C<offset> is C<:0> and
+thirteen overlapping values from C<:0> to C<:12> can be used. Only up to
+four totally independent values exist: C<:0>, C<:4>, C<:8> and C<:12>.
+
+Therefore, it allows to a generate a second level of deterministic
+distribution. Indeed, if B<pullnews> feeds C<1/2>, it can go on
+splitting thanks to C<1-3/9:4> for instance. Up to four levels of
+deterministic distribution can be used.
+
+The algorithm is compatible with the one used by S<Diablo 5.1> and up.
+
 =item B<-b> I<fraction>
 
 Backtrack on server numbering reset. Specify the proportion (C<0.0> to C<1.0>)

Modified: frontends/pullnews.in
===================================================================
--- frontends/pullnews.in 2014-08-24 13:25:28 UTC (rev 9656)
+++ frontends/pullnews.in 2014-08-25 17:13:12 UTC (rev 9657)
@@ -13,6 +13,7 @@
 # INN project. Major changes are:
 #
 # January 2010: Geraint A. Edwards added header-only feeding (-B);
+#     added ability to hashfeed (-a) - uses MD5 - Diablo-compatible;
 #     enabled -m to remove headers matching (or not) a given regexp;
 #     minor bug fix to rnews when -O; improved rnews reporting.
 #
@@ -121,13 +122,19 @@
 }
 
 $usage =~ s!.*/!!;
-$usage .= " [ -BhnOqRx -b fraction -c config -C width -d level
+$usage .= " [ -BhnOqRx -a hashfeed -b fraction -c config -C width -d level
         -f fraction -F fakehop -g groups -G newsgroups -H headers
         -k checkpt -l logfile -m header_pats -M num -N num
         -p port -P hop_limit -Q level -r file -s host[:port] -S num
         -t retries -T seconds -w num -z num -Z num ]
         [ upstream_host ... ]
 
+  -a hashfeed     only feed article if the MD5 hash of the Message-ID
+                  matches hashfeed (where hashfeed is of the form value/mod,
+                  value/mod:offset, start-end/mod, or start-end/mod:offset).
+                  The algorithm used is compatible with the one used by Diablo;
+                  see the pullnews man page for more details.
+
   -b fraction     backtrack on server numbering reset. The proportion
                   (0.0 to 1.0) of a group's articles to pull when the server's
                   article number is less than our high for that
@@ -231,11 +238,11 @@
 
 ";
 
-use vars qw($opt_b $opt_B $opt_c $opt_C $opt_d $opt_f $opt_F $opt_g $opt_G
-            $opt_h $opt_H $opt_k $opt_l $opt_m $opt_M $opt_n
+use vars qw($opt_a $opt_b $opt_B $opt_c $opt_C $opt_d $opt_f $opt_F
+            $opt_g $opt_G $opt_h $opt_H $opt_k $opt_l $opt_m $opt_M $opt_n
             $opt_N $opt_O $opt_p $opt_P $opt_q $opt_Q $opt_r $opt_R $opt_s $opt_S
             $opt_t $opt_T $opt_w $opt_x $opt_z $opt_Z);
-getopts("b:Bc:C:d:f:F:g:G:hH:k:l:m:M:nN:Op:P:qQ:r:Rs:S:t:T:w:xz:Z:") || die $usage;
+getopts("a:b:Bc:C:d:f:F:g:G:hH:k:l:m:M:nN:Op:P:qQ:r:Rs:S:t:T:w:xz:Z:") || die $usage;
 
 die $usage if $opt_h;
 
@@ -246,6 +253,7 @@
 my $localServer = $opt_s || $defaultHost;
 my $localPort = $opt_p || $defaultPort;
 my $quiet = $opt_q;
+my $hashfeed = $opt_a || '';
 my $header_only = $opt_B;
 my $watermark = $opt_w;
 my $retries = $opt_t || $defaultRetries;
@@ -288,6 +296,26 @@
 die "``-z'' value not an integer: $opt_z\n" if defined $opt_z and $opt_z !~ /^\d+$/;
 die "``-Z'' value not an integer: $opt_Z\n" if defined $opt_Z and $opt_Z !~ /^\d+$/;
 
+if ($hashfeed ne '') {
+    my $a_err = "``-a'' value not in format ``start[-end]/mod[:offset]'': $opt_a\n";
+    die $a_err if $opt_a !~ m!^(\d+)(?:-(\d+))?/(\d+)(?:[:_](\d+))?$!;
+    $hashfeed = {
+        'low' => $1,
+        'high' => $2 || $1,
+        'modulus' => $3,
+        'offset' => $4 || 0,
+    };
+    die $a_err if $hashfeed->{'low'} > $hashfeed->{'high'}
+        or $hashfeed->{'modulus'} == 0
+        or $hashfeed->{'offset'} > 12;
+    if ($hashfeed->{'low'} == 1 and $hashfeed->{'high'} == $hashfeed->{'modulus'}) {
+        $hashfeed = '';
+    } else {
+        require Digest::MD5;
+        Digest::MD5->import(qw/md5/);
+    }
+}
+
 $quiet = 1 if $quietness > 1;
 my %NNTP_Args = ();
 $NNTP_Args{'Timeout'} = $opt_N if defined $opt_N;
@@ -409,7 +437,7 @@
     print LOG " ``+'' is an article the downstream server accepted\n";
     print LOG " ``x'' is an article the upstream server couldn't ";
     print LOG "give out\n";
-    print LOG " ``m'' is an article skipped due to headers (-m or -P)\n";
+    print LOG " ``m'' is an article skipped due to headers (-a, -m or -P)\n";
     print LOG "\n";
     print LOG "Writing to rnews-format output: $rnews\n\n" if $rnews;
 }
@@ -743,7 +771,7 @@
     my $tx_len = 0; # Transmitted article length (bytes) (for rnews, Bytes:).
     my @header_nums_to_go = ();
     my $match_all_hdrs = 1; # Assume no headers to match.
-    my $skip_due_to_hdrs = 0;
+    my $skip_due_to_hdrs = 0; # Set to 1 if triggered by -P, 2 if by -m, 3 if by -a.
     my %m_found_hdrs = ();
     my $curr_hdr = '';
 
@@ -894,9 +922,22 @@
             }
         }
 
+        if (not $skip_due_to_hdrs and ref $hashfeed) {
+            my $hash_val = unpack('N', substr(md5($msgid), 12-$hashfeed->{'offset'}, 4)) % $hashfeed->{'modulus'} + 1;
+            $skip_due_to_hdrs = 3 if $hash_val < $hashfeed->{'low'} or $hash_val > $hashfeed->{'high'};
+        }
+
         $pulled->{$server}->{$group}++;
 
         if ($skip_due_to_hdrs) {
+            if ($debug >= 2) {
+                print LOG "\tDEBUGGING $i\tskip_art: " .
+                    ($skip_due_to_hdrs == 1 ? 'hopsPath'
+                    : ($skip_due_to_hdrs == 2 ? 'hdr'
+                    : ($skip_due_to_hdrs == 3 ? 'hashfeed'
+                    : 'unknown'))) .
+                    "\n";
+            }
             print LOG "m" unless $quiet;
         } elsif ($rnews) {
             printf RNEWS "#! rnews %d\n", $tx_len;

------------------------------

End of inn-committers Digest, Vol 66, Issue 5
*********************************************
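
For readers who want to experiment with the hashfeed selection outside of pullnews, the following is a minimal Python sketch of the logic in the patch above. It is a transliteration for illustration, not part of INN: the function names (parse_hashfeed, hashfeed_value, hashfeed_accepts) are hypothetical, the Message-ID is assumed to be encodable as UTF-8 bytes, and unusual inputs (such as a zero or zero-padded end value) may not be treated exactly as the Perl code treats them.

```python
import hashlib
import re
import struct

# Same shape as the pattern in the patch: start[-end]/mod[:offset],
# with "_" accepted in place of ":".
_SPEC = re.compile(r"([0-9]+)(?:-([0-9]+))?/([0-9]+)(?:[:_]([0-9]+))?")


def parse_hashfeed(spec):
    """Return (low, high, modulus, offset) or raise ValueError."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"not in format start[-end]/mod[:offset]: {spec!r}")
    low = int(m.group(1))
    # A missing (or zero) end falls back to the start value.
    high = int(m.group(2) or 0) or low
    modulus = int(m.group(3))
    offset = int(m.group(4) or 0)
    if low > high or modulus == 0 or offset > 12:
        raise ValueError(f"not in format start[-end]/mod[:offset]: {spec!r}")
    return low, high, modulus, offset


def hashfeed_value(message_id, modulus, offset=0):
    """Map a Message-ID to an integer in 1..modulus."""
    digest = hashlib.md5(message_id.encode("utf-8")).digest()  # 16 bytes
    start = 12 - offset  # offset 0 selects the last four bytes of the digest
    (word,) = struct.unpack(">I", digest[start:start + 4])  # big-endian, like 'N'
    return word % modulus + 1


def hashfeed_accepts(message_id, spec):
    """True if an article with this Message-ID falls in the spec's range."""
    low, high, modulus, offset = parse_hashfeed(spec)
    return low <= hashfeed_value(message_id, modulus, offset) <= high


if __name__ == "__main__":
    # Made-up Message-IDs, for demonstration only.
    for n in range(5):
        mid = f"<{n}@example.invalid>"
        print(mid, hashfeed_value(mid, 10), hashfeed_accepts(mid, "1-3/10"))
```

Because every Message-ID maps to exactly one value in 1..mod, specifications whose ranges tile 1..mod (for example 1/2 and 2/2, or 1-3/10, 4-5/10 and 6-10/10) each select a disjoint share of the articles and together cover all of them.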