inn-workers Digest, Vol 126, Issue 16

inn-workers-request Wed, 23 Dec 2020 11:19:43 -0800

Send inn-workers mailing list submissions to
        [email protected]


To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.isc.org/mailman/listinfo/inn-workers
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of inn-workers digest..."


Today's Topics:

   1. Re: ovsqlite (Julien ÉLIE)
   2. Re: ovsqlite (Julien ÉLIE)
   3. Re: ovsqlite (zlib edge-cases) (Julien ÉLIE)
   4. Re: ovsqlite (Julien ÉLIE)
   5. Overview methods and rebuild documentation (Julien ÉLIE)


----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Dec 2020 13:04:31 +0100
From: Julien ÉLIE <[email protected]>
To: [email protected]
Subject: Re: ovsqlite
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi Bo,

> I use a single compile-link-and-run test to probe for a sufficient
> version of SQLite.  Feel free to refine this if you have more patience
> with autoconf than I.

I also spent a lot of time trying to improve the integration and checks 
done in sqlite3.m4, but have not come up with anything much better yet.
I'm wondering whether the check for <inttypes.h> and PRIu64 could simply 
be changed to:

AC_CHECK_DECLS([PRIu64],
   [AC_DEFINE([OVSQLITE_USE_DICTIONARY], 1,
              [Enable use of a dictionary with zlib if PRIu64 is available.])])

As a matter of fact, PRIu64 is only used for compressed ovsqlite 
overview data.  We would then use OVSQLITE_USE_DICTIONARY instead of 
USE_DICTIONARY (which is currently defined unconditionally in 
ovsqlite-server.c).

Would that suit you?  Or did you have another idea in mind for 
USE_DICTIONARY and PRIu64 availability?
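
For illustration only, here is a minimal sketch of how C code could be
guarded by such a macro.  It is not taken from ovsqlite-server.c:
format_u64() is a hypothetical helper, and the #if block merely emulates
what configure would put in config.h if the check above were adopted.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: in a real build, configure would define this macro through
 * AC_DEFINE when PRIu64 is declared.  It is emulated here so that the
 * sketch compiles on its own. */
#if defined(PRIu64) && !defined(OVSQLITE_USE_DICTIONARY)
#    define OVSQLITE_USE_DICTIONARY 1
#endif

/* Hypothetical helper: print a 64-bit unsigned value into buf.
 * Returns the number of characters that snprintf() reports, or -1 when
 * the PRIu64-dependent code is compiled out. */
int
format_u64(char *buf, size_t size, uint64_t value)
{
    assert(buf != NULL && size > 0);
#ifdef OVSQLITE_USE_DICTIONARY
    return snprintf(buf, size, "%" PRIu64, value);
#else
    (void) value;
    buf[0] = '\0';
    return -1;
#endif
}
```

With this layout, only the code paths that need PRIu64 depend on the
configure result; the rest of the file builds either way.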

-- 
Julien ÉLIE

« La grandeur d'un métier, c'est peut-être avant tout d'unir les
   hommes. » (Saint-Exupéry)


------------------------------

Message: 2
Date: Wed, 23 Dec 2020 13:17:08 +0100
From: Julien ÉLIE <[email protected]>
To: [email protected]
Subject: Re: ovsqlite
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Responding to my previous message:
> I'll report tomorrow the result of the expiry process.

First run of expireover with ovsqlite.

For the same number of lines processed (about 3,252,700), it took 10 
minutes with ovsqlite, which is fine, though slower than tradindexed 
(only 5 minutes).
I don't know how long it would have taken with ovdb, though.


     Article lines processed  3252700
     Articles dropped              57
     Overview index dropped      2673

I usually have the same numbers for articles dropped and overview index 
dropped.  Maybe the difference is normal for the first run on a new 
overview database built with makehistory?  (I can't tell as I do not 
know what was dropped exactly.)

-- 
Julien ÉLIE

« Tous les champignons sont comestibles. Certains, une fois seulement. »


------------------------------

Message: 3
Date: Wed, 23 Dec 2020 19:15:36 +0100
From: Julien ÉLIE <[email protected]>
To: [email protected]
Subject: Re: ovsqlite (zlib edge-cases)
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi Bo,

A suggestion for close_db():

#ifdef HAVE_ZLIB
     if (use_compression) {
         inflateEnd(&inflation);
         deflateEnd(&deflation);
         buffer_free(flate);
     }
#endif


As for zlib compression, did you happen to see failures during your 
tests?  They may not be easy to reproduce with "tiny" data like overview, 
but I think some return codes are not fully handled.
In particular, a status other than Z_STREAM_END does not necessarily mean 
that the data is bad; processing may simply not be finished yet.


From ovsqlite-server.c:

         status = deflate(&deflation, Z_FINISH);
         flate->left = (char *)deflation.next_out-flate->data;
         if (status==Z_STREAM_END) {
             overview = (uint8_t *)flate->data;
             overview_len = flate->left;
         } else {
             /* This is safe; it overwrites the last byte of the overview
                length, which we have already unpacked. */
             *--overview = 0;
             overview_len++;
         }
         deflation.next_in = NULL;
         deflation.avail_in = 0;



         status = inflate(&inflation, Z_FINISH);
         flate->left = (char *)inflation.next_out-flate->data;
         inflation.next_in = NULL;
         inflation.avail_in = 0;
         inflateReset(&inflation);
         if (status != Z_STREAM_END)
             goto corrupted;


According to <https://zlib.net/manual.html>:

"If the parameter flush is set to Z_FINISH, pending input is processed, 
pending output is flushed and deflate returns with Z_STREAM_END if there 
was enough output space. If deflate returns with Z_OK or Z_BUF_ERROR, 
this function must be called again with Z_FINISH and more output space 
(updated avail_out) but no more input data, until it returns with 
Z_STREAM_END or an error. After deflate has returned Z_STREAM_END, the 
only possible operations on the stream are deflateReset or deflateEnd.

Z_FINISH can be used in the first deflate call after deflateInit if all 
the compression is to be done in a single step. In order to complete in 
one call, avail_out must be at least the value returned by deflateBound 
(see below). Then deflate is guaranteed to return Z_STREAM_END. If not 
enough output space is provided, deflate will not return Z_STREAM_END, 
and it must be called again as described above."

=> As deflateBound() is not used, there is no guarantee that the 
operation completes in one call.  I suspect a latent bug there, though 
probably a rare one.



"inflate() returns Z_OK if some progress has been made (more input 
processed or more output produced), Z_STREAM_END if the end of the 
compressed data has been reached and all uncompressed output has been 
produced, Z_NEED_DICT if a preset dictionary is needed at this point, 
Z_DATA_ERROR if the input data was corrupted (input stream not 
conforming to the zlib format or incorrect check value, in which case 
strm->msg points to a string with a more specific error), Z_STREAM_ERROR 
if the stream structure was inconsistent (for example next_in or 
next_out was Z_NULL, or the state was inadvertently written over by the 
application), Z_MEM_ERROR if there was not enough memory, Z_BUF_ERROR if 
no progress was possible or if there was not enough room in the output 
buffer when Z_FINISH is used."

=> A return code other than Z_STREAM_END does not necessarily mean that 
the data is corrupted.  Z_OK or Z_BUF_ERROR could have been returned.


FWIW, I came up with the following logic in nnrpd/line.c and 
nnrpd/nnrpd.c when implementing the NNTP COMPRESS command.  It has 
worked fine as far as I know but, who knows, it may also have latent 
bugs.  zlib is pretty tricky...  This is an opportunity to share best 
practices and improve our code.


         do {
             /* Grow the output buffer if needed. */
             if (zstream_out->avail_out == 0) {
                 size_t newsize = zbuf_out_size * 2;
                 zbuf_out = xrealloc(zbuf_out, newsize);
                 zstream_out->next_out = zbuf_out + zbuf_out_size;
                 zstream_out->avail_out = zbuf_out_size;
                 zbuf_out_size = newsize;
             }

             r = deflate(zstream_out,
                         zstream_flush_needed ? Z_PARTIAL_FLUSH : Z_NO_FLUSH);

             if (!(r == Z_OK || r == Z_BUF_ERROR || r == Z_STREAM_END)) {
                 sysnotice("deflate() failed: %d; %s", r,
                           zstream_out->msg != NULL ? zstream_out->msg :
                           "no detail");
                 return;
             }
         } while (r == Z_OK && zstream_out->avail_out == 0);




     do {
         if (zstream_in->avail_in > 0 || zstream_inflate_needed) {
             int r;

             zstream_in->next_out = p;
             zstream_in->avail_out = len;

             r = inflate(zstream_in, Z_SYNC_FLUSH);

             if (!(r == Z_OK || r == Z_BUF_ERROR || r == Z_STREAM_END)) {
                 sysnotice("inflate() failed: %d; %s", r,
                           zstream_in->msg != NULL ? zstream_in->msg :
                           "no detail");
                 n = -1;
                 break;
             }

             /* Check whether inflate() has finished processing its input.
              * If not, we need to call it again, even though avail_in is 0. */
             zstream_inflate_needed = (r != Z_STREAM_END);

             if (zstream_in->avail_out < len) {
                 /* Some data has been uncompressed.  Treat it now. */
                 n = len - zstream_in->avail_out;
                 break;
             }
             /* If we reach here, then it means that inflate() needs more
              * input, so we go on reading data on the wire. */
         }



-- 
Julien ÉLIE

« Sol lucet omnibus. »


------------------------------

Message: 4
Date: Wed, 23 Dec 2020 19:38:14 +0100
From: Julien ÉLIE <[email protected]>
To: [email protected]
Subject: Re: ovsqlite
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi Bo,

> At long last, ovsqlite!

The good news is that ovsqlite's integration into INN is ready.
I'm just waiting for your review of the following additions to the 
documentation.  Do not hesitate to suggest more accurate wording.

Note that I've renamed the --with-sqlite3 configure flag to 
--with-sqlite for consistency with our other configure flags, which do 
not mention a version.

Do you prefer Bo or /Bo (like in your signature)?
Do you want your e-mail to be added in the HISTORY section of POD files?




--- doc/pod/news.pod    (revision 10457)
+++ doc/pod/news.pod    (working copy)
@@ -1,3 +1,21 @@
+=head1 Changes in 2.7.0
+
+=over 2
+
+=item *
+
+Bo Lindbergh has implemented a new overview storage method based on
+SQLite, known for its long-term stability and compatibility.  Fast and
+reliable, this new SQLite-based method is a perfect choice to store
+overview data.
+
+To select it as your overview method, set the I<ovmethod> parameter in
+F<inn.conf> to C<ovsqlite>.  Details about ovsqlite and how to switch to
+that new modern overview storage method can be found in the ovsqlite(5)
+and makehistory(8) man pages.
+
+=back
+
  =head1 Changes in 2.6.4





--- doc/pod/ovsqlite-server.pod (revision 0)
+++ doc/pod/ovsqlite-server.pod (working copy)
@@ -0,0 +1,74 @@
[...]
+The B<ovsqlite-server> daemon is the only program that opens the overview
+SQLite database.  It accepts connections from the other parts of INN that
+want to operate on overview data (B<innd>, B<nnrpd>, B<expireover>,
+B<makehistory>).
+
+This daemon must therefore be started before any other process can
+access the overview database.  B<ovsqlite-server> is normally invoked
+automatically by B<rc.news> when starting the news system.
+
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<-d>
+
+B<ovsqlite-server> normally puts itself into the background, points
+its standard output and error to log files, and disassociates itself
+from the terminal.  Using B<-d> prevents all of this, resulting in log
+messages being written to the standard error output; this is generally
+useful only for debugging.
+
[...]
+=head1 HISTORY
+
+Initial implementation of ovsqlite written by Bo Lindbergh for InterNetNews.






--- doc/pod/ovsqlite.pod        (revision 0)
+++ doc/pod/ovsqlite.pod        (working copy)
@@ -0,0 +1,103 @@
+=head1 NAME
+
+ovsqlite - SQLite-based overview storage method for INN
+
+=head1 DESCRIPTION
+
+This method uses SQLite to store overview data.  It requires version
+3.8.2 or later of the SQLite library (3.20.0+ recommended).
+
+SQLite source, documentation, etc. are available at
+L<https://www.sqlite.org/>.  One of the stated goals of the SQLite
+file format is long-term stability and compatibility, which makes this
+storage method particularly interesting to use.
+
+Only one protocol version of the ovsqlite storage method currently
+exists, implemented since S<INN 2.7.0>.
+
[...]
+=item I<compress>
[...]
+Enabling compression saves about S<70 %> of disk space on typical
+overview data.
[...]
+
+A transaction occurs every I<transrowlimit> articles or I<transtimelimit>
+seconds, whichever comes first.  Inserting or deleting a database row
+within a transaction is very fast whereas committing a transaction is slow,
+especially on rotating storage.  Setting transaction limits too low
+leads to poor performance.  When rebuilding overview data, it may be
+worth temporarily raising these values, though.
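
Outside the patch itself, the commit rule described in that last
paragraph can be pictured with a small sketch.  The structure and
function names are hypothetical and the logic is an assumption about
what "whichever limit is reached first" means, not an excerpt from
ovsqlite-server.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping for the currently open transaction. */
struct txn_state {
    unsigned long rows; /* rows inserted or deleted since last commit */
    time_t started;     /* when the open transaction began */
};

/* Decide whether the open transaction should be committed now: either
 * the row limit or the time limit has been reached.  An empty
 * transaction is never committed. */
bool
txn_should_commit(const struct txn_state *t, time_t now,
                  unsigned long transrowlimit, double transtimelimit)
{
    assert(t != NULL);
    if (t->rows == 0)
        return false;
    if (t->rows >= transrowlimit)
        return true;
    return difftime(now, t->started) >= transtimelimit;
}
```

Raising both limits during a rebuild means fewer commits for the same
number of rows, which is where the time goes on rotating storage.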





--- doc/pod/rc.news.pod (revision 10457)
+++ doc/pod/rc.news.pod (working copy)
@@ -41,6 +41,11 @@

  =item *

+If I<ovmethod> is set to C<ovsqlite> in F<inn.conf>:  B<ovsqlite-server>
+is started and stopped.
+
+=item *
+
  If F<rc.news.local> exists in I<pathbin>:  B<rc.news.local> is run with
  argument C<start> or C<stop> (to perform site-specific startup or shutdown
  tasks).








--- doc/pod/install.pod (revision 10457)
+++ doc/pod/install.pod (working copy)
@@ -148,6 +148,7 @@
      --with-perl         Perl 5.004_03 or higher, 5.8.0+ recommended
      --with-python       Python 2.3.0 or higher, 2.5.0+ recommended (in the 2.x series); Python 3.3.0 or higher (in the 3.x series)
      --with-bdb          Berkeley DB 4.4 or higher, 4.7+ recommended
+    --with-sqlite       SQLite 3.8.2 or higher, 3.20.0+ recommended
      --with-zlib         zlib 1.x or higher
      --with-openssl      OpenSSL 0.9.6 or higher
      --with-sasl         Cyrus SASL 2.x or higher
@@ -423,6 +424,22 @@
  built with S<Berkeley DB> support unless the B<--without-bdb> flag is
  explicitly passed to configure.

+=item B<--with-sqlite>=PATH
+
+Enables support for SQLite (3.8.2 or higher), which means that it
+will then be possible to use the ovsqlite overview method if you wish.
+Enabling this configure option doesn't mean you'll be required to use
+ovsqlite, but it does require that SQLite be installed on your system
+(including the header files, not just the runtime libraries).  If a
+path is given, it sets the installed directory of SQLite.  In case
+non-standard paths to the SQLite library are used, one or both of the
+options B<--with-sqlite-include> and B<--with-sqlite-lib> can be given
+to configure with a path.
+
+If the SQLite library is found at configure time, INN will be built
+with SQLite support unless the B<--without-sqlite> flag is explicitly
+passed to configure.
+
  =item B<--with-zlib>=PATH

  Enables support for compression for news reading, which means a
@@ -429,7 +446,8 @@
  compression layer can be negotiated between your server and newsreaders
  supporting that NNTP extension.

-Also enables support for compression with the ovdb storage method.
+Also enables support for compression with the ovdb and ovsqlite overview
+storage methods.

  This option requires that zlib be installed on your system (including the
  header files, not just the runtime libraries).  If a path is given, it






--- storage/ovsqlite/ovsqlite-server.c  (revision 0)
+++ storage/ovsqlite/ovsqlite-server.c  (working copy)
@@ -0,0 +1,2141 @@
+/*  $Id$
+**
+**  Daemon server to access overview database based on SQLite.
+**
+**  Original implementation written by Bo Lindbergh (2020-12-17).
+*/
+



--- storage/ovsqlite/ovsqlite.c (revision 0)
+++ storage/ovsqlite/ovsqlite.c (working copy)
@@ -0,0 +1,1108 @@
+/*  $Id$
+**
+**  Overview storage method based on SQLite.
+**
+**  Original implementation written by Bo Lindbergh (2020-12-17).
+*/
+



-- 
Julien ÉLIE

« C'est de ma faute à moi si les portes ne sont pas à la hauteur de mes
   menhirs ? » (Astérix)


------------------------------

Message: 5
Date: Wed, 23 Dec 2020 20:19:14 +0100
From: Julien ÉLIE <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Overview methods and rebuild documentation
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi all,

We'll need instructions to help people migrate to ovsqlite.  Here is a 
suggestion of wording for our documentation.
Bo's implementation of the first new overview storage method since INN 
2.3.0 is a major milestone that needs care!

Please suggest any additions or wording changes you find appropriate.


Also, we currently compare the 4 overview storage methods in 3 
documents:  FAQ, INSTALL and inn.conf(5).
I would suggest having the same wording in all 3 places, instead of 
different ones, if you're OK with that (or even better, having it only 
once, with the 2 other places pointing to it; for instance in inn.conf, 
which is currently the one with the fewest explanations!).


Global suggestion for inn.conf, concatenating the arguments currently in 
the 3 documents:

buffindexed
Stores overview data and index information into preconfigured large 
files like CNFS.  Fast at writing, the buffindexed overview storage 
method can keep up with a large feed more easily and never consumes 
additional disk space beyond that allocated to these buffers.  The 
downside is that these buffers are hard to recover in case of corruption 
and somewhat slower for readers.  See the buffindexed.conf(5) man page 
for more details, and notably how to create the buffers.

ovdb
Stores overview information into a S<Berkeley DB> database.  This method 
is fast and very robust, but may require more disk space, unless 
compression is enabled.  See the ovdb(5) man page for more details.

ovsqlite
Stores overview information into an SQLite database, known for its 
long-term stability and compatibility.  This method is fast and very 
robust, but may require more disk space, unless compression is enabled.
Being the most recent of all methods, ovsqlite has not been as widely 
tested as the others yet, though it is already reliable enough to be 
used in production.  See the ovsqlite(5) man page for more details.

tradindexed
Uses two files per newsgroup, one containing the overview data and one
containing the index.  Fast for readers, but slow to write to because it 
has to update two files for each incoming article.  Its main advantage 
is to be the best tested, the most reliable and the method with the best 
recovery tools.







--- doc/pod/makehistory.pod     (revision 10457)
+++ doc/pod/makehistory.pod     (working copy)
@@ -24,7 +24,9 @@
  manager, and write a history line for every article.  To also generate
  overview information, use the B<-O> flag.

-WARNING:  If you're trying to rebuild the overview database, be sure to
+=head1 OVERVIEW REBUILD
+
+I<WARNING>:  If you're trying to rebuild the overview database, be sure to
  stop innd(8) and delete or zero out the existing database before you start
  for the best results.  An overview rebuild should not be done while the
  server is running.  Unless the existing overview is deleted, you may end
@@ -37,6 +39,52 @@
  rest of the server by running B<ovdb_init>; see ovdb_init(8) for more
  details.

+Similarly, if I<ovmethod> in F<inn.conf> is C<ovsqlite>, you must
+have the B<ovsqlite-server> process running while rebuilding overview.
+See ovsqlite-server(8) for more details and how to start it by hand.
+
+Rebuilding overview data is as straightforward as:
+
+=over 4
+
+=item 1.
+
+Setting the new overview storage method in the I<ovmethod> parameter
+in F<inn.conf>.
+
+=item 2.
+
+Checking that its configuration file is correctly installed in
+I<pathetc> and fits your needs (F<buffindexed.conf>, F<ovdb.conf> or
+F<ovsqlite.conf>).  Note that the tradindexed overview storage method
+does not have a configuration file.
+
+=item 3.
+
+Making sure that INN is stopped.
+
+=item 4.
+
+Making sure that the directory specified by the I<pathoverview> parameter
+in F<inn.conf> exists and is empty.  Otherwise, rename the current one
+(to backup existing overview data) and re-create I<pathoverview> as
+the news user.
+
+=item 5.
+
+Starting B<ovdb_init> or B<ovsqlite-server> as the news user if the
+new overview storage method is respectively ovdb or ovsqlite.
+
+=item 6.
+
+Running C<makehistory -O -x -F> and waiting for the command to finish.
+
+=item 7.
+
+Starting INN and checking the logs to make sure everything is fine.
+
+=back
+

=> Are you OK with that new "OVERVIEW REBUILD" section?









===================================================================
--- doc/FAQ     (revision 10457)
+++ doc/FAQ     (working copy)
@@ -452,13 +452,19 @@
  clients.

  Any INN server that supports readers must therefore have an overview
-method configured.  There are three different methods to choose from:
-tradindexed, which is the slowest but the best tested and most reliable
-and the method with the best recovery tools; buffindexed, which is fast at
-writing because it uses preconfigured large buffers like CNFS, but which
-is harder to recover; and the experimental ovdb overview method, which
-stores overview information in a BerkeleyDB database.
+method configured.  There are four different methods to choose from:

+  - buffindexed, which is fast at writing because it uses preconfigured
+    large buffers like CNFS, but which is hard to recover;
+  - ovdb, which stores overview information in a Berkeley DB database
+    and supports compression;
+  - ovsqlite, implemented in INN 2.7.0, which stores overview information
+    in an SQLite database and supports compression, but is still not as
+    widely tested as the other overview mechanisms (all introduced with
+    INN 2.3.0);
+  - tradindexed, which is the slowest but the best tested, the most
+    reliable and the method with the best recovery tools.
+
  ------------------------------

=> suggestion to point to inn.conf documentation instead





--- doc/pod/install.pod (revision 10457)
+++ doc/pod/install.pod (working copy)
@@ -666,11 +684,6 @@

  =over 4

-=item tradindexed
-
-It is very fast for readers, but it has to update two files for each
-incoming article and can be quite slow to write.
-
  =item buffindexed

  It can keep up with a large feed more easily, since it uses large buffers
@@ -682,10 +695,23 @@

  =item ovdb

-It stores overview data in a S<Berkeley DB> database; it's fast and very robust,
-but may require more disk space.  See the ovdb(5) man page for more
-information on it.
+It stores overview data in a S<Berkeley DB> database; it's fast and very
+robust, but may require more disk space, unless compression is enabled.
+See the ovdb(5) man page for more information on it.

+=item ovsqlite
+
+It stores overview data in an SQLite database, known for its long-term
+stability and compatibility.  It's fast and very robust, but may require
+more disk space, unless compression is enabled.  See the ovsqlite(5)
+man page for more information on it.
+
+=item tradindexed
+
+It is very fast for readers, but it has to update two files for each
+incoming article and can be quite slow to write.  Robust and well-tested,
+with the best recovery tools.
+
  =back


=> suggestion to point to inn.conf documentation instead







--- doc/pod/inn.conf.pod        (revision 10457)
+++ doc/pod/inn.conf.pod        (working copy)
@@ -590,8 +590,9 @@
  =item I<ovmethod>

  Which overview storage method to use.  Currently supported values are
-C<tradindexed>, C<buffindexed>, and C<ovdb>.  There is no default value;
-this parameter must be set if I<enableoverview> is true (the default).
+C<buffindexed>, C<ovdb>, C<ovsqlite> and C<tradindexed>.  There is no
+default value; this parameter must be set if I<enableoverview> is true
+(the default).

  =over 4

@@ -601,15 +602,19 @@
  preconfigured files defined in F<buffindexed.conf>.  C<buffindexed> never
  consumes additional disk space beyond that allocated to these buffers.

+=item C<ovdb>
+
+Stores data into a S<Berkeley DB> database.  See the ovdb(5) man page.
+
+=item C<ovsqlite>
+
+Stores data into an SQLite database.  See the ovsqlite(5) man page.
+
  =item C<tradindexed>

  Uses two files per newsgroup, one containing the overview data and one
  containing the index.  Fast for readers, but slow to write to.

-=item C<ovdb>
-
-Stores data into a S<Berkeley DB> database.  See the ovdb(5) man page.
-
  =back


=> suggestion to have the comparison only here





--- doc/pod/checklist.pod       (revision 10457)
+++ doc/pod/checklist.pod       (working copy)
@@ -92,7 +92,7 @@

  You probably want B<--with-perl>.  If you're not using NetBSD with
  cycbuffs or OpenBSD, perhaps B<--with-tagged-hash>.  You might want to
-compile in TLS/SSL and S<Berkeley DB>, if your system supports them.  You
+compile in TLS/SSL and SQLite, if your system supports them.  You
  will need to have the relevant external libraries to compile (depending
  on whether you use OpenSSL for TLS/SSL access to your news server, GnuPG
  to verify the authenticity of Usenet control messages, Perl, Python, etc.).


=> suggestion to now prefer SQLite over Berkeley DB in CHECKLIST

-- 
Julien ÉLIE

« Sol lucet omnibus. »


------------------------------

Subject: Digest Footer

_______________________________________________
inn-workers mailing list
[email protected]
https://lists.isc.org/mailman/listinfo/inn-workers


------------------------------

End of inn-workers Digest, Vol 126, Issue 16
********************************************
