On Wed, Aug 10, 2022 at 09:29:43PM +0200, Thomas Oltmann wrote:
> I think we can all agree that the current web archive over at
> lists.suckless.org isn't all that great;
> Author names get mangled, the navigation is terrible, some messages
> are duplicated, some missing.

I've noticed the missing mails too.

> Is there currently any interest in such a project here?

If it'd be an improvement over the current system then I don't see why
not.

> So far, I've gone ahead and implemented a sort of proof-of-concept (at
> https://www.github.com/tomolt/mailarchiver).

Hmm, interesting source code. A couple observations:

0. `.POSIX` needs to be first non-comment line in the Makefile
1. L277: pointer arithmetic is only valid as long as the result is
   within the array or just 1 past it.
2. L36: `mail` should be declared `static` as it's not used outside of
   the TU.

Usage of memcpy for string copying is good to see. I think more C
programmers should start thinking of strings as buffers and tracking
their length as necessary. Which can both improve efficiency and reduce
chances of buffer mishandling.

But in the case of `encode_html()`, stpcpy is probably the proper
function to use.

Anyways, I've attached patches for all the above. The stpcpy change is
opinionated, so feel free to reject that.

And one more thing:

        /* TODO we should probably handle EINTR and partial reads */

Best thing to do here is not using `read()` to begin with. Instead use
`mmap()` to map the file into a private buffer. Otherwise I think using
`fread` is also an (inferior) option, don't think you need to worry
about EINTR with fread.

- NRK
>From 2e16b5e88e2a9394c1b331d81994186151474391 Mon Sep 17 00:00:00 2001
From: NRK <[email protected]>
Date: Thu, 11 Aug 2022 14:13:55 +0600
Subject: [PATCH 1/4] .POSIX must be the first non-comment line
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

from `make 1p make`:

> The application shall ensure that this  special  target  is
> specified  without prerequisites or commands. If it appears
> as the first non-comment line in the makefile,  make  shall
> process  the  makefile as specified by this section; other‐
> wise, the behavior of make is unspecified.
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index bd4c260..851d368 100644
--- a/Makefile
+++ b/Makefile
@@ -1,10 +1,10 @@
+.POSIX:
+
 include config.mk
 
 SRC = mailarchiver.c
 OBJ = ${SRC:.c=.o}
 
-.POSIX:
-
 .PHONY: all clean install uninstall
 
 all: mailarchiver
-- 
2.35.1

>From 128a4d8ad26ffbd46f320747f408ad5a1edd0bbc Mon Sep 17 00:00:00 2001
From: NRK <[email protected]>
Date: Thu, 11 Aug 2022 14:19:33 +0600
Subject: [PATCH 2/4] fix pointer arithmetic

pointer arithmetic are only allowed to go 1 past the end of the array.
going any further than that is UB.

> If both the pointer operand and the result point to elements of the
> same array object, or one past the last element of the array object,
> the evaluation shall not produce an overflow; otherwise, the behavior
> is undefined.

ref: https://port70.net/~nsz/c/c11/n1570.html#6.5.6p8
---
 mailarchiver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mailarchiver.c b/mailarchiver.c
index e2a37f5..9f6fbad 100644
--- a/mailarchiver.c
+++ b/mailarchiver.c
@@ -274,7 +274,7 @@ encode_html(int fd, char *str)
 		case '\t': memcpy(w, "&nbsp;&nbsp;&nbsp;&nbsp;", 24); w += 24; break;
 		default: *w++ = *r;
 		}
-		if (w + 24 > buf + sizeof buf) {
+		if (buf + sizeof buf - w <= 24) {
 			write(fd, buf, w - buf);
 			w = buf;
 		}
-- 
2.35.1

>From 4e0bebc289d053c3fad8ea0f0fc401c0b18d3aef Mon Sep 17 00:00:00 2001
From: NRK <[email protected]>
Date: Thu, 11 Aug 2022 14:21:22 +0600
Subject: [PATCH 3/4] use stpcpy instead of memcpy

it's nice to see usage of memcpy but in this case stpcpy is the proper
function to use.

and optimizing compilers are able to optimize the function away [^0] due
to the src being available at compile time, so there's no performance
benefit towards using memcpy here either.

[^0]: https://godbolt.org/z/r89q6oGre
---
 mailarchiver.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mailarchiver.c b/mailarchiver.c
index 9f6fbad..b05c7df 100644
--- a/mailarchiver.c
+++ b/mailarchiver.c
@@ -266,12 +266,12 @@ encode_html(int fd, char *str)
 
 	for (r = str, w = buf; *r; r++) {
 		switch (*r) {
-		case '<': memcpy(w, "&lt;", 4); w += 4; break;
-		case '>': memcpy(w, "&gt;", 4); w += 4; break;
-		case '&': memcpy(w, "&amp;", 5); w += 5; break;
-		case '"': memcpy(w, "&quot;", 6); w += 6; break;
-		case '\n': memcpy(w, "\n<br/>\n", 7); w += 7; break;
-		case '\t': memcpy(w, "&nbsp;&nbsp;&nbsp;&nbsp;", 24); w += 24; break;
+		case '<':  w = stpcpy(w, "&lt;"); break;
+		case '>':  w = stpcpy(w, "&gt;"); break;
+		case '&':  w = stpcpy(w, "&amp;"); break;
+		case '"':  w = stpcpy(w, "&quot;"); break;
+		case '\n': w = stpcpy(w, "\n<br/>\n"); break;
+		case '\t': w = stpcpy(w, "&nbsp;&nbsp;&nbsp;&nbsp;"); break;
 		default: *w++ = *r;
 		}
 		if (buf + sizeof buf - w <= 24) {
-- 
2.35.1

>From d7076c67797309ebb9849d7267753371518c1756 Mon Sep 17 00:00:00 2001
From: NRK <[email protected]>
Date: Thu, 11 Aug 2022 14:50:09 +0600
Subject: [PATCH 4/4] make `mail` static

it's not used outside of the translation unit
---
 mailarchiver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mailarchiver.c b/mailarchiver.c
index b05c7df..529cb34 100644
--- a/mailarchiver.c
+++ b/mailarchiver.c
@@ -33,7 +33,7 @@ struct mail {
 	char *content;
 };
 
-struct mail mail;
+static struct mail mail;
 
 void
 die(const char *format, ...)
-- 
2.35.1

Reply via email to