On Mon, Jun 02, 2025 at 01:42:50PM -0300, K R wrote:
> Hi Jason,
> 
> On Thu, May 29, 2025 at 11:08 AM Jason McIntyre <j...@kerhand.co.uk> wrote:
> >
> > On Wed, May 28, 2025 at 05:22:55PM -0300, K R wrote:
> > > >Synopsis:      fortune(6): fortunes2 file has duplicate entries
> > > >Category:      system games
> > > >Environment:
> > >         System      : OpenBSD 7.7
> > >         Details     : OpenBSD 7.7 (GENERIC) #0: Sun May  4 11:10:16 MDT 2025
> > >
> > > r...@syspatch-77-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > >
> > >         Architecture: OpenBSD.amd64
> > >         Machine     : amd64
> > > >Description:
> > >
> > >         There are 100+ entries in the fortunes2 file that are already
> > >         present in the fortunes file.
> > >
> > > >How-To-Repeat:
> > >
> > >         cd /tmp
> > >         cp /usr/share/games/fortune/{fortunes,fortunes2} .
> > >         split -a 4 -p '^%$' fortunes fortunes.
> > >         split -a 4 -p '^%$' fortunes2 fortunes2.
> > >         sha256 fortunes.* > SHA256.fortunes
> > >         sha256 fortunes2.* > SHA256.fortunes2
> > >         # compare the two SHA256 files...
> > >
> > > >Fix:
> > >         diff below removes the duplicate entries from fortunes2.
> > >
> > > Thanks,
> > > --Kor
> >
> > hi.
> >
> > this methodology is too smart for me! if another obsd dev wants to
> > confirm it's sound, i'd be happy to remove dups (or said dev could
> > kindly take care of it themselves ;)
> 
> Sorry for the delay.  You're right, the methodology can be simplified.
> 
> Please find attached a simple Python script that detects duplicate
> entries in fortune files.  It uses sets to do that. Given two files,
> file1 and file2, detects entries in file2 already present in file1.
> It warns to stderr and writes file2 to stdout with the duplicate
> entries removed.
> 
> Looking into other fortune(6) files, it's not only fortunes2 that has
> duplicates:
> 
> /usr/share/games/fortune/fortunes2: 104 dups, from fortunes
> /usr/share/games/fortune/fortunes2-o: 73 dups, from fortunes-o
> /usr/share/games/fortune/limerick: 10 dups, from fortunes
> /usr/share/games/fortune/zippy: 8 dups, from fortunes
> 
> I hope it helps.
> 
> Thanks,
> --Kor
> 
> >
> > jmc
> >

hi. thanks for the update. however, it's not that i need a simpler way to
check for dups. since i don't have the technical skill to verify that
these tests are correct, and since i don't want to spend hours manually
checking, what i need is a technical ok from an obsd dev that the
methods are correct.

in hindsight, i probably should have left this thread alone to sink or
swim.

one more thought - although there are obviously dups, it can be argued
that some of them belong there. for example, why would you remove dups
from limerick that are in fortunes? someone might want only limericks,
and would not benefit from having them removed.

so any such diff would have to take that into account. as far as i can
see, the only dup removal diff that makes sense is one that removes the
dups between fortunes and fortunes2.

jmc
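
For anyone reviewing the method: the set-based check described in the
quoted message can be sketched roughly as below. The attached script is
not reproduced in this thread, so this is not that script, only a minimal
sketch of the same idea. The names split_entries and remove_dups are
hypothetical. It assumes that entries are separated by lines containing
only "%" and that a duplicate is an entry matching byte for byte.

```python
import sys

def split_entries(data: bytes) -> list:
    """Split fortune-file bytes into entries.

    Assumption: a line consisting only of '%' ends the current entry.
    """
    entries, current = [], []
    for line in data.splitlines(keepends=True):
        if line.rstrip(b"\r\n") == b"%":
            entries.append(b"".join(current))
            current = []
        else:
            current.append(line)
    if current:  # trailing entry with no closing '%'
        entries.append(b"".join(current))
    return entries

def remove_dups(data1: bytes, data2: bytes):
    """Return (kept, dups): entries of data2 absent from / present in data1."""
    seen = set(split_entries(data1))
    kept, dups = [], []
    for entry in split_entries(data2):
        (dups if entry in seen else kept).append(entry)
    return kept, dups

def main(path1: str, path2: str) -> None:
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        kept, dups = remove_dups(f1.read(), f2.read())
    # Warn about each duplicate on stderr.
    for entry in dups:
        sys.stderr.buffer.write(b"duplicate entry:\n" + entry)
    # Write file2 to stdout with the duplicate entries removed.
    sys.stdout.buffer.write(b"".join(e + b"%\n" for e in kept))

if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as dedup.py, it would be run as
"python3 dedup.py fortunes fortunes2 > fortunes2.new". Because the match
is exact, two entries that differ only in whitespace or punctuation are
not reported, so the counts it gives are a lower bound on near-duplicates.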
