Re: [dev] [sbase][RFC] Add a simplistic version of tr

Silvan Jegen Sat, 30 Nov 2013 03:39:35 -0800

On Thu, Nov 28, 2013 at 12:45:40PM +0200, sin wrote:
> On Tue, Nov 26, 2013 at 12:01:01PM -0800, Silvan Jegen wrote:
> > Hi
> > 
> > This is a braindead and incomplete implementation of tr that only
> > works for one-byte encodings. Do you think it makes sense to use this
> > implementation as some kind of stopgap-measure until we have a more
> > robust version of tr?
> 
> This particular version of the patch does not introduce a manpage
> which would be necessary to document the limited behaviour of the
> current program.


I can add a man page as soon as we have decided whether we want Unicode
support or not.


> I am starting to wonder, do you guys think it would make sense to
> have a staging branch that we can use for incomplete tools?  Currently
> some of the tools implement a subset of the total behaviour but I'd
> like to believe that they implement that subset correctly.  As long as
> we document that, they can go in master with possible
> eprintf("not implemented"); calls for the options that we care about.
> 
> Programs that are obviously buggy can go in the staging branch.

I don't mind either way. Having a staging area could allow the project
to grow faster since not every contribution has to be complete to be
included.


> > If you would rather not take this version, what approach would
> > you take for the character set mapping when using UTF-8? A hashmap-,
> > or B-tree-based solution or something else entirely?
> 
> I am not knowledgeable enough about UTF-8 so I can't answer this.
> A B-tree is, I think, overkill for sbase.  We do not have a nice
> implementation of a hash table in sbase as we did not need it but
> if we go down that path it makes sense to put this in util/ so other
> programs can benefit.  Currently we don't have an implementation of
> a singly linked list that we can reuse, but that is trivial enough and
> we've re-implemented it wherever needed (with the minimum set of
> operations needed for each tool).  I can send an implementation of
> a hash table that I've used for my own programs, MIT/X licensed and it is
> simple enough.
> 
> Regarding UTF-8, some other programs in sbase also lack proper handling
> of UTF-8.  Do you think we could embed libutf8 from suckless.org and
> use it?

I think having Unicode support is necessary at least in the long run
and UTF-8 is the way to go. libutf provides the most basic handling of
UTF-8 but should be sufficient as long as you do not want to go into
text normalization too much [1] [2]. BTW, the most recently updated
version of the library seems to be at
https://github.com/cls/libutf/commits/master and not at
http://git.suckless.org/libutf/ for some reason.

[1] http://blog.golang.org/normalization
[2] http://mortoray.com/2013/11/27/the-string-type-is-broken/
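
On the mapping question above: one option that needs neither a hash
table nor a B-tree is a sorted array of code point pairs searched with
bsearch(3). The following is only a rough sketch under my own
assumptions. The names Map, mapcmp, mapsort and maprune are made up for
illustration and are not part of sbase or libutf, and code points are
stored as plain longs rather than any library's rune type.

```c
#include <stdlib.h>

/* Hypothetical type: one entry per character in set1. */
typedef struct {
	long from; /* code point taken from set1 */
	long to;   /* replacement code point taken from set2 */
} Map;

/* Order entries by their source code point. */
static int
mapcmp(const void *a, const void *b)
{
	const Map *x = a, *y = b;

	return (x->from > y->from) - (x->from < y->from);
}

/* Sort the table once, after set1 and set2 have been parsed. */
static void
mapsort(Map *map, size_t n)
{
	qsort(map, n, sizeof(*map), mapcmp);
}

/* Return the replacement for r, or r itself if r is not in set1. */
static long
maprune(const Map *map, size_t n, long r)
{
	Map key;
	const Map *m;

	key.from = r;
	key.to = 0;
	m = bsearch(&key, map, n, sizeof(*map), mapcmp);
	return m ? m->to : r;
}
```

Lookups cost O(log n) in the size of set1, which should be fine for the
short sets tr usually gets. Whether this beats a hash table for large
ranges is something I have not measured.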


> > +usage(void)
> > +{
> > +   eprintf("usage: tr set1 [set2]\n");
> > +}
> 
> Use %s and argv0.

I changed it in the new version of the patch, which I will send out once
we have settled the Unicode issue.
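
For reference, the suggested change amounts to something like the sketch
below. It is not the actual patch: argv0 here is a local stand-in for
the global that sbase sets while parsing arguments, fmtusage is a
hypothetical helper that exists only so the message can be checked in
isolation, and plain stdio calls stand in for eprintf.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the program name; an assumption for this sketch. */
static char *argv0 = "tr";

/* Hypothetical helper: format the usage line into buf. */
static int
fmtusage(char *buf, size_t len)
{
	return snprintf(buf, len, "usage: %s set1 [set2]\n", argv0);
}

/* Print the usage line to stderr and exit with failure. */
static void
usage(void)
{
	char buf[128];

	fmtusage(buf, sizeof(buf));
	fputs(buf, stderr);
	exit(EXIT_FAILURE);
}
```

This way the message follows whatever name the program was invoked
under instead of a hard-coded "tr".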


> > +void
> > +handle_escapes(char *s)
> > +{
> > +    switch(*s) {
> > +   case 'n':
> > +           *s = '\x0A';
> > +           break;
> > +   case 't':
> > +           *s = '\x09';
> > +           break;
> > +   case '\\':
> > +           *s = '\x5c';
> > +           break;
> > +    }
> > +}
> 
> I have not yet applied this patch but I suspect you have
> mixed whitespace + tabs here.  Use tabs only.

You were right. I changed the whitespace to be tabs only.
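
While on the subject of escapes, here is a rough sketch of how a whole
set string could be unescaped in place, as opposed to the single
character that handle_escapes looks at. The function name unescape and
the choice to keep unknown escapes verbatim are my own assumptions and
not what the patch does.

```c
#include <stddef.h>

/*
 * Rewrite s in place, replacing \n, \t and \\ with the bytes they
 * name. Unknown escapes and a trailing backslash are kept verbatim.
 * Returns the length of the resulting string.
 */
static size_t
unescape(char *s)
{
	char *r, *w;

	for (r = w = s; *r; r++) {
		if (*r == '\\' && r[1] != '\0') {
			switch (*++r) {
			case 'n':
				*w++ = '\n';
				break;
			case 't':
				*w++ = '\t';
				break;
			case '\\':
				*w++ = '\\';
				break;
			default:
				/* not a known escape: keep both bytes */
				*w++ = '\\';
				*w++ = *r;
				break;
			}
		} else {
			*w++ = *r;
		}
	}
	*w = '\0';
	return w - s;
}
```

The write pointer never overtakes the read pointer, so no second buffer
is needed.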


> > +   if (ferror(stdin)) {
> > +       eprintf("<stdin>: read error:");
> > +       return EXIT_FAILURE;
> > +       }
> 
> Indentation issues.

Corrected.


Cheers,

Silvan
