On Thu, Nov 28, 2013 at 12:45:40PM +0200, sin wrote:
> On Tue, Nov 26, 2013 at 12:01:01PM -0800, Silvan Jegen wrote:
> > Hi
> >
> > This is a braindead and incomplete implementation of tr that only
> > works for one-byte encodings. Do you think it makes sense to use this
> > implementation as some kind of stopgap-measure until we have a more
> > robust version of tr?
>
> This particular version of the patch does not introduce a manpage
> which would be necessary to document the limited behaviour of the
> current program.

I can add a man page as soon as we have decided whether we want Unicode
support or not.

> I am starting to wonder, do you guys think it would make sense to
> have a staging branch that we can use for incomplete tools? Currently
> some of the tools implement a subset of the total behaviour but I'd
> like to believe that they implement that subset correctly. As long as
> we document that they can go in master with possible
> eprintf("not implemented"); calls for the options that we care about.
>
> Programs that are obviously buggy can go in the staging branch.

I don't mind either way. Having a staging area could allow the project
to grow faster since not every contribution has to be complete to be
included.

> > If you would rather not take this version, what approach would
> > you take for the character set mapping when using UTF-8? A hashmap-,
> > or B-tree-based solution or something else entirely?
>
> I am not knowledgeable enough about UTF-8 so I can't answer this.
> A B-tree is I think an overkill for sbase. We do not have a nice
> implementation of a hash table in sbase as we did not need it but
> if we go down that path it makes sense to put this in util/ so other
> programs can benefit. Currently we don't have an implementation of
> a singly linked list that we can reuse, but that is trivial enough and
> we've re-implemented it wherever needed (with the minimum set of
> operations needed for each tool). I can send an implementation of
> a hash table that I've used for my own programs, MIT/X licensed and it is
> simple enough.
>
> Regarding UTF-8, some other programs in sbase also lack proper handling
> of UTF-8. Do you think we could embed libutf8 from suckless.org and
> use it?

I think having Unicode support is necessary at least in the long run and
UTF-8 is the way to go. libutf provides the most basic handling of UTF-8
but should be sufficient as long as you do not want to go into text
normalization too much [1] [2].
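
To make the mapping question more concrete, here is a minimal sketch of
one possible approach. It is not part of the patch and does not use the
libutf API: utf8_decode(), map_lookup() and struct map_entry are
hypothetical names made up for illustration. It assumes the sets are
small enough that a flat array searched linearly is acceptable, and the
decoder is simplified (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair: one code point from set1 and its replacement from set2. */
struct map_entry {
	uint32_t from;
	uint32_t to;
};

/*
 * Decode one UTF-8 sequence from s (at most n bytes) into *cp.
 * Returns the number of bytes consumed, or 0 on invalid or truncated
 * input. Simplified: overlong forms and surrogates are not rejected.
 */
size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	size_t len, i;

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		*cp = s[0] & 0x1F;
		len = 2;
	} else if ((s[0] & 0xF0) == 0xE0) {
		*cp = s[0] & 0x0F;
		len = 3;
	} else if ((s[0] & 0xF8) == 0xF0) {
		*cp = s[0] & 0x07;
		len = 4;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (len > n)
		return 0;
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		*cp = (*cp << 6) | (s[i] & 0x3F);
	}
	return len;
}

/* Linear lookup: return the mapped code point, or cp itself if unmapped. */
uint32_t
map_lookup(const struct map_entry *map, size_t nmap, uint32_t cp)
{
	size_t i;

	for (i = 0; i < nmap; i++)
		if (map[i].from == cp)
			return map[i].to;
	return cp;
}
```

If linear search turned out to be too slow for large sets, the same
array could be kept sorted and searched with bsearch(3), or replaced by
a hash table in util/ as suggested above.
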
BTW, the most recently updated version of the library seems to be at
https://github.com/cls/libutf/commits/master and not at
http://git.suckless.org/libutf/ for some reason.

[1] http://blog.golang.org/normalization
[2] http://mortoray.com/2013/11/27/the-string-type-is-broken/

> > +usage(void)
> > +{
> > +	eprintf("usage: tr set1 [set2]\n");
> > +}
>
> Use %s and argv0.

I changed it in the new version of the patch that I will send out when
we have decided the Unicode issue.

> > +void
> > +handle_escapes(char *s)
> > +{
> > +	switch(*s) {
> > +	case 'n':
> > +		*s = '\x0A';
> > +		break;
> > +	case 't':
> > +		*s = '\x09';
> > +		break;
> > +	case '\\':
> > +		*s = '\x5c';
> > +		break;
> > +	}
> > +}
>
> I have not yet applied this patch but I suspect you have
> mixed whitespace + tabs here. Use tabs only.

You were right. I changed the whitespace to be tabs only.

> > +	if (ferror(stdin)) {
> > +		eprintf("<stdin>: read error:");
> > +		return EXIT_FAILURE;
> > +	}
>
> Indentation issues.

Corrected.

Cheers,

Silvan
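
For reference, the escape handling quoted above could also be written as
a single pass over the whole set string. The following is only a sketch
under that assumption, not the code from the patch: unescape() is a
hypothetical name, and it handles just the three escapes that
handle_escapes() covers (\n, \t and \\), leaving anything else
untouched.

```c
#include <stddef.h>

/*
 * Hypothetical helper: rewrite s in place, turning the two-character
 * sequences \n, \t and \\ into the single byte they denote.
 * Unknown escapes and a trailing backslash are copied unchanged.
 * Returns the new length of s.
 */
size_t
unescape(char *s)
{
	char *r = s, *w = s;

	while (*r != '\0') {
		if (r[0] == '\\' && r[1] != '\0') {
			switch (r[1]) {
			case 'n':
				*w++ = '\n';
				r += 2;
				continue; /* next iteration of the while loop */
			case 't':
				*w++ = '\t';
				r += 2;
				continue;
			case '\\':
				*w++ = '\\';
				r += 2;
				continue;
			}
		}
		*w++ = *r++;
	}
	*w = '\0';
	return (size_t)(w - s);
}
```

Since the result is never longer than the input, the string can be
rewritten in place, for example directly on the argv entries holding
set1 and set2.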