[Long long post :-(
Summary:
 - echo/printf are deficient.
 - joining should be merged into ``echo`` with new options.  [``echo``
is a mess.]
 - splitting should be merged into ``read`` with new options.
]

Axel:
 (1) apply attached patch even if you don't have time to read the mail,
 (2) I'm ready to implement these, if there is agreement on the spec.

On Tue, Jun 16, 2009 at 05:38, Isaac
Dupree<[email protected]> wrote:
>> printf is quite useful with fish arrays.
>> But very frequently I would like to have a "join SEP WORDS" command that
>> places N-1 separators between N words.  And "split SEP STRING" while at that.
>
> That's an easy function/command to implement, if anyone has a thought
> where it should go.. hm..
>
"join" and "split" are both taken by coreutils.  "strjoin" and "strsplit"?
Now about interface: should they take arguments or stdin?
  foo | stdinjoin ,   <=>   argjoin , (foo)
  argjoin , $x   <=>   printf '%s\n' $x | stdinjoin ,
OK, arguments clearly win.
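
To pin down what I mean by "N-1 separators between N words", here is a tiny Python model. The names ``strjoin`` and ``stdinjoin`` are just the hypothetical commands from above, nothing that exists today; the point is only that the stdin flavour is trivially built on the argument flavour.

```python
def strjoin(sep, words):
    """Place len(words) - 1 separators between the words (none for 0 or 1 words)."""
    return sep.join(words)


def stdinjoin(sep, text):
    """Filter flavour: treat each input line as one word, then join."""
    return strjoin(sep, text.splitlines())


print(strjoin(",", ["a", "b", "c"]))      # a,b,c
print(stdinjoin(",", "a\nb\nc\n"))        # a,b,c
print(repr(strjoin(",", [])))             # ''
```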

Now this mail grew in several directions, so I'll try to organize it
with sections...

Sidenote: printf bad
----------------------------

BTW, the ``printf "%s\n" $x`` form doesn't handle zero arguments
(printf assumes missing %s arguments to be empty strings):

b...@shiny ~/W/cben-hacks> for n in (seq 0 2)
                               echo == $n ==
                               set x (seq $n)
                               printf "%s\n" $x
                           end
== 0 ==

== 1 ==
1
== 2 ==
1
2
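
A rough Python model of what printf does here, assuming a format that is a single ``%s`` followed by a literal suffix (``printf_single_s`` is a made-up name for this sketch). The format is reused once per argument, but it is still processed once when there are no arguments at all, and that is where the stray empty line comes from.

```python
def printf_single_s(suffix, args):
    """Model of `printf "%s<suffix>" ARGS...` for a format with exactly one %s."""
    if not args:
        # No arguments: the format is still processed once and the
        # missing %s argument is treated as an empty string.
        return suffix
    # Otherwise the format is reused until all arguments are consumed.
    return "".join(arg + suffix for arg in args)


for n in range(3):
    x = [str(i) for i in range(1, n + 1)]      # like: set x (seq $n)
    print(n, repr(printf_single_s("\n", x)))   # like: printf "%s\n" $x
```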

Fish's builtin ``PREFIX$xSUFFIX`` syntax works well, but echo
introduces unwanted spaces:

b...@shiny ~/W/cben-hacks> for n in (seq 0 2)
                               echo == $n ==
                               set x (seq $n)
                               echo -n $x\n
                           end
== 0 ==
== 1 ==
1
== 2 ==
1
 2

So we have to combine them (``printf "%s"`` is a special case where
the 0 args = 1 arg feature/bug doesn't matter):

b...@shiny ~/W/cben-hacks> for n in (seq 0 2)
                               echo == $n ==
                               set x (seq $n)
                               printf "%s" $x\n
                           end
== 0 ==
== 1 ==
1
== 2 ==
1
2

(``strjoin "" $x\n`` would work just as well.)

Conclusions: join = echo?
------------------------------------

Wait, ``echo`` already does "print args with a separator", it just
hard-codes the separator to be a space.
So let's just augment ``echo`` to be the ultimate argument printer:
    -s, --separator=<SEP>    String to print between arguments [default: " "]
    -N, --end=<END>          String to print after arguments [default: \n]
(Note that short options cannot allow both "-s<SEP>" and "-s <SEP>" forms.
That's ambiguous because <SEP> and <END> may be empty strings.
I don't think we can get away without short option forms altogether in ``echo``.
So one of the two forms must be chosen, and the docs must be clear on it.)

For compatibility, it should also take:
    -n, --no-end             Equivalent to --end=""
    -e                       Expand escape sequences
    -E                       Ignored
-e and -E should be included for compatibility but are probably redundant in fish.
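
The intended semantics fit in one line of Python. This is only a model of the proposal (the keyword names mirror the proposed --separator and --end; none of this exists in fish yet), and it ignores -e/-E:

```python
def echo(*args, separator=" ", end="\n"):
    """Model of the proposed echo: join the arguments, then append the end string."""
    return separator.join(args) + end


x = ["1", "2"]
print(repr(echo(*x)))                              # '1 2\n'  (today's behaviour)
print(repr(echo(*x, end="")))                      # '1 2'    (echo -n)
print(repr(echo(*x, separator=",", end="")))       # '1,2'    (what strjoin , $x would do)
print(repr(echo(separator=",", end="")))           # ''       (zero arguments print nothing)
```

With separator and end both empty and each argument carrying its own ``\n``, this also covers the ``printf "%s" $x\n`` trick from the previous section.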

Oh, and turns out POSIX echo and GNU echo don't recognize ``--``.
They just print it.
So there is no way whatsoever to print just ``-n`` or ``-e``!  Ugly!
We must recognize ``--``.  Of course nobody would bother to type
``echo -- $x`` all the time, so in practice all constructions
involving ``echo`` will not be minus-clean...
(It's like bash supported ``"${x[@]}"`` but nobody used it much until
fish came and made it the default meaning of ``$x``.)
But it's a general UNIX problem, without an easy solution, and
tackling it specifically in ``echo`` won't help.
If your strings start with a minus, you won't get far with them in a
shell script anyway...
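
For what it's worth, recognizing ``--`` is cheap. A sketch of the option scanning in Python, under loud assumptions: only the short options are handled, -s and -N take their value as the *next* argument (I arbitrarily picked the "-s <SEP>" form here), and -e/-E are left out. ``parse_echo`` is a made-up name.

```python
def parse_echo(argv):
    """Split argv into (separator, end, operands).

    Sketch assumptions: short options only; -s/-N take the next
    argument as their value; -e/-E are not modelled.
    """
    separator, end = " ", "\n"
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            i += 1              # skip the marker; the rest are operands
            break
        if arg == "-n":
            end = ""
        elif arg == "-s" and i + 1 < len(argv):
            separator = argv[i + 1]
            i += 1              # consume the option value
        elif arg == "-N" and i + 1 < len(argv):
            end = argv[i + 1]
            i += 1
        else:
            break               # first non-option ends option scanning
        i += 1
    return separator, end, argv[i:]


print(parse_echo(["-n"]))            # (' ', '', [])
print(parse_echo(["--", "-n"]))      # (' ', '\n', ['-n'])
```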

BTW, until now fish relied on /bin/echo, now it needs a builtin "echo" command.
And we can't push these features into coreutils - /bin/echo must
remain POSIX-compatible.

Parsing string into an array
--------------------------------------

Now, for something completely different.  Remember split?
- I want to have ``(split : foo:bar)`` to return 'foo' 'bar'.
- Maybe I sometimes want to split by regexp.
- There is another parsing operator that I sorely miss:
  match wildcard and extract the "stem" (in Makefile terminology).
  Arbitrary syntax mockup:
     for 'foo{$var}.py' in foo*.py; echo $var; end
- Sometimes I'd want to destructure several vars:
     for 'foo{$var1}_{$var2}.py' in foo*.py; echo $var2 $var1; end
- And from here, it's a short way to want destructuring by regexps.
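
To show the stem idea is well defined, here is a Python sketch of the mockup. Everything in it is hypothetical: the ``{$name}`` template syntax is just my mockup above, each placeholder greedily matches any run of characters, and names must be valid Python identifiers for the named groups to compile.

```python
import re


def template_to_regex(template):
    """Turn 'foo{$a}_{$b}.py' into a regex with one named group per {$name}."""
    # re.split with one capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{\$(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)          # literal text, matched verbatim
        else:
            pattern += "(?P<%s>.*)" % part      # placeholder, greedy stem
    return re.compile(pattern)


def destructure(template, names):
    """Yield a {variable: stem} dict for every name that matches the template."""
    regex = template_to_regex(template)
    for name in names:
        match = regex.fullmatch(name)
        if match:
            yield match.groupdict()


files = ["foobar.py", "foox_y.py", "other.py"]
for stems in destructure("foo{$var}.py", files):
    print(stems["var"])
```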

Let's see how far "sed" takes us:
  echo foo:bar | sed 's/:/\n/g'
  echo foo:::bar | sed 's/::*/\n/g'
  for var in (ls foo*py | sed -n 's/^foo\(.*\)\.py$/\1/p')
yikes...
=> OK, no regexps.
The whole point of a simple primitive is that it should be trivial to use.

Wait, ``read`` already does "split into variables".
It just hard-codes the separator to be whitespace (*).
And it doesn't do "split into an array" like bash's ``read -a`` can.

So let's just augment ``read`` to be the ultimate argument splitter:
    -N, --end=<END>          Read until this string [default: \n]
                             In interactive mode, always read until Enter
                             is pressed; user can type \n with Alt+Enter.
                             If <END> is empty string, read to end of file.
    -n, --to-end-of-file     Read to end of file.
    -s, --separator=<SEP>    String by which to split into elements.
                             If <SEP> is empty string, split into
                             individual characters.
                             [default: split on any amount of whitespace,
                             ignore leading and trailing separators.]
    -a, --array              Split on all occurrences of separator,
                             make last variable an array.
    -e, --exact              Split into exactly one word per variable,
                             last variable gets the remainder as one string.
In any case, when the input contains fewer fields than variables given,
the extra variables should be set to a 0-length array.
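
A Python model of the splitting rules above, to make the corner cases concrete. It is a sketch of the proposal, not of anything fish does today: ``read_fields`` is a made-up name, ``separator=None`` stands for the default whitespace splitting, ``array`` selects between --array and --exact (the default here is an arbitrary choice), and --end is not modelled at all. Each variable comes back as a list, so a 0-length array is simply ``[]``.

```python
def split_fields(text, separator, maxsplit):
    """Split text; maxsplit=-1 means no limit."""
    if separator is None:
        # Default: any run of whitespace, leading/trailing ignored.
        return text.split(None, maxsplit)
    if separator == "":
        # Empty separator: split into individual characters.
        chars = list(text)
        if maxsplit < 0 or len(chars) <= maxsplit:
            return chars
        return chars[:maxsplit] + ["".join(chars[maxsplit:])]
    return text.split(separator, maxsplit)


def read_fields(line, nvars, separator=None, array=False):
    """Return one list per variable, modelling --array or --exact."""
    if array:
        fields = split_fields(line, separator, -1)
        # Last variable becomes an array holding all remaining fields.
        values = [[f] for f in fields[:nvars - 1]] + [fields[nvars - 1:]]
    else:
        # Exactly one word per variable; the last one gets the remainder.
        fields = split_fields(line, separator, nvars - 1)
        values = [[f] for f in fields]
    # Variables left without a field are set to a 0-length array.
    values += [[] for _ in range(nvars - len(values))]
    return values


print(read_fields("hello nice world", 2))              # [['hello'], ['nice world']]
print(read_fields("hello nice world", 2, array=True))  # [['hello'], ['nice', 'world']]
print(read_fields("foo:bar", 1, ":", array=True))      # [['foo', 'bar']]
```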

(The observant reader will note that I tried to make it the exact
inverse of ``echo`` with same options.)

Open questions:
 * Should --array or --exact be the default?
   --array seems the right thing for fish, but would break backward
compatibility.
 * Should there be an explicit way to specify the magic default
any-whitespace splitting?
   * Bash does this if IFS is unset or exactly " \t\n".
   * awk and perl do this if pattern is " ".
   * Python does this if pattern is None.
   No fixed string can work here.  So this has to be a separate ``-w,
--whitespace`` option.
   But if it's a different option, and it's the default, why do we
need the option?
 * Should there be a way to separately specify aspects of the magic
whitespace splitting?
   * Not a fixed string, any whitespace counts?
     Easily done with ``tr " \t\n" " " | read``.
   * Adjacent separators count as one?
     Easily done with ``tr -s "<SEP>" | read`` (harder with multichar <SEP>).
   * Leading and trailing separators are ignored?
     No command springs to mind, but do you ever need it except with whitespace?
   I think these would be creeping featurism.
   But the combination of all three is so standard in Unix, that it
must be provided.
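
Python makes the difference between "magic" and fixed-string splitting easy to see, so here is a small illustration. ``magic_split`` is a hypothetical helper that composes the three aspects listed above by hand; it only agrees with Python's own ``split()`` for input made of spaces, tabs and newlines, since Python counts a few more characters as whitespace.

```python
import re


def magic_split(text):
    """Compose the three aspects of whitespace splitting explicitly."""
    text = re.sub(r"[\t\n]", " ", text)   # any whitespace counts (like tr " \t\n" " ")
    text = re.sub(r" +", " ", text)       # adjacent separators count as one (like tr -s " ")
    text = text.strip(" ")                # leading and trailing separators are ignored
    return text.split(" ") if text else []


line = "  foo \t bar  "
print(line.split())       # ['foo', 'bar']
print(line.split(" "))    # ['', '', 'foo', '\t', 'bar', '', '']
print(magic_split(line))  # ['foo', 'bar']
```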

(*) Actually read uses $IFS as separator, but that's unfishy and ugly.
    An envvar is inconvenient to localize, and changing IFS globally is evil.
    An option to read is much better and more discoverable.

    There was one other use of IFS in fish's code:
    exec.c has code to split command substitution on $IFS[0],
    but it's commented out, fixed to '\n', and apparently was never active.
    Anyway, that too would be wrong - how do you set IFS for a single
    command substitution?!?  Better use ``tr`` or ``read -s``.

Oh, and the current ``read`` doc is just wrong.  Patch attached.

-- 
Beni <[email protected]>
Trust me, I know what I'm doing.  I'm just fuzzy on the why ;-)
diff -rN -u old-fish/doc_src/read.txt new-fish/doc_src/read.txt
--- old-fish/doc_src/read.txt	2009-06-18 18:02:26.000000000 +0300
+++ new-fish/doc_src/read.txt	2009-06-18 18:02:26.000000000 +0300
@@ -1,31 +1,32 @@
 \section read read - read line of input into variables
 
 \subsection read-synopsis Synopsis
-<tt>read [OPTIONS] [VARIABLES...]</tt>
+<code>read [OPTIONS] [VARIABLES...]</code>
 
 \subsection read-description Description
 
-The <tt>read</tt> builtin causes fish to read one line from standard
+The <code>read</code> builtin causes fish to read one line from standard
 input and store the result in one or more environment variables. 
 
-- <tt>-c CMD</tt> or <tt>--command=CMD</tt> specifies that the initial string in the interactive mode command buffer should be CMD.
-- <tt>-e</tt> or <tt>--export</tt> specifies that the variables will be exported to subshells.
-- <tt>-g</tt> or <tt>--global</tt> specifies that the variables will be made global.
-- <tt>-m NAME</tt> or <tt>--mode-name=NAME</tt> specifies that the name NAME should be used to save/load the history file. If NAME is fish, the regular fish history will be available. 
-- <tt>-p PROMPT_CMD</tt> or <tt>--prompt=PROMPT_CMD</tt> specifies that the output of the shell command PROMPT_CMD should be used as the prompt for the interactive mode prompt. The default prompt command is <tt>set_color green; echo read; set_color normal; echo "> "</tt>.
-- <code>-s</code> or <code>--shell</code> Use syntax highlighting, tab completions and command termination suitable for entering shellscript code
-- <code>-u</code> or <code>--unexport</code> causes the specified environment not to be exported to child processes
+- <code>-c CMD</code> or <code>--command=CMD</code> specifies that the initial string in the interactive mode command buffer should be CMD.
+- <code>-g</code> or <code>--global</code> specifies that the variables will be made global.
+- <code>-l</code> or <code>--local</code> specifies that the variables will be made local.
+- <code>-m NAME</code> or <code>--mode-name=NAME</code> specifies that the name NAME should be used to save/load the history file. If NAME is fish, the regular fish history will be available.
+- <code>-p PROMPT_CMD</code> or <code>--prompt=PROMPT_CMD</code> specifies that the output of the shell command PROMPT_CMD should be used as the prompt for the interactive mode prompt. The default prompt command is <code>set_color green; echo read; set_color normal; echo "> "</code>.
+- <code>-s</code> or <code>--shell</code> Use syntax highlighting, tab completions and command termination suitable for entering shellscript code.
+- <code>-u</code> or <code>--unexport</code> causes the specified environment not to be exported to child processes.
 - <code>-U</code> or <code>--universal</code> causes the specified environment variable to be made universal. If this option is supplied, the variable will be shared between all the current users fish instances on the current computer, and will be preserved across restarts of the shell.
-- <code>-x</code> or <code>--export</code> causes the specified environment variable to be exported to child processes
+- <code>-x</code> or <code>--export</code> causes the specified environment variable to be exported to child processes.
 
 Read starts by reading a single line of input from stdin, the line is
-then tokenized using the <tt>IFS</tt> environment variable. Each variable
-specified in <tt>VARIABLES</tt> is then assigned one tokenized string
+then tokenized using the <code>IFS</code> environment variable. Each variable
+specified in <code>VARIABLES</code> is then assigned one tokenized string
 element. If there are more tokens than variables, the complete
-remainder is assigned to the last variable.
+remainder is assigned to the last variable (as one string, not array).
 
 \subsection read-example Example
 
-<tt>echo hello|read foo</tt>
+<code>echo hello nice world | read foo bar</code>
+
+Will cause the variable \$foo to be assigned the value <code>hello</code> and \$bar to be assigned the value <code>nice world</code>.
 
-Will cause the variable \$foo to be assigned the value hello.
