Re: [bug #46007] texi2dvi Msys support

Vincent Belaïche Thu, 25 Feb 2016 01:16:10 -0800

(Not Texinfo related: please ignore if you're not interested.)

Hello Gavin,


In reply to:

http://lists.gnu.org/archive/html/bug-texinfo/2015-09/msg00115.html

It has been a long time since I should have answered about the evolution
I had in mind concerning interaction with external commands...

Well there are two points about hooking how the shell interacts with
external commands:

- environment
- command arguments

On the second point I think that, at least in bash, there is already
some provision for making such user hooks. Imagine you have some
command foo.exe, and you want a hook to prefix the 1st argument with a +
sign before calling the command; you can already write in your
.bash_profile (not tested):

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
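function foo(){
     # keep arg1 local so it does not leak into the calling shell
     local arg1=$1
     shift
     # `command' skips function lookup and runs the executable
     command foo.exe "+$arg1" "$@"
}
export -f foo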
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

Well `command' is a bash builtin (cf. info node "(bash) Bash Builtins"),
I don't know whether sh has the same thing...
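
As far as POSIX goes, `command' is specified there too, so a plain sh
version of the same wrapper is possible; only the `function' keyword,
`local' and `export -f' are extensions. Here is a minimal sketch that can
be tried without a real foo.exe. The stand-in script and the names
demo_dir and foo_arg1 are assumptions made for the illustration only.

```shell
# Hypothetical stand-in for foo.exe: prints each argument on its own line.
demo_dir=$(mktemp -d)
cat > "$demo_dir/foo.exe" <<'EOF'
#!/bin/sh
printf '%s\n' "$@"
EOF
chmod +x "$demo_dir/foo.exe"
PATH=$demo_dir:$PATH

# POSIX-style definition: no `function' keyword, no `local', no `export -f'.
foo() {
    foo_arg1=${1-}
    if [ "$#" -gt 0 ]; then shift; fi
    # `command' skips function lookup and runs the executable found on PATH.
    command foo.exe "+$foo_arg1" "$@"
}

foo bar baz   # foo.exe receives "+bar" and "baz"
```

Without `export -f' the wrapper is not passed on to child shells, so it
would have to be defined again in each script that needs it.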

However, even though such provision exists, it is not sufficient to make
generic user hooks that:
- would be called instead of a command, whether this command is an
  executable, a builtin, or a script
- would be called based on some condition matching the command name

If, say, you need to hook all the commands whose name starts with f and
does not end with d, the hook definition could use the following sort of
syntax:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
hook hookname()
when [[ $0 == f* && $0 != *d  ]];
at 0;
{
     local arg1


     arg1=$1
     shift
     "$0" "+$arg1" "$@"
}
export -h hookname
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

Where this syntax
--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
when [[ $0 == f* && $0 != *d  ]];
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

is the condition under which the hook named `hookname' is executed
instead of the command,

and there is a specification for order
--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
at 0
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

meaning that hookname is the first hook that is tried for the condition
(there could be other types of specification like `after somehook' or
`before somehook', or a negative index like `at -1' for creation in
last position, `at -2' for one position before last, etc.).

The `export -h' would tell bash to export this hook to child processes
(for instance when another bash script is launched).

Command hooks would be examined immediately after the shell has prepared
the command line (path to command and arguments). That may be before the
fork used for creating the child process in which the command is run.
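
Something close to this can be approximated in an unmodified shell by
generating one wrapper function per matching command name. The
following is only a rough sketch: the names hook_body and install_hooks
are hypothetical, and unlike the proposed construct it only covers names
given at install time and is not consulted for commands called by path.

```shell
# Body shared by all generated wrappers. $1 is the hooked command name
# (the role played by $0 in the proposed syntax); the rest are its arguments.
hook_body() {
    hook_cmd=$1
    shift
    hook_arg1=${1-}
    if [ "$#" -gt 0 ]; then shift; fi
    # `command' skips function lookup, so the wrapper does not call itself.
    command "$hook_cmd" "+$hook_arg1" "$@"
}

# Define a wrapper for each given name that starts with f and does not end
# with d. Names that are not plain identifiers are skipped to keep eval safe.
install_hooks() {
    for hook_name in "$@"; do
        case $hook_name in
            *[!A-Za-z0-9_]*) continue ;;   # not a plain identifier
            for|fi|function) continue ;;   # keywords cannot be function names
            *d) continue ;;                # the "$0 != *d" half of the condition
            f*) eval "$hook_name() { hook_body $hook_name \"\$@\"; }" ;;
        esac
    done
}
```

For example `install_hooks file find fmt' would wrap file and fmt but
leave find alone. There is no equivalent of `at 0' here: a name has at
most one wrapper, so ordering between several hooks is not modelled.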

Now this was the easy part of it. The first point, environment, i.e. the
environment translation, is more thorny (and just as a reminder, this is
where the texi2dvi script has some limitations when running over MSYS).
You noted in your latest email:

> I understand when a shell launches a process, it forks (creating a
> copy of the shell process), sets up the environment for the process
> (for example environment variables and file descriptors), and then
> uses the "exec" call to replace itself with the program being
> launched. What would be interesting would be if there was a way to
> intervene after the fork, but before the exec.

That was not really the idea I had in mind. What you were considering is
some way to translate the environment from MSYS format to the MSW
native format when native commands are invoked. Instead my idea was that
the environment would be « unchanged » by MSYS, *BUT* MSYS bash scripts
and MSYS applications would access it through translator objects.

Let us consider a silly, made-up example for the sake of
explanation. Imagine some envvar FOO, the value of which is "bar" in
native format, but in MSYS format I need to suffix a "t" to all values,
so the value in MSYS format has to be "bart". Syntactically I would add
the following statements to my MSYS .profile or .bash_profile:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
# Here we declare a class to translate access to variable FOO
envtranslation FOO_TRANSLATE
{
   # declare member variables. -m is a new option for declare.
   declare -m native # previous value of $this
   declare -m cache  # cached translation

   # read the envvar (accessed through $this, a special member variable of
   # this object), translating it by appending "t". A cache is used so that
   # the translation is redone only when $this has a new value
   function get()
   {
        if [ "$native" != "$this" ]; then
          # do the translation
          native=$this
          cache=${native}t
        fi
        # $cache is the got value, `got' is a novel keyword
        got $cache
   }

   # set the envvar to a new value. $1 is the new MSYS value.
   function set()
   {
       # here we need to remove the trailing "t"
       this=${1:0:-1}
       native=$this
       cache=$1
   }
}

# now tell MSYS that from now on FOO has to be converted via the
# FOO_TRANSLATE class. -c is a new option for declare

declare -c FOO_TRANSLATE FOO
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----


We also have a default translation class, which is the class of envvars
that have not been declared with `declare -c ....', like this:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
envtranslation default
{
   function get()
   {
        got $this
   }

   function set()
   {
        this=$1
   }
}
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

The default class is useful if I want to stop translating some envvar.


So what will happen? Assume that the current native value of FOO is
"bar"; we have three cases:

- access by the shell script
- access by a native command
- access by an MSYS command

access by the shell script
~~~~~~~~~~~~~~~~~~~~~~~~~~

If FOO is read in the shell script itself, after the
`declare -c FOO_TRANSLATE FOO' statement, for instance by this statement

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
BART=$FOO
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

then the `get' method of the FOO_TRANSLATE class is called by bash,
and variable BART gets the value "bart" instead of "bar".

Similarly, the following statements in a script
--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
FOO="guillemot"
native-command.exe
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

will set the envvar FOO to "guillemo": the `set' method is called and
removes the last character. So the native command native-command.exe
called next will get "guillemo" if it calls `getenv("FOO")'.
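
The two directions of this translation can be imitated in an unmodified
shell with two ordinary functions. This is only a sketch of the
behaviour, not of the proposed syntax: the names foo_get and foo_set are
hypothetical, explicit calls replace the implicit `get' and `set', and
there is no cache.

```shell
# FOO holds the native value, as a native command would see it via getenv.
FOO=bar
export FOO

# Plays the role of `get': native -> MSYS format (append "t").
foo_get() {
    printf '%s\n' "${FOO}t"
}

# Plays the role of `set': MSYS -> native format (drop the last character).
foo_set() {
    FOO=${1%?}
}

BART=$(foo_get)        # BART is now "bart"
foo_set "guillemot"    # FOO is now "guillemo", also for child processes
```

The `${1%?}' form is used because it is POSIX; it removes the last
character whatever it is, like `${1:0:-1}' in the class above.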

access by native commands
~~~~~~~~~~~~~~~~~~~~~~~~~
See the example above: native commands get the values in native format,
because any assignment in the shell calls the `set' method, which does
the translation from MSYS format to native format.

access by MSYS commands
~~~~~~~~~~~~~~~~~~~~~~~
The idea is that the following data is inherited by the subshell calling
the command, in the same way as the environment is inherited:

- the environment translation class (ETC)  definition byte code
- mapping of ETC to envvar
- for each envvar's ETC object, the member attributes: e.g. attributes
  `cache' and `native' in the case of FOO_TRANSLATE would have to be
  inherited for each envvar which, like FOO, has been declared with the
  FOO_TRANSLATE ETC.

Now, when an MSYS command is called, any invocation of getenv or setenv
(i.e. in the subshell after the exec is called) will check the ETC
mapping and, if the ETC is different from `default', will run the
bytecode interpreter to execute `get' or `set' respectively.

The problem with the above idea is that you have a major backward
compatibility issue, in that you have to recompile all the MSYS commands
with the new getenv & setenv implementation. In your idea (conversion
done somewhere between fork and exec) that would not be the case, but
when there are a large number of envvars you need to translate all of
them every time, even those which the native command does not need.

Another issue with your method is that bash would need to know anyway
whether a command is an MSYS command or not. I don't know how it works
currently, but I suspect that in the process of launching a command
there is already some way to detect whether it is an MSYS command or
not. Otherwise, your method would suffer from the same backward
compatibility issue.
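
For comparison, the conversion between fork and exec discussed above can
be imitated by hand with a subshell. This is a sketch under the
assumption that the shell keeps FOO in MSYS format; run_native is a
hypothetical name, and the caller has to know that the command is
native.

```shell
# In this sketch the shell keeps FOO in MSYS format ("bart") and the native
# value ("bar") exists only in the child's environment.
run_native() {
    (
        # The parentheses start a subshell, typically by forking: this is
        # the "after the fork" point, so the parent's FOO is left untouched.
        FOO=${FOO%?}      # MSYS -> native: drop the last character
        export FOO
        # "before the exec": replace the subshell with the native program.
        exec "$@"
    )
}

FOO=bart
run_native sh -c 'printf "%s\n" "$FOO"'   # the child sees "bar"
```

With many translated envvars, each of them would need its own line in
that subshell on every call, whether or not the command reads it.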

VBR,
        Vincent.
