JS needs certain corresponding API implemented in Arachne

Mithgol the Webmaster Sat, 02 Feb 2002 21:52:31 -0800

On Thu, 31 Jan 2002 04:26:53 -0400, Clarence Verge wrote:
> On Wed, 30 Jan 2002 20:41:52 +0300, Mithgol the Webmaster wrote:
> > On Mon, 28 Jan 2002 15:58:19 -0400, Clarence Verge <[EMAIL PROTECTED]> wrote:
>
> >> Perhaps you would like to contribute in the form of a minimal
> >> js masquerade engine ?
>
> > Sounds good. Let's discuss what it should be. I am busy right now, but
> > in early February I'll think about making a standalone DOS16 executable
> > (.EXE) JavaScript preparser (with open source code Turbo Pascal 5.5/6.0
> > .PAS) - a program which parses original HTML and makes one of the
> > following:
> > 1) .ASF with a single URL where Arachne should go to
> > 2) .HTM, free of JavaScript, which Arachne should browse
>
> > but this tends not to be a very fast solution, since it needs an external APM
> > helper instead of a plug-in engine.
>
> They end up being the same thing. Whether the external is faster or
> slower will depend on the type of code.


No. To execute a standalone .EXE, Arachne needs task swapping and also a
temporary file written to disk. That is why GIF is faster than JPEG in
Arachne: GIFs are decoded and displayed internally by the core, while JPEGs
need an external helper application that writes a temporary true-color BMP.

Now it's early February, right? Okay. I have already thought about making a
standalone executable that would do the job, and I came to a conclusion:
such an executable would slow things down greatly and would be useless as a
base for further development. It would never become the solution I think we
need: a JavaScript masquerade engine that can easily be developed into real
JavaScript 1.1 support.

So, a JS engine COMPLETELY independent of the browser is UNREALISTIC.

It is not as easy as it seemed. Imagine a JavaScript that modifies some
hidden content in an HTML form (BTW, I wrote a complete working example on
Wed, 30 Jan 2002 20:41:52 +0300 and posted it to the list). For a standalone
solution, I would have to write my own (browser-independent) HTML parser,
then substitute the hidden value and write yet another temporary HTML file
("another" because Arachne has already written a temporary JS file).

So, I need some API (application programming interface): an agreement on
how my JS engine may pass data to Arachne, or how a script may redirect the
whole browser to a new URL where the "protected" page is located (the usual
task of a masquerade solution).

And we must prepare for the worst: the SCRIPT tag has an SRC attribute.
The whole script can be located in some other (external) file on the
remote server, possibly not even on the same server.

To avoid writing another DOS TCP/IP stack, I also need an API to tell
Arachne that we need another .JS file in the cache: not only the JS taken
from HTML that Arachne has already downloaded itself, but a "real" .JS file
downloaded from the remote server named by the JS engine.

There are even worse tricks. Rotterdam at www.rotterdam.nl, provided by
Bastiaan Edelman, proves that some scripts do not modify the existing
content of HTML forms; instead, they write HTML themselves. When rendering
the page on the user's screen, Arachne MUST replace the script with the
HTML code produced by the script.

There are also JavaScript variables, which must be shared between all
scripts in the page, and which can also be used as arguments to functions.
At www.rotterdam.nl, we have javascriptVersion, requiredVersion,
useRedirect, flashPage, noFlashPage, upgradePage, isIE, isWin, etc. These
variables are NECESSARY for masquerade, because they may contain parts of
the URL to which Arachne is going to be redirected (URLs of "MSIE-only
pages", I mean).

Although www.rotterdam.nl itself is not worth building a solution for,
since we have no Flash APM for Arachne (and are not going to have one), the
example is still very important: many of what you call "dirty tricks" are
there.

BTW, I need more examples. Being just a student at Taganrog State
University of Radioelectronics, I still lack practical webmastering
experience. In theory I know all of JavaScript (except MSIE extensions,
maybe), and I can invent lots of ways to keep Arachne out of a site; but I
guess we are talking about what _other_ webmasters do.

> > But it will do masquerade, since most guardian
> > JS will either send a browser to another MSIE-only page via JS or change a
> > hidden value in a certain form.
>
> Excellent !!

There was nothing new in my sentence. I can't even imagine those guardian
scripts doing anything else ;-)

Remember, browser detection is either client-side or server-side; there is
no third way. When it is server-side, some value is sent to the server, and
it has to be hidden, or users themselves will play with it ;-) When it is
client-side, the browser has to decide by itself "where we want to go
today", right? That means an alternate URL. Or it means the whole page
rewritten on the client side by HTML output, but then the user may see the
JavaScript output in the HTML source, even if that part is never actually
used. So an alternate URL is more secure.

> > But. You should give me several examples of websites where Arachne should
> > masquerade; the browser detection methods DO vary. I'll invent the most
> > common and quick masquerade solution for each.
>
> Hmmm. If i were to guess, I think those specific examples will come in
> the form of complaints after the initial release. <g>

I still need them, so as not to miss something important. They help greatly.

After meditating over the Rotterdam website, I found that any useful
masquerade needs data exchange between the browser (Arachne) and the JS
engine (my program). So the program cannot really be standalone. It will be
another .EXE file, like Insight/WWWMan/DJPEG/EPPPD/APM, but it needs some
support from the browser.

I have a brilliant idea. JavaScript can be parsed once, just as Arachne
converts a JPEG once, and the result stored in the cache together with the
downloaded HTML. Then only that result has to be interpreted by the browser
when the cache files are read.
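A minimal sketch of the parse-once idea, in Python purely for illustration. The class name ProcessedCache and the in-memory dictionary standing in for Arachne's disk cache are assumptions; a real cache would also have to drop the entry when the page itself changes.

```python
from typing import Callable


class ProcessedCache:
    """Keeps the JS engine's result next to the cached page, keyed by URL."""

    def __init__(self, engine: Callable[[str], str]) -> None:
        self.engine = engine                # the slow external step (the JS engine)
        self.entries: dict[str, str] = {}   # URL -> processed, JS-free HTML
        self.engine_runs = 0                # how many times the engine really ran

    def render(self, url: str, html: str) -> str:
        # Run the engine only on a cache miss; later visits reuse the stored
        # result, just as a JPEG is converted only once.
        if url not in self.entries:
            self.engine_runs += 1
            self.entries[url] = self.engine(html)
        return self.entries[url]


if __name__ == "__main__":
    page = "<p><script>document.write('Hi')</script></p>"
    cache = ProcessedCache(
        lambda html: html.replace("<script>document.write('Hi')</script>", "Hi"))
    print(cache.render("http://example.com/", page))  # <p>Hi</p>
    print(cache.render("http://example.com/", page))  # served from the cache
    print(cache.engine_runs)                          # 1
```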

The following scenario seems the most realistic one.



==       JAVASCRIPT RENDERING IN ARACHNE       ==
== DATA EXCHANGE BETWEEN BROWSER AND JS ENGINE ==

         ===== SUGGESTED VERSION =====

1) Having downloaded the page, Arachne renders it as if there were no JS.
Then Arachne downloads the whole pack of embedded objects: images, sounds,
etc. External JavaScripts should also be downloaded here and stored in the
cache. Embedded JavaScripts should be cut out of the HTML and stored in
files, each script in its own file, while Arachne remembers where each
script was, so that its written output can be placed there.
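A rough sketch of step 1, assuming a placeholder comment <!--JS:n--> is how Arachne remembers each script's position (the marker and the function name are hypothetical). Regular expressions are a simplification here; a real implementation needs a proper HTML tokenizer, e.g. for a "</script>" string inside a script.

```python
import re

# One <script ...>...</script> element: group 1 = attributes, group 2 = code.
SCRIPT_RE = re.compile(r"<script\b([^>]*)>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)
# The SRC attribute of an external script, quoted or not.
SRC_RE = re.compile(r"""\bsrc\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def cut_out_scripts(html: str) -> tuple[str, list[dict]]:
    """Replace each SCRIPT element with a numbered placeholder.

    Returns the JS-free HTML and one record per script in document order:
    inline scripts keep their code, external ones keep the SRC URL that
    Arachne still has to download into the cache.
    """
    scripts: list[dict] = []

    def replace(match: re.Match) -> str:
        attrs, code = match.group(1), match.group(2)
        src = SRC_RE.search(attrs)
        index = len(scripts)
        scripts.append({"index": index,
                        "src": src.group(1) if src else None,
                        "code": None if src else code})
        return f"<!--JS:{index}-->"  # marks where the script's output goes

    return SCRIPT_RE.sub(replace, html), scripts


if __name__ == "__main__":
    html, scripts = cut_out_scripts(
        '<p>a</p><script>var x=1;</script><SCRIPT SRC="g.js"></SCRIPT>')
    print(html)     # <p>a</p><!--JS:0--><!--JS:1-->
    print(scripts)
```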

2) Having rendered the page, Arachne creates a global context file, where
the whole bunch of DOM objects is stored. This includes an array of forms
(with all their elements), an array of images, and an array of most
recently used history addresses (MRU URLs, necessary for rendering the JS
history object), plus an array of frames with their names and the URLs of
the files loaded into them. The JS engine will later append to this file,
storing global variables and other data there (see below).
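A sketch of how the step 2 arrays might be gathered, using Python's html.parser as a stand-in for Arachne's own parser. The dictionary field names are assumptions, not an agreed GCF layout.

```python
from html.parser import HTMLParser


class DomCollector(HTMLParser):
    """Collects the DOM arrays that step 2 writes into the global context file."""

    def __init__(self) -> None:
        super().__init__()
        self.forms = []         # document.forms[], each with its elements
        self.images = []        # document.images[]
        self.frames = []        # frame names and the URLs loaded into them
        self.inside_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # html.parser lower-cases tag and attribute names
        if tag == "form":
            self.forms.append({"name": a.get("name"), "action": a.get("action"),
                               "elements": []})
            self.inside_form = True
        elif tag in ("input", "select", "textarea") and self.inside_form:
            default_type = "text" if tag == "input" else tag
            self.forms[-1]["elements"].append({"type": a.get("type", default_type),
                                               "name": a.get("name"),
                                               "value": a.get("value", "")})
        elif tag == "img":
            self.images.append({"name": a.get("name"), "src": a.get("src")})
        elif tag == "frame":
            self.frames.append({"name": a.get("name"), "src": a.get("src")})

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside_form = False


def build_context(html: str, history: list) -> dict:
    """Return the initial DOM part of the context (history = MRU URLs)."""
    collector = DomCollector()
    collector.feed(html)
    collector.close()
    return {"forms": collector.forms, "images": collector.images,
            "frames": collector.frames, "history": list(history)}
```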

IMPORTANT. The global context file should also contain several strings
necessary to render the navigator object, the most common means of browser
detection. These strings are

*) application code name, navigator.appCodeName (I haven't tested it yet)
*) application name, navigator.appName (e.g. "Arachne", "Netscape",
   "Microsoft Internet Explorer")
*) application version, navigator.appVersion
   (e.g. "4.0 (compatible; MSIE 4.0; Windows 95)", "4.05 [en] (Win95; I)")
*) user agent string, navigator.userAgent (as sent in the HTTP request)

I suggest making these strings configurable via ARACHNE.CFG.
User-configurable. I insist.
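A sketch of making the navigator strings user-configurable. The ARACHNE.CFG key names (JSAppCodeName and so on), the "Key value" line syntax and the ';' comment character are all assumptions; the default values are only examples taken from the list above.

```python
# Example defaults, used when ARACHNE.CFG does not override them.
NAVIGATOR_DEFAULTS = {
    "appCodeName": "Mozilla",
    "appName": "Netscape",
    "appVersion": "4.05 [en] (Win95; I)",
    "userAgent": "Mozilla/4.05 [en] (Win95; I)",
}

# Hypothetical ARACHNE.CFG keys -> navigator properties.
CFG_KEYS = {"JSAppCodeName": "appCodeName", "JSAppName": "appName",
            "JSAppVersion": "appVersion", "JSUserAgent": "userAgent"}


def parse_cfg(text: str) -> dict:
    """Parse 'Key value' lines, skipping blank lines and ';' comments (assumed syntax)."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        key, _, value = line.partition(" ")
        cfg[key] = value.strip()
    return cfg


def navigator_section(cfg_text: str) -> dict:
    """Navigator strings to be written into the global context file."""
    cfg = parse_cfg(cfg_text)
    nav = dict(NAVIGATOR_DEFAULTS)
    for cfg_key, prop in CFG_KEYS.items():
        if cfg_key in cfg:
            nav[prop] = cfg[cfg_key]   # the user's choice wins over the default
    return nav
```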

3) Arachne displays "Parsing JavaScript 1 of 13..." in its status line,
with a green progress bar, and calls the external JS engine as if it were
a DGI. The JS filename and the global context filename are passed as
command-line parameters.
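A sketch of the step 3 call, in Python only for illustration (Arachne itself would do this in C, with task swapping). JSENGINE.EXE is a hypothetical name, and the optional third parameter carries the frame name discussed in the notes on frames after step 5.

```python
import subprocess

ENGINE = "JSENGINE.EXE"   # hypothetical name of the external JS engine


def engine_command(js_file: str, gcf_file: str, frame: str = None) -> list:
    """Command line: JS file, global context file, optional frame name."""
    argv = [ENGINE, js_file, gcf_file]
    if frame is not None:
        argv.append(frame)
    return argv


def status_line(current: int, total: int) -> str:
    return f"Parsing JavaScript {current} of {total}..."


def run_engine(js_file: str, gcf_file: str, frame: str = None) -> int:
    # Launch the engine like a DGI helper and wait for it to finish.
    return subprocess.run(engine_command(js_file, gcf_file, frame)).returncode
```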

4) The JS engine output is placed in the HTML where the script itself
was, replacing the whole <script> ... </script> sequence. The page is
re-rendered by Arachne, the green progress bar now shows 2 of 13, and the
cycle is repeated for each script. It is important that the JS engine is
always given the same context file and a different script each time, in
turn. The JS engine modifies the context file where necessary. Global
variables are stored there. Function names are stored together with the
names of the JS files that define them; this is necessary if some script
calls a function defined in another script. The DOM objects may be
modified by JavaScript.
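A sketch of the step 4 cycle, assuming each script was replaced by a placeholder comment <!--JS:n--> in step 1 (the marker is hypothetical). One context dictionary is shared by every call, and each script's output replaces its placeholder. The toy engine understands only "name=value" and "write(name)"; it is a stand-in to show variables crossing script boundaries, not JavaScript, and the function table is left out.

```python
from typing import Callable


def run_all_scripts(html: str, scripts: list, context: dict,
                    engine: Callable[[str, dict], str],
                    progress: Callable[[int, int], None] = lambda i, n: None) -> str:
    """Run the scripts in document order against ONE shared context."""
    total = len(scripts)
    for i, code in enumerate(scripts):
        progress(i + 1, total)              # "Parsing JavaScript i of n..."
        output = engine(code, context)      # same context, different script
        html = html.replace(f"<!--JS:{i}-->", output, 1)
        # Arachne would re-render the page at this point.
    return html


def toy_engine(code: str, context: dict) -> str:
    """NOT JavaScript: understands only 'name=value' and 'write(name)'."""
    variables = context.setdefault("globals", {})
    written = []
    for stmt in (s.strip() for s in code.split(";")):
        if not stmt:
            continue
        if stmt.startswith("write(") and stmt.endswith(")"):
            written.append(str(variables.get(stmt[6:-1], "")))
        else:
            name, _, value = stmt.partition("=")
            variables[name.strip()] = value.strip()
    return "".join(written)


if __name__ == "__main__":
    page = "<p><!--JS:0--></p><p><!--JS:1--></p>"
    # The second script reads a variable set by the first one.
    print(run_all_scripts(page, ["v=Arachne", "write(v)"], {}, toy_engine))
    # <p></p><p>Arachne</p>
```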

IMPORTANT. Sometimes JS output contains HTML forms and
images, which cause further changes in DOM data. Arachne
should modify the global context file accordingly.

5) After JS parsing, Arachne uses the DOM content stored in the global
context file. If the top.window.location property has changed, Arachne
goes to another webpage. If some frame is reloaded, Arachne downloads and
renders it. Remember that any frame may call JavaScript functions located
in some other child frame, sibling frame, or parent frame. The global
context filename _MUST_ remain unchanged as long as the top URL does not
change; however, reloading a frame should destroy any DOM objects defined
by HTML in that frame, and all global variables declared by scripts
located in that frame.
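A sketch of the frame rule in step 5, assuming every global and every DOM object in the context records the frame that created it (this per-frame tag is an assumption about the GCF layout).

```python
def reload_frame(context: dict, frame: str, new_url: str) -> None:
    """Forget everything the old document in `frame` defined; keep the rest.

    Assumed layout: each global is {"value": ..., "frame": ...} and each
    DOM object carries a "frame" key. The context itself (and its file
    name) survives, because the top URL has not changed.
    """
    context["globals"] = {name: g for name, g in context.get("globals", {}).items()
                          if g["frame"] != frame}
    for kind in ("forms", "anchors", "images"):
        context[kind] = [obj for obj in context.get(kind, []) if obj["frame"] != frame]
    context.setdefault("frames", {})[frame] = new_url
    # The new document's DOM is added once it has been rendered.
```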

IMPORTANT. Having loaded another document into a frame, Arachne MUST
change the global context file, providing a new DOM for the forms,
anchors, and images of that frame. If the new document contains more
scripts, the JS engine should be started for those scripts, beginning
from step 1.

IMPORTANT. When parsing scripts located in a frame, the JS engine should
be given not only the JS filename and the global context filename, but
also the frame name. Although the name of the current (last loaded) frame
could simply be stored in the global context file, this solution is
inadequate, since clearing the name after each script is parsed takes
time and makes Arachne too slow.

6) If steps 1-5 are completed and no further downloading occurs, Arachne
uses the DOM stored in the global context file (GCF). According to the GCF
data, Arachne fills all forms with new values. If new image.src properties
are given, Arachne downloads the new images from the remote server(s) and
replaces the old ones, though width and height remain unaltered. If a new
window.status property is given for the top frame, Arachne replaces the
default status line (ARACHNE.CFG should define whether the JS status also
overrides ARACHNE MSG="..."). And so on: the DOM is rendered.
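A sketch of step 6 under an assumed context layout: forms as {form name: {field name: value}}, images matched by their position in document.images[]. Changed images are returned with their old width and height, since those stay unaltered.

```python
def apply_dom(page: dict, context: dict) -> list:
    """Copy JS-made changes from the context into the displayed page.

    Returns (src, width, height) for each image that must be downloaded
    and drawn in its OLD box.
    """
    for form_name, fields in context.get("forms", {}).items():
        page["forms"].setdefault(form_name, {}).update(fields)   # new form values
    downloads = []
    for old, new in zip(page["images"], context.get("images", [])):
        if new["src"] != old["src"]:
            old["src"] = new["src"]
            downloads.append((new["src"], old["width"], old["height"]))
    if "status" in context:            # window.status of the top frame
        page["status"] = context["status"]
    return downloads
```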

IMPORTANT NOTE FOR STEPS 5 AND 6. To speed up DOM rendering in Arachne,
the JS engine itself should put notes into the global context file,
enumerating the changes in DOM data. Arachne, being notified, should
re-render only the changes written by the JS engine. The JS engine may
also sort these changes into two independent categories:
*) changes in document URL (step 5 required)
*) changes in document data (step 5 not required).
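A sketch of how the change notes could be split into the two categories, assuming one "path=value" note per changed property. Treating any path that ends in location or location.href as a URL change is an assumption, not a complete list.

```python
# Properties whose change loads a new document (an assumed, minimal list).
URL_PROPERTIES = ("location", "location.href")


def classify_notes(notes: list) -> tuple:
    """Split 'path=value' change notes into (url_changes, data_changes)."""
    url_changes, data_changes = [], []
    for note in notes:
        path, _, value = note.partition("=")  # URLs may contain '=' themselves
        is_url = any(path == p or path.endswith("." + p) for p in URL_PROPERTIES)
        (url_changes if is_url else data_changes).append((path, value))
    return url_changes, data_changes
```

With no notes at all, both lists come back empty, which tells Arachne there is nothing to re-render.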

Once Arachne has been upgraded and the changes required for JS are
implemented, the JS engine can easily be turned off by calling, in its
place, another EXE file that does nothing. Then there will be no "change
notes" in the global context file and thus no JS output, so Arachne will
understand that it has nothing to do either.

7) If the BODY tag has a JavaScript onLoad event defined (e.g.
<BODY onLoad="functionCall();">), the corresponding JS function is called
via the JS engine. Then steps 5-6 are repeated.

This step is optional and may be implemented later. However, some guardian
scripts are called after the document is loaded, so step 7 is required for
successful masquerade.
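A sketch of finding the step 7 handler in the BODY tag. The regular expressions are a simplification; for example, a '>' inside the handler code would cut the tag short.

```python
import re

# The BODY start tag (simplification: a '>' inside an attribute ends it early).
BODY_RE = re.compile(r"<body\b[^>]*>", re.IGNORECASE)
# onLoad="..." or onLoad='...'; group 2 is the handler code.
ONLOAD_RE = re.compile(r"""\bonload\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def body_onload(html: str):
    """Return the BODY tag's onLoad code, or None when there is none."""
    body = BODY_RE.search(html)
    if body is None:
        return None
    handler = ONLOAD_RE.search(body.group(0))
    return handler.group(2) if handler else None


if __name__ == "__main__":
    print(body_onload('<BODY onLoad="functionCall();">'))  # functionCall();
```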

8) If some buttons or images have a JavaScript onClick event defined,
the corresponding JS function is called via the JS engine. Then steps 5-6
(but not step 7!) are repeated.

This step is optional and may be implemented later. However, some really
mean and dirty guardian scripts are called only after you have filled in a
form (spending your time) and have just pressed the final button. So step
8 is also required for successful masquerade.

IMPORTANT NOTE FOR STEPS 7 AND 8. Since function names are already
written in the global context file, Arachne itself may check whether the
functions named in event handlers are actually defined.
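A sketch of that check: pull the bare function calls out of an event handler and look them up in the function table from the GCF. The keyword and built-in lists are illustrative subsets, and method calls such as document.write() are skipped on purpose.

```python
import re

# A bare call such as "check(", but not a method call like "document.write(".
CALL_RE = re.compile(r"(?<![\w$.])([A-Za-z_$][\w$]*)\s*\(")
KEYWORDS = {"if", "while", "for", "switch", "return", "typeof", "function"}
BUILTINS = {"alert", "confirm", "prompt", "eval", "escape", "unescape",
            "parseInt", "parseFloat", "isNaN"}   # illustrative subset


def undefined_handlers(handler_code: str, known_functions: dict) -> list:
    """Names called by an event handler that no script on the page defined.

    known_functions maps function name -> JS file, as stored in the GCF.
    """
    return [name for name in CALL_RE.findall(handler_code)
            if name not in known_functions
            and name not in KEYWORDS and name not in BUILTINS]
```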

IMPORTANT NOTE FOR STEPS 5, 6, 7 AND 8. If the JS engine finds an error
in a script, it should report it to Arachne via the global context file.
It is then up to Arachne what to do. In Netscape/MSIE, the browser's
behaviour on a JavaScript error is usually user-configurable, with several
options:

*) Ignore this JavaScript, do not inform user about the error
*) Ignore this JavaScript and all JS on the page, do not inform user
*) Ignore this JavaScript and all JS on the site, do not inform user
*) Ignore this JavaScript and ask user what to do next

Ignoring the current script is nearly always a must.
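A sketch of the error options listed above; the enum and its setting values are hypothetical. In every mode the failing script itself is skipped; the mode only decides how far the ignoring spreads.

```python
from enum import Enum


class OnJsError(Enum):
    """The four options listed above (hypothetical setting values)."""
    IGNORE_SCRIPT = "script"   # skip this script, stay silent
    IGNORE_PAGE = "page"       # skip all JS on this page
    IGNORE_SITE = "site"       # skip all JS on this site
    ASK_USER = "ask"           # skip this script, then ask the user


class ErrorPolicy:
    def __init__(self, mode: OnJsError) -> None:
        self.mode = mode
        self.disabled_pages = set()
        self.disabled_sites = set()

    def should_run(self, page_url: str, site: str) -> bool:
        """May further scripts on this page/site still be run?"""
        return page_url not in self.disabled_pages and site not in self.disabled_sites

    def on_error(self, page_url: str, site: str, ask=lambda: False) -> None:
        # ask() returns True if the user chooses to disable the rest of the page.
        if self.mode is OnJsError.IGNORE_PAGE:
            self.disabled_pages.add(page_url)
        elif self.mode is OnJsError.IGNORE_SITE:
            self.disabled_sites.add(site)
        elif self.mode is OnJsError.ASK_USER and ask():
            self.disabled_pages.add(page_url)
```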

      ===== END OF SUGGESTED VERSION =====


This suggestion is a realistic way to build an external JS engine
solution; although it is not complete, it contains nearly everything
required for masquerade. However, it won't work without some support
implemented in Arachne; I mean the global context file and the downloading
of external scripts. The format of the global context file is to be
discussed; I suggest making it text-oriented, like ARACHNE.CFG, rather
than binary, like ARACHNE.PCK or CACHE.IDX. Once created by the JS engine,
the GCF can be stored in the cache and linked to its URL, so Arachne will
simply reload a complete DOM from the GCF instead of cycling through the
steps above.
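A sketch of one possible text-oriented GCF format: INI-style [section] headers followed by key=value lines. The section names and the ';' comment are assumptions, and values containing line breaks would need an escaping rule that is left out here.

```python
def write_gcf(sections: dict) -> str:
    """Serialize {section: {key: value}} as ARACHNE.CFG-like text."""
    lines = []
    for name, entries in sections.items():
        lines.append(f"[{name}]")
        lines.extend(f"{key}={value}" for key, value in entries.items())
    return "\n".join(lines) + "\n"


def read_gcf(text: str) -> dict:
    """Parse the text back; keys may contain dots and brackets, values may contain '='."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None:
            key, _, value = line.partition("=")
            current[key] = value
    return sections


if __name__ == "__main__":
    print(write_gcf({"navigator": {"appName": "Netscape"},
                     "changes": {"top.location": "http://x/?a=1"}}), end="")
    # [navigator]
    # appName=Netscape
    # [changes]
    # top.location=http://x/?a=1
```

Text like this could be written next to the cached page and read back on the next visit instead of re-running the scripts.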

However, the solution suggested above does not include image rollovers
(these require Arachne not to scroll the page back to the top
automatically after an external EXE call). It also does not implement
delayed script execution (so-called JavaScript timeouts), so one should not
expect to see a clock in a form field, strings appearing or scrolling in a
field or in the Arachne status line, flashing background colors, etc.

Implementing rollovers and other on-the-fly changes in the document will
require the JS engine to be small enough to fit in memory without
unloading Arachne. Otherwise it would be a really mad solution: re-rendering
the whole document whenever a single image has to be redrawn.

The suggested solution can easily be upgraded further, up to the most
modern JavaScript as implemented, e.g., in MSIE. However, this will require
an appropriate DOM in Arachne. There are still some bugs in Arachne's DOM;
http://mithgol.pp.ru/Master/Bug-list/index.htm#CSS is where the
corresponding CSS bugs are listed. JavaScript and Cascading Style Sheets
usually share the same Document Object Model, so bugs in the DOM affect
both the CSS and the JS implementations.

Enough for today. I'm still in the fourth year of my studies, and I have
to hand in my course homework on operating systems. I'll write a paper on
Apache under Win32: installing it, configuring it, using it, etc.
Relatively simple for me, since I'm fond of webmastering.

Writing a JS engine is much more complicated, since JavaScript syntax is
similar to C++, and it is not easy to write a C++ interpreter.
Fortunately, JS has no pointers, and it converts types automatically.

Anyway, first of all I need somebody who will take responsibility for
upgrading Arachne on our side. I want to know whether it is possible to
implement the API described above between the Arachne browser and the JS
engine, and we should also discuss the global context file format.
Otherwise I won't even lift a finger. The extent of the DOM in the GCF will
determine exactly what JavaScript may alter in Arachne pages and
behaviour, and what it may not alter (or even read).

IMPORTANT. The abbreviation DOM, for Document Object Model, is relatively
well known; it's a W3C invention AFAIK. But GCF, for "global context
file", is my own invention, to be used in the Arachne JavaScript engine.

IMPORTANT. I could invent several completely different JS guardian
scripts that would break even the masquerade solution described above, by
using features not yet implemented in the JS engine (e.g. delayed
execution) or in Arachne itself (e.g. Flash, layers, absolute positioning,
z-order, CSS details, etc.). It's an arms race. Like the Cold War, you
know. We former Soviets are often skilled in such things: our air defence
is still the best, for example, and SPb is better protected than NYC.
Unfortunately, we have our own terrorists much closer to home. The world
always has something mad in it :-((

Consider the above paragraph an apology. "I'm not a wizard, I'm only
learning" has become a proverbial saying here in Russia. I can't work
miracles. I'll be quite happy to create a shield against the most common
dirty tricks.

IMPORTANT. I prefer writing DOS16 programs in Turbo Pascal rather than
C++. The resulting executable may not be so easy to port to Linux or
Flowerpot or whatever else. But, you know, most TP programs can easily be
translated to C++:
*) begin ... end translated to {}
*) case translated to switch
*) repeat ... until ... translated to do { ... } while (!...)
*) {} translated to /*  */
*) object translated to class
*) ^. translated to ->
*) nil translated to NULL
*) procedures translated to void functions; parameterless ones get an
   empty argument list, adding () after each call
   
... and so on. Pascal and C++ really are comparable; it's not like PERL or
BASIC or LISP or PROLOG, with their different rules and/or data
structures.

Good luck!

Deeply yours,

 M   M
 MM MM
 M M M  I  T  H  G  O  L
 M   M                            http://mithgol.pp.ru/
            T  H  E                  [EMAIL PROTECTED]
W     W
W  W  W   E  B  M  A  S  T  E  R
 W W W
  W W

