And a note to Adam: we've hashed this patch out off-list, but if you have
any critiques, please fire away. It's just a few lines and
straightforward, but as a patch-submission newbie I can use the
scrutiny from several angles, and would like to know what you think.
Thanks
Kevin
On Mon, 24 Aug 2015, Karl Dahlke wrote:
We stand on the edge of pushing a change that will require tidy5.
It's cautious, doesn't do anything except run the html through tidy,
in parallel with everything else we are doing,
then free the tidy tree when the window is freed.
Just to get us started, to make sure tidy doesn't seg fault etc.
But it will change the way edbrowse is built.
We now need another library etc.
Should we, and I kinda think we should, stamp another version, 3.5.4.2,
before we jump into the tidy pool?
Some work has been done since 3.5.4.1, some bug fixes, some cosmetics,
and the framework for imap, including a simple move delete interface.
Chris before you push Kevin's tidy patch, maybe stamp 3.5.4.2.
After we are using tidy to parse html,
and I hope this isn't a long time coming, we may want to jump up to 3.6.
Karl Dahlke
_______________________________________________
Edbrowse-dev mailing list
[email protected]
http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev
--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists
diff -Naur 1/edbrowse-master/README 2/edbrowse-master/README
--- 1/edbrowse-master/README 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/README 2015-08-23 21:46:42.783741131 -0700
@@ -73,6 +73,19 @@
If you have to compile curl from source, be sure to specify
--ENABLE-VERSION-SYMBOLS (or some such) at the configure script.
+Edbrowse now uses the Tidy HTML parser. So there are a couple
+of things to install for this prerequisite.
+The Tidy compilation process uses cmake. Please either use your
+package manager to get cmake (for instance, apt-get install cmake),
+or follow the instructions at http://www.cmake.org/download/
+
+Once you have cmake, download the latest Tidy code from:
+https://github.com/htacg/tidy-html5/archive/master.zip
+Unzip and cd to build/cmake
+cmake ../..
+make install
+Now the latest Tidy library will be available to edbrowse.
+
Finally, you need the Spider Monkey javascript engine from Mozilla.org
ftp://ftp.mozilla.org/pub/mozilla.org/js/
Edbrowse 3.5.1 and higher requires Mozilla js version 2.4.
diff -Naur 1/edbrowse-master/src/buffers.c 2/edbrowse-master/src/buffers.c
--- 1/edbrowse-master/src/buffers.c 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/buffers.c 2015-08-24 16:02:18.351550150 -0700
@@ -583,6 +583,7 @@
nzFree(w->firstURL);
nzFree(w->referrer);
nzFree(w->baseDirName);
+ tidyRelease(w->tdoc);
free(w);
} /* freeWindow */
diff -Naur 1/edbrowse-master/src/eb.h 2/edbrowse-master/src/eb.h
--- 1/edbrowse-master/src/eb.h 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/eb.h 2015-08-23 21:34:37.165011656 -0700
@@ -26,6 +26,7 @@
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
+#include <tidy.h>
#include <curl/curl.h>
#ifdef DOSLIKE
#include <io.h>
@@ -362,6 +363,7 @@
jsobjtype jcx;
jsobjtype winobj;
jsobjtype docobj; /* window.document */
+ TidyDoc tdoc; /* tidy5 html parser */
struct DBTABLE *table; /* if in sqlMode */
};
extern struct ebWindow *cw; /* current window */
diff -Naur 1/edbrowse-master/src/html.c 2/edbrowse-master/src/html.c
--- 1/edbrowse-master/src/html.c 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/html.c 2015-08-24 16:17:20.031306748 -0700
@@ -1668,6 +1668,21 @@
int nopt; /* number of options */
int intable = 0, inrow = 0;
bool tdfirst;
+ int TidyReturnValue; /* for Tidy methods that return
+success/failure */
+
+ // Tidy-related actions on incoming html
+
+ // At the moment, the goal is to get the parser into edbrowse
+ // and be able to call things without detrimental effect to
+ // any existing functionality
+
+ cw->tdoc = tidyCreate();
+printf("In case you wanted to know if this is the version with Tidy, it is\n");
+ // run tidyParseString here, or do something else
+ //TidyReturnValue = tidyParseString (cw->tdoc,html);
+
+ // The use of Tidy ends here ---
ns = initString(&ns_l);
preamble = initString(&preamble_l);
diff -Naur 1/edbrowse-master/src/makefile 2/edbrowse-master/src/makefile
--- 1/edbrowse-master/src/makefile 2015-08-23 21:32:17.459104575 -0700
+++ 2/edbrowse-master/src/makefile 2015-08-24 16:45:40.857553878 -0700
@@ -32,7 +32,7 @@
# Override JSLIB on the command-line, if your distro uses a different name.
# E.G., make JSLIB=-lmozjs
JSLIB = -lmozjs-24
-LDLIBS = -lpcre -lcurl -lreadline -lncurses
+LDLIBS = -lpcre -lcurl -lreadline -lncurses -ltidy
# Make the dynamically linked executable program by default.
all: edbrowse edbrowse-js
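For anyone reviewing the lifetime of the new tdoc member, here is a minimal sketch of the ownership pattern the patch sets up: one parser document per window, released when the window is freed. It uses stand-in types so it compiles without libtidy. parserCreate() and parserRelease() are hypothetical stand-ins for tidyCreate() and tidyRelease(), struct window stands in for struct ebWindow, and renderHtml() and lifecycleDemo() are illustrative names, not edbrowse functions. The sketch assumes the window struct starts out zeroed and that the rendering routine can run more than once for the same window; under that second assumption, releasing the previous document before creating a new one avoids a leak.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for TidyDoc; the real type and calls come from <tidy.h>. */
typedef struct parserDoc {
	int unused;
} *ParserDoc;

static int liveDocs;	/* documents created but not yet released */

/* Hypothetical stand-in for tidyCreate(). */
static ParserDoc parserCreate(void)
{
	ParserDoc d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	++liveDocs;
	return d;
}

/* Hypothetical stand-in for tidyRelease(); a NULL handle is a no-op here. */
static void parserRelease(ParserDoc d)
{
	if (!d)
		return;
	--liveDocs;
	free(d);
}

/* Stand-in for struct ebWindow, reduced to the one new member. */
struct window {
	ParserDoc tdoc;
};

/* Called each time html is rendered in the window. */
static void renderHtml(struct window *w)
{
	parserRelease(w->tdoc);	/* drop the tree from an earlier render */
	w->tdoc = parserCreate();
}

/* Mirrors the freeWindow() hunk: the tree goes away with the window. */
static void freeWindow(struct window *w)
{
	parserRelease(w->tdoc);
	free(w);
}

/* Renders twice, frees the window, and reports how many documents remain. */
static int lifecycleDemo(void)
{
	struct window *w = calloc(1, sizeof(*w));
	if (!w)
		return -1;
	renderHtml(w);
	renderHtml(w);
	freeWindow(w);
	return liveDocs;
}
```

Calling lifecycleDemo() exercises two renders followed by a free; a return value of zero means every document that was created was also released.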