Merry Christmas!

  Here is a surprise: below you'll find that there are quite a few
aspects in which "BoltWire can be improved".

  And a gift: I revised and reorganized the previous issue list, and
this time most of the issues come with a solution.

  I set up a site, "BoltWire 中文支援站" (BoltWire Chinese Supporters), at
http://boltwire-zh.22web.net/ , which collects data and information
for Chinese users. You are welcome to visit.

# Issues with solution

  To confirm and solve the problems thoroughly, I dug into the core
and tried to fix several of the problems I ran into. The modified
patch for v3.3.2 is named v3.3.2f, with several issues fixed and
tested, at least on my personal site. It can be downloaded from the
"Files" section of this Google Group, and you can use a tool like
WinMerge to scan the differences. Feel free to merge the fixes you
want into your new version. I don't actually know whether I broke the
rule that "BoltWire may not be sold or redistributed in any form" in
the license. Anyway, if it's considered offensive, please tell me. And
note: this is meant to be a version for development purposes; if you
use it on your own site without fully understanding it, you do so at
your own risk!

  The following is a brief description of the changes:

1. Added a line to the initialization routine in engine.php so that
all admins actually have 'editor' membership, as the official site
says they do:

   if (strpos(",$BOLTmemberships,", ',admin,') !== false) $BOLTmemberships .= ',editor';
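
   A minimal sketch of this kind of membership check, assuming the
memberships are kept as a comma-separated string. The helper name
sketchAddEditor is hypothetical and not part of BoltWire, and the extra
guard against a duplicate 'editor' is my own assumption. The point it
illustrates is that strpos returns 0 when the match starts at the
first character, so the result has to be compared against false
rather than used as a plain condition.

```php
<?php
// Hypothetical helper for illustration only; not BoltWire code.
function sketchAddEditor(string $memberships): string
{
    // Wrap in commas so 'admin' only matches as a whole list entry.
    $list = ",$memberships,";
    // strpos() returns 0 when 'admin' is the first entry, so compare with false.
    $isAdmin = strpos($list, ',admin,') !== false;
    $isEditor = strpos($list, ',editor,') !== false;
    if ($isAdmin && !$isEditor) {
        $memberships .= ',editor';
    }
    return $memberships;
}
```

   For example, sketchAddEditor('admin') gives 'admin,editor', while
a plain if (strpos(...)) test would skip that case because the match
position is 0.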

2. Added an htmlspecialchars call to the value in comm2func in
engine.php so that search patterns like 'aaa && bbb' work via
action.search the same way they do via the function.

   changed line: $args = BOLTargs($value, "x_$field");
   to: $args = BOLTargs(htmlspecialchars($value, ENT_NOQUOTES), "x_$field");

3. Moved the parsing of [messages] and [results] to markups.php, which
is better organized and stops them from being parsed in the skin page
unless they are wrapped in == ==. Also fixed a problem where, on
previewing a page, the [messages] in the source are replaced by
<div ....</div>, but I forgot how I fixed it.

4. In addition to 3, added an escape for $myquery so that search
results containing percent-encoded UTF-8 URLs are not converted back
to raw UTF-8, which produced a broken link like
<a href="連結">連結</a>.

5. Added 2 htmlspecialchars calls to BOLTFsource in functions.php so
that previewing a page doesn't HTML-decode the contents of the
textarea, i.e. &lt; to <, &amp; to &, and so on. This also removes an
issue where [(source post)] placed outside a [box] might output
unescaped HTML tags directly.

   original:
   -- if (isset($args['post']) && isset($_POST[$args['post']])) return BOLTescape($_POST[$args['post']]);
   -- if (isset($args['get']) && isset($_GET[$args['get']])) return BOLTescape($_GET[$args['get']]);
   new:
   -- if (isset($args['post']) && isset($_POST[$args['post']])) return BOLTescape(htmlspecialchars($_POST[$args['post']], ENT_NOQUOTES));
   -- if (isset($args['get']) && isset($_GET[$args['get']])) return BOLTescape(htmlspecialchars($_GET[$args['get']], ENT_NOQUOTES));
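
   To show why the extra escaping matters, here is a small sketch.
The helper name sketchTextareaValue is hypothetical, and BOLTescape is
left out because its behavior is not shown in this post. A browser
decodes entities inside a <textarea>, so the source has to be escaped
once more if the user is to see it unchanged.

```php
<?php
// Hypothetical helper for illustration only; not BoltWire code.
function sketchTextareaValue(string $source): string
{
    // ENT_NOQUOTES leaves ' and " alone; &, < and > are escaped.
    // Existing entities are escaped again (&lt; becomes &amp;lt;), so the
    // browser shows the original "&lt;" text inside the textarea.
    return htmlspecialchars($source, ENT_NOQUOTES, 'UTF-8');
}

$source = 'Use &lt;b&gt; for bold & <i> for italics';
echo '<textarea>' . sketchTextareaValue($source) . '</textarea>';
```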

6. Entries containing "," or "$" listed in the page
"site.language.xxx" aren't translated. A more significant consequence
is that system messages are not displayed translated if they contain
$1 or $2. This was fixed by modifying BOLTtranslate in engine.php:

   line: preg_match_all('/^([a-zA-Z0-9 .&#:;]+)(?: :: (.+))/m', $translation, $BOLTtranslationArray[$language]);
   replaced with: preg_match_all('/^(.+?)(?: :: (.+))/m', $translation, $BOLTtranslationArray[$language]);

   I can't figure out why the explicit character list is necessary,
so I just use "." for the loosest match. It has been tested and no
adverse effect has been noted so far.
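
   The difference between the two patterns can be seen in a small
sketch. The helper sketchParseTranslations and the sample entries are
made up for illustration; only the two regular expressions come from
the lines quoted above.

```php
<?php
// Hypothetical helper for illustration; BOLTtranslate itself is not reproduced.
function sketchParseTranslations(string $pattern, string $translation): array
{
    preg_match_all($pattern, $translation, $m);
    // Pair each source phrase (group 1) with its translation (group 2).
    return array_combine($m[1], $m[2]);
}

$old = '/^([a-zA-Z0-9 .&#:;]+)(?: :: (.+))/m';
$new = '/^(.+?)(?: :: (.+))/m';

// Made-up sample entries in the "source :: translation" layout.
$sample = "Save :: Enregistrer\n" . 'Hello, $1 :: Bonjour, $1';

print_r(sketchParseTranslations($old, $sample)); // only the "Save" entry
print_r(sketchParseTranslations($new, $sample)); // both entries
```

   The old character class stops at the "," in the second entry, so
that line never reaches the " :: " separator and is dropped.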

7. Fixed a wrong line in engine.php so that the scripts markups.php,
functions.php, and conditions.php in $configDir actually take effect:

   line: $f = BOLTgetlink('', $configDir, "$script.php");
   replaced with: $f = BOLTgetlink('', "$script.php", '', $configDir);

8. Added a replacement before BOLTsavepage in BOLTFinfo in
functions.php so that saving info values no longer escapes every < on
the page to &lt;

   added: $content = str_replace('&lt;', '<', $content);

9. Modified a line in BOLTdomarkup so that something like `&lt;
(stored in the file as `%26lt;) is no longer displayed as `%26lt;

   old: $out = str_replace('&lt;', '<', $out);
   new: $out = str_replace(array('&lt;', '%26lt;'), array('<', '&lt;'), $out);
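
   The order of the two pairs matters, because str_replace with
arrays applies the replacements one after another. A short sketch
(the function names and the sample string are hypothetical):

```php
<?php
// Hypothetical helpers for illustration only; not BoltWire code.
function sketchDecodeLt(string $stored): string
{
    // First turn stored &lt; into <, then turn %26lt; into a literal &lt;.
    return str_replace(array('&lt;', '%26lt;'), array('<', '&lt;'), $stored);
}

function sketchDecodeLtReversed(string $stored): string
{
    // Wrong order: %26lt; becomes &lt;, which the second pair then turns into <.
    return str_replace(array('%26lt;', '&lt;'), array('&lt;', '<'), $stored);
}

$stored = 'a &lt; b and a literal %26lt; entity';
echo sketchDecodeLt($stored), "\n";         // a < b and a literal &lt; entity
echo sketchDecodeLtReversed($stored), "\n"; // a < b and a literal < entity
```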

10. Modified the rule of /= =/ in markups.php by removing the
unnecessary call to BOLTescape (already done in BOLTcharEncode) and
changing the encode parameter to "lines" so that multi-line text is
displayed properly.

11. Modified markups.php so that uppercase HTML entities are allowed.

12. Modified markups.php so that HTML tags with trailing blanks or
trailing slashes like <br /> are allowed. Also fixed a problem where
the edit-conflict box displayed incorrectly because <br />s were not
recognized.

13. Modified BOLTXdelete in commands.php so that deleting a
field-modified system page reverts it to the copy in the system folder
instead of emptying it, which could corrupt the whole site if it
happened to code.skin.

14. Moved site.script to code.script to match the system setting. Also
edited several places so that code* and template* pages are now always
shown as source in normal and preview mode, and code.embed and
code.script never are. A new config "codePagesExclude" was created
with the default value "code,code.embed,code.script,template", which
lists all pages that match code* or template* but are not displayed as
code. Also added a function BOLTcodePageGroup as a shortcut for the
complicated match.

15. Replaced the hard-coded 'code.' with a BOLTconfig value in
BOLTsnippets, BOLTinfoVar and BOLTMembed so that the config
'codePages' applies to them. Likewise, 'template.' in BOLTMsession and
BOLTFmail was changed to a BOLTconfig value.

16. Edited and removed a few lines in BOLTexecute and BOLTfunc in
engine.php so that site.auth.functions and site.auth.commands work
and their messages are displayed appropriately. I don't actually know
why, but before the modification the auth setting for commands didn't
work.

   However, the way auth is determined for site.auth.functions and
site.auth.commands doesn't seem to be consistent with what the
official site says: listing pages with available
"functions/commands", instead of listing pages with available
"members/membergroups".

17. Modified BOLTFpreview in functions.php so that, on previewing a
page, a $ escaped by `, <code> or /= =/ is no longer displayed as
&#36;, and '~data~' is no longer displayed as '&#126;data&#126;'.

18. Fixed "site.join" so that it doesn't show an ugly "adminAdmin";
duplicated 'editor' and 'admin' entries are also excluded.

19. Modified "action.data" so that it works correctly. Interestingly,
in the old version the field names were mismatched, yet it still
worked in v3.3.1 and earlier.

20. "site.language" was missing several system messages, which are
now included in the pack.

21. Fixed several typos and nonsensical or mistaken lines found in
various places.


# Bugs or features?

1. BoltWire v3.3.2 has several encoding/parsing problems, summarized
as follows:
   1) On previewing a page: (all fixed above)
      a) $ escaped by `, <code> or /= =/ becomes &#36;
      b) ~data~ becomes &#126;data&#126;

   2) On previewing a page, the source is changed: (all fixed above)
      a) All html entities are unescaped such as &lt; to <, &amp; to
&, &trade; to ™
      b) [messages] become <div class='message'>.......</div>

   3) On saving a page (the source code is rewritten):
      a) ' or " with one or more \ before them (ex: \", \\\\\\\\')
become a single ' or "
      b) \ or n with one or more \ before them become a single \ or a
line break if config "blockslashes" is true
      c) n with a \ before it becomes a line break if config
"blockslashes" is false
      d) %26lt; becomes &lt;; avoid this by separating it with `, like
%`26lt;
      e) %3c becomes <, so anyone who wants to display something like
%3c %..%..%.. in <code> runs into a problem

   4) On displaying page:
      a) <p> is often generated incorrectly, especially in
paragraphs with multiple lines.
      b) * # = work significantly differently from how they do in
other wikis.
      c) lines ending with // get wrapped in a <p>
      d) `&lt; is displayed as %26lt; (fixed above)
      e) multi-line texts in /= =/ become one-line (fixed above)
      f) [messages] in skin page without wrapping == == are still
parsed (fixed above)
      g) uppercase html entities are not parsed (fixed above)
      h) html tags like <br /> are not parsed (fixed above)
      i) html tags with attrs are not parsed

--------------------------------------------------

3)b) I haven't fully understood what BOLTstripslashes does, but if I
add this line:
        BOLTreplace("\n", '\n');
     before the line:
        BOLTreplace('~data~', '~da`ta~');
     no input '\n' would be replaced by a line break.

4)a) For instance, the page
http://www.boltwire.com/index.php?p=docs.handbook.markups
on the official site contains several bad <p>s.

The pattern is like this. Given this source:
------------------
aaaaaaaaaaaaa
aaaaaaaaaaaaa
aaaaaaaaaaaaa

bbbbbbbbbbbbb
bbbbbbbbbbbbb
bbbbbbbbbbbbb
------------------

the expected output is:
------------------
<p>aaaaaaaaaaaaa<br/>
aaaaaaaaaaaaa<br/>
aaaaaaaaaaaaa</p>

<p>bbbbbbbbbbbbb<br/>
bbbbbbbbbbbbb<br/>
bbbbbbbbbbbbb</p>
------------------

but the actual output is:
------------------
aaaaaaaaaaaaa<br />
aaaaaaaaaaaaa<br />
<p>aaaaaaaaaaaaa</p>
bbbbbbbbbbbbb<br />
bbbbbbbbbbbbb<br />
bbbbbbbbbbbbb<br />
------------------

4)b) For example:

# blah (ordered like 1.)
#* blah1 (expected to be unordered but is ordered)
#* blah2 (expected to be unordered but is ordered)
#*# blah3

-----------------------

# blah
** blah2 (should start a new <ul> instead of staying in the above <ol>)
** blah2
## blah2 (should be <ol> instead of <ul>)
== blah2
#*# blah3
### blah3
*** blah3 (the three "blah3" are all <ol>, amazing)
# blah4
*# blah4
## blah4

------------------------

* blah
= blah2
= blah3
** blah4
== blah5 (here's an extra line-break and can't be cleaned by `)
= blah6 (here's an extra line-break and can't be cleaned by `)
blah7

------------------

  In practice, parsing <p>s with PHP is very difficult. The main
reason is that a <p> cannot wrap a block element. We can set up plenty
of rules to keep <h#> <blockquote> <div> <table> <ul> <ol> <pre> etc.
out of the <p> </p>, but problems may still occur when there is
escaped input or function/variable output. Furthermore, users may
define the stylesheet on their own; if we want to handle that, we are
just inventing another "browser". We can see that many online services
like Yahoo Blogs, Google Docs, Google Sites, and other Google products
do not actively parse <p>s. They just leave the problem to the users,
who can still manually write <p>s, and debug them, on their own.

  I'd advise adding a config called "honorParagraphs", just like
"honorLines" (abbreviated hP and hL here). With both hL and hP, all
lines and paragraphs are parsed; with hL but not hP, every \n becomes
a single line break (<br/>); with hP but not hL, only paragraphs are
parsed; with neither, nothing is parsed. We won't die without
paragraphs, although <br/>s are sometimes ugly. However, if the
paragraph parser behaves unpredictably, dealing with it may give us a
stroke.
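
  The four combinations could look roughly like the sketch below. It
is only an illustration under simplifying assumptions: the function
name sketchLineBreaks is hypothetical, the two settings are passed in
as plain booleans rather than read from BoltWire's config, and the
input is plain text without any block-level markup.

```php
<?php
// Hypothetical helper for illustration only; not BoltWire code.
function sketchLineBreaks(string $text, bool $honorLines, bool $honorParagraphs): string
{
    $text = str_replace("\r\n", "\n", trim($text));
    if (!$honorParagraphs) {
        // No paragraph parsing: each newline becomes a <br/> or is left alone.
        return $honorLines ? str_replace("\n", "<br/>\n", $text) : $text;
    }
    // Paragraphs are separated by one or more blank lines.
    $out = array();
    foreach (preg_split('/\n{2,}/', $text) as $p) {
        if ($honorLines) {
            $p = str_replace("\n", "<br/>\n", $p);
        }
        $out[] = "<p>$p</p>";
    }
    return implode("\n\n", $out);
}

echo sketchLineBreaks("aaa\naaa\n\nbbb\nbbb", true, true);
```

  With both settings on, the two-paragraph sample comes out as two
<p> blocks with a <br/> after the first line of each.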

  Also, the output HTML code is quite ugly. It would be better to add
more \n to format it, although that would make it harder for the
parser to handle line breaks.

  If I have more free time, I may make a plugin handling the parser
problems.


# Other questions/suggestions/optimizations

1. Instead of having < escaped, the "code.*" pages are stored as raw
source code, which causes serious display problems on action=undo or
action=history and, worse, makes restoring an earlier version
impossible. It might be better to display the diff of code pages after
escaping HTML tags.

   In my opinion, saving normal pages with '<' escaped (and everything
that follows from it) isn't necessary, since < is converted to &lt; in
BOLTdomarkup anyway, just before the markups are applied. Performance
could be improved, and disk space saved, by not escaping < to &lt; in
the data file and by removing the code that manages the transformation
in BOLTsaveEscapes, BOLTcharEncode, BOLTdomarkup, BOLTFsource,
BOLTFpreview and BOLTFinfo. We would then just add htmlentities
escaping before comparing the 2 files in BOLTdiff, and the code page
problem is solved. Besides, this way we don't have to rack our brains
to display a literal %26lt; &lt; %3c on the screen in BoltWire.
However, a serious concern is that it might be incompatible with old
data.
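
   A rough sketch of the "escape first, then compare" idea follows.
It does not reproduce BOLTdiff; sketchEscapedDiff is a hypothetical
helper with a deliberately naive line comparison, and it uses
htmlspecialchars as a narrower stand-in for the htmlentities escaping
mentioned above.

```php
<?php
// Hypothetical helper for illustration only; not BoltWire's BOLTdiff.
function sketchEscapedDiff(string $old, string $new): array
{
    // Escape both versions before comparing, so raw HTML from a code
    // page can never be rendered by the history or undo display.
    $oldLines = explode("\n", htmlspecialchars($old, ENT_NOQUOTES, 'UTF-8'));
    $newLines = explode("\n", htmlspecialchars($new, ENT_NOQUOTES, 'UTF-8'));
    return array(
        'removed' => array_values(array_diff($oldLines, $newLines)),
        'added'   => array_values(array_diff($newLines, $oldLines)),
    );
}

print_r(sketchEscapedDiff("<div>\nsame line", "<span>\nsame line"));
```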

2. The cache system is a good idea, but there is a problem. Generally
I would turn off caching for admins and members to prevent member-only
pages from being viewed by guests. The caches are then made by guests.
If I log in, I still get the pages that are meant for guests,
including the action bar and everything, so if I want to edit a page
or view a page as a member, I have to type &action=edit or
&action=view manually, which is very inconvenient. It would be better
to have another config such as 'noUseCacheGroup', meaning that if you
are logged in and belong to the defined group, you always receive a
freshly parsed page instead of a cached version, but you don't create
a cache.
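
   The proposed behavior could be sketched like this. Everything here
is hypothetical: 'noUseCacheGroup' is only the config name suggested
above, sketchCachePolicy is not a BoltWire function, and memberships
are assumed to be a comma-separated string.

```php
<?php
// Hypothetical helper for illustration only; not BoltWire code.
function sketchCachePolicy(string $memberships, string $noUseCacheGroup): array
{
    // Split "admin,editor" into a clean list, dropping empty entries.
    $mine = array_filter(array_map('trim', explode(',', $memberships)), 'strlen');
    $bypass = in_array($noUseCacheGroup, $mine, true);
    // Members of the group neither read from the cache nor write to it.
    return array('serveCached' => !$bypass, 'writeCache' => !$bypass);
}

print_r(sketchCachePolicy('admin,editor', 'editor')); // both false
print_r(sketchCachePolicy('', 'editor'));             // guest: both true
```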

3. In most web forums, blogs, or wiki systems, going back to the
"previous page" after submitting a form shows the text as you edited
it. BoltWire, however, shows the current source of the page being
edited. This can overwrite the edited text, and the editor's work is
lost if an error kept the submitted edit from being saved
(unfortunately, this happens quite often...).

4. The documentation of the markup table in the help system is out of
date.

5. <rules xxx>content</rules> doesn't seem to work if a page contains
2 or more <rules> blocks.

6. wikiWords doesn't seem to work. All page names are restricted to
lowercase even when it's set to true.

7. I cannot figure out what this line in BOLTdomarkup is doing:
   $out = preg_replace('/\{\*\:([^{}+=*:]+)\}/', '{'.$page.'*:$1}', $content); // base pages
   since {page*:var} is used for retrieving data from a stamp page...
   and it causes /= {*:} =/ to display unexpectedly
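
   Going only by the pattern itself, the line appears to rewrite a
bare {*:var} into {page*:var} using the current page name. The sketch
below just runs the same preg_replace on a made-up string; the
function name, the page name and the sample content are hypothetical,
and it says nothing about why BoltWire wants this.

```php
<?php
// Hypothetical wrapper for illustration; only the regex comes from the line above.
function sketchBasePageVars(string $content, string $page): string
{
    return preg_replace('/\{\*\:([^{}+=*:]+)\}/', '{' . $page . '*:$1}', $content);
}

echo sketchBasePageVars('Title: {*:title} and {other*:title}', 'main.demo');
// Title: {main.demo*:title} and {other*:title}
```

   A variable that already names a page, like {other*:title}, is left
alone, and so is an empty {*:}, because the pattern requires at least
one character after the colon.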

8. It would be better to combine BOLTCexists and BOLTexists. Also, the
former takes 'plugins/', 'files/' but the latter takes 'system'.

9. BOLTCstamp seems to take only the current page instead of any page,
as the comments say it should.

10. I can't figure out what farm/pub is for...
