> -----Original Message-----
> From: Joel
> Sent: Thursday, September 16, 2004 7:01 PM
> Subject: Re: [Spirit-general] RE: [Spirit-devel] More Quickbook updates
>
> David Barrett wrote:
>
> > I maintain an open-source PHP wiki called QwikiWiki
> > (http://www.qwikiwiki.com), and thus I've grappled with a similar
> > question. The final syntax I settled on can be found here:
> >
> > http://quinthar.com/qwikiwiki/index.php?page=QwikiSyntax
>
> Hi David,
>
> It's not clear what rules you are using for Format Detection.
> Could you expound a little bit? As you know, these formatting
> rules are very ambiguous and rather context sensitive. I'd have
> to write that in Spirit's EBNF. Fortunately, Spirit allows
> context sensitivity and ambiguous grammars to some extent
> but there still has to be some formality. Let's focus with the
> *bold* markup. When does it apply and when does it not? For
> example, how do you detect that 3*4*5 will not make the 4 bold?
>
Yes, it's a nasty problem. Here is the regular expression I use for bold:

'/(?<=^|^\W|\s\W|\s|<b>|<i>|<code>)\*(([^*\s]+\s+)*[^*\s]+)\*(?=<\/b>|<\/i>|<\/code>|\s|\W\s|\W$|$)/i'

And here is the replacement pattern:

'<B>$1</B>'

In essence, I only allow bold under certain circumstances, and disallow everything else. In the case you mention (3*4*5), my pattern would fail because it's not surrounded by whitespace or non-word characters.

Basically, I'm not trying to create an iron-clad set of rules. Rather, I'm trying to guess what the user wants to do. If I can guess correctly 99% of the time, I count that as a success, and I leave it up to the user (who is vastly smarter than my parser) to figure out the remaining 1%.

I don't remember precisely how I came up with this, other than it required a lot of trial and error and now it seems to work. However, it should be noted that this rule is only applied after the source text has been matched/manipulated with a number of other rules. The complete PHP source file for the functions that convert my wiki syntax to HTML is attached to the end of this email.

> << BTW, no underline markup? >>

So far as I know, nobody has ever complained about not having it, so I saw no reason to add it. :) A feature never added is a feature that never breaks.

-david

PS: Here's the original PHP source file. Start with the function "QWFormatQwikiFile" somewhere near the bottom, and work your way up. Good luck, and I hope this helps!

<?php
// _wikiLib.php
//
// Copyright 2004, David Barrett. All Rights Reserved.
// Email: [EMAIL PROTECTED]
// Web: http://www.quinthar.com
//
// See LICENSE for the complete licensing details.

// QWCleanQwikiPageName
function QWCleanQwikiPageName( $page )
{
    // Just strip any underscores from the name for pretty display
    return str_replace( '_', ' ', $page );
}

// QWFormatQwikiBlockCallback
function QWFormatQwikiBlockCallback( $matches )
{
    return $matches[1] . QWFormatQwikiPageName( $matches[2] ) .
        $matches[3];
}

// QWFormatQwikiBlock
function QWFormatQwikiBlockInPlace( &$block )
{
    $block = QWFormatQwikiBlock( $block );
}

function QWFormatQwikiBlock( $block )
{
    // See if it looks like an attached file name
    global $QW, $QW_CONFIG;
    $replacedBlock = $block;
    if( preg_match( '/^(\W*)([\w-]+(\.[\w-]+)+)(\W*)/i', $block, $matches ) )
    if( is_dir( $QW['attachDir'] ) )
    {
        // Look to see if it matches an attachment
        $attachDir = dir( $QW['attachDir'] );
        while( $filename = $attachDir->read( ) )
        if( !strcasecmp( $filename, $matches[2] ) )
        {
            // If it's an image, do an inline image
            $pathParts = pathinfo( $filename );
            $extension = $pathParts['extension'];
            if( !strcasecmp( $extension, 'jpg' ) ||
                !strcasecmp( $extension, 'png' ) ||
                !strcasecmp( $extension, 'gif' ) )
                // Give it an inline image
                $replacedBlock = "$matches[1]<IMG SRC='$QW[attachDir]/$filename'/>$matches[4]";
            else
                // Link to the attachment
                $replacedBlock = "$matches[1]<A HREF='$QW[attachDir]/$filename'$QW[hrefTarget]>$matches[2]</A>$matches[4]";
            break;
        }
        $attachDir->close( );
        if( $block != $replacedBlock ) return $replacedBlock;
    }

    // Determine what kind of block it is and mark it up
    static $patternArray = array(
        '/^(\W*)(\w[\w.-]*\w@([\w-]+\.)+([\w-]+\.[a-z]{2}|com|net|org|edu|gov|biz|info|mil|int))+(\W*)$/i', // Email
        '/^(\W*)(\w+(\.\w+)*\.([\w-]+\.[a-z]{2}|com|net|org|edu|gov|biz|info|mil|int)(\/(~?[\w]+([.~-]+[\w]+)*)?)*)(\W*)$/i', // WWW
        '/^(\W*)(([a-z]+:\/\/)([\w-]+@)?[\w-]+(\.[\w-]+)*(\/([\w~][\w.~-]*)?)*([\?#]\S+)?)(\W*)$/i', // WWW
        '/^(\W*)!([A-Z]+[a-z\d]+[A-Z]+\w*)(.*)$/' // Explicitly ignore CamelBack
    );
    $replaceArray = array(
        '$1<A HREF="mailto:$2">$2</A>$5', // Email
        '$1<A HREF="http://$2"' . $QW['hrefTarget'] . '>$2</A>$8', // Implicit WWW
        '$1<A HREF="$2"' . $QW['hrefTarget'] . '>$2</A>$9', // Explicit WWW
        '$1$2$3'
    );
    $replacedBlock = preg_replace( $patternArray, $replaceArray, $block );
    if( $block != $replacedBlock ) return $replacedBlock;

    // Detect QuickiPages
    return preg_replace_callback( $QW_CONFIG['tagPatternArray'], "QWFormatQwikiBlockCallback", $block );
}

// QWFormatQwikiText
function QWFormatQwikiText( $text )
{
    // Split into whitespace-separated blocks
    $result = "";
    $blocks = preg_split( '/\s/', $text );
    foreach( $blocks as $block )
    {
        // Process the block and add to the result
        $processedBlock = QWFormatQwikiBlock( $block );
        if( $result == "" ) $result = $processedBlock;
        else $result .= " " . $processedBlock;
    }

    // Apply line-level processing
    static $patternArray = array(
        '/(?<=^|^\W|\s\W|\s|<b>|<i>|<code>)\*(([^*\s]+\s+)*[^*\s]+)\*(?=<\/b>|<\/i>|<\/code>|\s|\W\s|\W$|$)/i', // Bold
        '/(?<=^|^\W|\s\W|\s|<b>|<i>|<code>)\/(([^\/\s]+\s+)*[^\/\s]+)\/(?=<\/b>|<\/i>|<\/code>|\s|\W\s|\W$|$)/i', // Italics
        '/(?<=^|^\W|\s\W|\s|<b>|<i>|<code>)#(([^#\s]+\s+)*[^#\s]+)#(?=<\/b>|<\/i>|<\/code>|\s|\W\s|\W$|$)/i', // Code
    );
    static $replaceArray = array(
        '<B>$1</B>', // Bold
        '<I>$1</I>', // Italics
        '<CODE>$1</CODE>', // Code
    );

    // Italics overlaid on bold requires two passes (ie, "/*blah*/").
    // I'd like to use while() here, but I'm afraid of entering an infinite loop
    if( ($newResult = preg_replace( $patternArray, $replaceArray, $result )) != $result )
        $newResult = preg_replace( $patternArray, $replaceArray, $newResult );
    return $newResult;
}

// QWFormatMixedText
function QWFormatMixedText( $text )
{
    // Format intermixed HTML and Wiki blocks
    $out = "";
    while( strlen( $text ) )
    {
        // Get everything up to the first <HTML> chunk
        $wikiEnd = strpos( $text, '<HTML>' );
        if( $wikiEnd === false ) $wikiEnd = strpos( $text, '<html>' );
        if( $wikiEnd === false )
        {
            // Everything left is Wiki text
            $wikiText = $text;
            $text = "";
        }
        else
        {
            // Peel off the front and include as Wiki text
            $wikiText = substr( $text, 0, $wikiEnd );
            $text = substr( $text, $wikiEnd );
        }

        // Output the front as Qwiki text, if there is any
        if( strlen( $wikiText ) ) $out .= QWFormatQwikiText( $wikiText );

        // See if there is anything left
        if( strlen( $text ) )
        {
            // Output everything up to the </HTML> direct
            $htmlEnd = strpos( $text, '</HTML>' );
            if( $htmlEnd === false ) $htmlEnd = strpos( $text, '</html>' );
            if( $htmlEnd === false )
            {
                // All remaining text is html
                $htmlText = $text . "</HTML>";
                $text = "";
            }
            else
            {
                // Peel off the front and include as HTML text
                $htmlText = substr( $text, 0, $htmlEnd + 7 ); // +7 == strlen( '</HTML>' )
                $text = substr( $text, $htmlEnd + 7 );
            }

            // Output straight HTML, if any
            $out .= $htmlText;
        }
    }

    // Done
    return $out;
}

// QWFormatQwikiLine
function QWFormatQwikiLine( $line )
{
    // Strip off trailing whitespace
    $line = rtrim( $line );

    // See if there is anything left
    if( $line == "" )
    {
        // Just output a blank line
        return "<BR/>";
    }
    else
    {
        // See if there is any justification
        $align = 'left';
        if( preg_match( '/^\[ (.*)/', $line, $matches ) )
        {
            // Left justify - default
            $align = 'left';
            $line = $matches[1];
        }
        else if( preg_match( '/^\[\] (.*)/', $line, $matches ) )
        {
            // Center
            $align = 'center';
            $line = $matches[1];
        }
        else if( preg_match( '/^\] (.*)/', $line, $matches ) )
        {
            // Right justify
            $align = 'right';
            $line = $matches[1];
        }

        // See if it's a heading (all initial-caps)
        static $count;
        if( preg_match( '/^([A-Z]+\S*( [^a-z\s]+\S*)+)$/', $line, $matches ) )
        {
            // Return the heading
            return "<DIV CLASS='QWHeading1' STYLE='text-align: $align'>" . QWFormatMixedText( $line ) . "</DIV>\n";
        }
        else
        {
            // See if it's indented at all
            $bullet = "";
            $indent = 0;
            if( preg_match( '/^(\s+)(.*)/', $line, $matches ) )
            {
                // Otherwise just indent
                $indent = strlen( $matches[1] );
                $line = $matches[2];
            }

            // See if it's a bulleted list
            if( preg_match( '/^\? (.*)/', $line, $matches ) )
            {
                // Bullet with the question icon
                $bullet = "<IMG SRC='questionicon.gif' WIDTH='16' HEIGHT='16'/>";
                $line = $matches[1];
            }
            else if( preg_match( '/^! (.*)/', $line, $matches ) )
            {
                // Bullet with the exclamation icon
                $bullet = "<IMG SRC='exclamationicon.gif' WIDTH='16' HEIGHT='16'/>";
                $line = $matches[1];
            }
            else if( preg_match( '/^\. (.*)/', $line, $matches ) )
            {
                // Bullet with the circle bullet
                $bullet = "•";
                $line = $matches[1];
            }
            else if( preg_match( '/^- (.*)/', $line, $matches ) )
            {
                // Bullet with the em-dash
                $bullet = "—";
                $line = $matches[1];
            }
            else if( preg_match( '/^(\d+)\. (.*)/', $line, $matches ) )
            {
                // Output a numbered list
                if( $matches[1] == 1 ) $count[$indent] = 1;
                else if( isset( $count[$indent] ) ) ++$count[$indent];
                else $count[$indent] = $matches[1];
                $bullet = "$count[$indent].";
                $line = $matches[2];
            }

            // See if this line has an inline heading
            $inlineHeading = "";
            if( preg_match( '/^([A-Z]+[^\s:]*( [^a-z\s:]+[^\s:]*)*): (.*)/', $line, $matches ) )
            {
                // Create an inline heading
                $inlineHeading = "<B>" . QWFormatMixedText( $matches[1] ) . "</B>: ";
                $line = $matches[3];
            }

            // Output with or without bullet
            if( $bullet != "" )
                $lineHTML = "<TABLE CLASS='QWBullet' CELLPADDING='0' CELLSPACING='0'><TR><TD WIDTH='25' CLASS='QWBullet' VALIGN='top' ALIGN='center'>$bullet</TD><TD CLASS='QWBullet' VALIGN='top' ALIGN='left'>" . $inlineHeading . QWFormatMixedText( $line ) . "</TD></TR></TABLE>";
            else
                $lineHTML = "<DIV CLASS='QWNormal'>" . $inlineHeading . QWFormatMixedText( $line ) . "</DIV>";
            return QWFormatIndent( $indent, $lineHTML, $align );
        }
    }
}

// QWFormatQwikiLineArray
function QWFormatQwikiLineArray( &$lineArray )
{
    // Walk across all the lines
    $htmlBlock = "";
    $htmlIndent = 0;
    $inHTMLBlock = false;
    $out = "";
    $c = 0;
    while( $c < count( $lineArray ) )
    {
        // Concatenate as many lines as necessary to close all <HTML> blocks (or until it runs out)
        $line = "";
        do $line .= $lineArray[ $c++ ];
        while( ($c < count( $lineArray ) ) && (preg_match( '/<HTML>/i', $line ) != preg_match( '/<\/HTML>/i', $line )) );

        // Process the line
        $out .= QWFormatQwikiLine( $line );
    }
    return $out;
}

// QWFormatQwikiFile
function QWFormatQwikiFile( $absolutePath )
{
    // Read the file
    global $QW;
    if( file_exists( $absolutePath ) )
    {
        // Output the file
        $fileArray = file( $absolutePath );
        return QWFormatQwikiLineArray( $fileArray );
    }
    else
    {
        // The file doesn't exist
        return QWFormatQwikiLine( "(This page does not yet exist or has been deleted. Click <B>Edit this page</B> to create.)" ) . "\n";
    }
}

// QWCreateDataPath
function QWCreateDataPath( $page, $extension )
{
    // Convert to a filename
    return 'data/' . $page .
        $extension;
}

// QWGetRecentlyChangedQwikiPageNameList
function QWGetRecentlyChangedQwikiPageNameList( )
{
    // Get modified-sorted list of Qwiki pages
    $changedFileList = QWGetRecentlyChangedFileList( 'data/', "/\.qwiki$/" );
    if( !isset( $changedFileList ) ) return;

    // Loop across and create Wiki page names
    foreach( $changedFileList as $filename )
        // Pluck off the last six characters (the length of '.qwiki')
        $pageNameList[] = substr( $filename, 0, strlen( $filename ) - 6 );

    // Done
    return $pageNameList;
}

// QWFormatQwikiPageName
function QWFormatQwikiPageNameInPlace( &$page )
{
    $page = QWFormatQwikiPageName( $page );
}

function QWFormatQwikiPageName( $page )
{
    // Ignore it outright if it's in the ignore array
    global $QW, $QW_CONFIG;
    if( in_array( $page, $QW_CONFIG['ignoreQwikiTagArray'] ) ) return $page;

    // Use the redirect if it's in the redirect array
    if( array_key_exists( $page, $QW_CONFIG['redirectTagArray'] ) )
        return "<A HREF='" . $QW_CONFIG['redirectTagArray'][$page] . "'>" . QWCleanQwikiPageName( $page ) . "</A>";

    // See if it's a valid QuikiPage, or if we can hack off a trailing 's'. If not, create a new QuikiPage.
    $path = QWCreateDataPath( $page, '.qwiki' );
    if( preg_match( '/(.+?)s$/', $page, $miniMatches ) ) $path2 = QWCreateDataPath( $miniMatches[1], '.qwiki' );
    else $path2 = "";
    if( file_exists( $path ) )
        return "<A HREF='index.php?page=$page$QW[URLSuffix]'$QW[hrefTarget]>" . QWCleanQwikiPageName( $page ) . "</A>";
    else if( $path2 != "" && file_exists( $path2 ) )
        return "<A HREF='index.php?page=$miniMatches[1]$QW[URLSuffix]'$QW[hrefTarget]>" . QWCleanQwikiPageName( $miniMatches[1] ) . "s</A>";
    else
        return QWCleanQwikiPageName( $page ) . ( !$QW['pageIsProtected'] || $QW['userIsAuthenticated'] ? "<A HREF='index.php?page=$page&from=$QW[page]$QW[URLSuffix]'$QW[hrefTarget]>?</A>" : "" );
}

// QWFormatQwikiPageNameDelta
function QWFormatQwikiPageNameDeltaInPlace( &$pageName )
{
    $pageName = QWFormatQwikiPageNameDelta( $pageName );
}

function QWFormatQwikiPageNameDelta( $pageName )
{
    // Get the path and timestamp
    $path = QWCreateDataPath( $pageName, '.qwiki' );
    $then = filemtime( $path );
    $delta = QWFormatRelativeDate( time( ), $then );
    return QWFormatQwikiPageName( $pageName ) . " (" . $delta . ")";
}

// QWFormatHTMLList
function QWFormatHTMLList( &$htmlList, $separator )
{
    // Verify a list exists
    if( !isset( $htmlList ) || !count( $htmlList ) ) return "";

    // Loop across the page names and put into a list
    $out = "";
    for( $c=0; $c<count( $htmlList )-1; ++$c )
        $out .= $htmlList[$c] . $separator;
    $out .= $htmlList[$c];
    return $out;
}
?>

_______________________________________________
Boost-docs mailing list
[EMAIL PROTECTED]
Unsubscribe and other administrative requests: https://lists.sourceforge.net/lists/listinfo/boost-docs

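
Addendum: for anyone who wants to try the 3*4*5 case from this thread, here is a minimal sketch of the bold rule in isolation. It applies only the bold pattern and replacement quoted in the message, without the block-level pass and the other line-level rules that the real file runs around it, so results on full wiki text may differ. The function name boldOnly is made up for this illustration and is not part of _wikiLib.php.

```php
<?php
// Illustration only: boldOnly() is a hypothetical helper, not part of _wikiLib.php.
// It applies just the bold pattern and replacement quoted in the message.
function boldOnly( $text )
{
    $pattern = '/(?<=^|^\W|\s\W|\s|<b>|<i>|<code>)\*(([^*\s]+\s+)*[^*\s]+)\*(?=<\/b>|<\/i>|<\/code>|\s|\W\s|\W$|$)/i';
    return preg_replace( $pattern, '<B>$1</B>', $text );
}

// A digit before the opening '*' fails the lookbehind, so nothing is replaced
echo boldOnly( '3*4*5' ), "\n";               // 3*4*5

// Whitespace on both sides satisfies the lookbehind and the lookahead
echo boldOnly( 'this is *bold* text' ), "\n"; // this is <B>bold</B> text

// A non-word character at the very start or end of the line is also accepted
echo boldOnly( '(*two words*)' ), "\n";       // (<B>two words</B>)
```

The pattern relies on PCRE accepting lookbehind alternatives of different lengths, so it may not port unchanged to regex engines that require a single fixed-width lookbehind.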