Revision: 6867
          
http://languagetool.svn.sourceforge.net/languagetool/?rev=6867&view=rev
Author:   dnaber
Date:     2012-05-11 10:22:26 +0000 (Fri, 11 May 2012)
Log Message:
-----------
small updates and fixes for the developer page

Modified Paths:
--------------
    trunk/website/www/development/index.php

Modified: trunk/website/www/development/index.php
===================================================================
--- trunk/website/www/development/index.php     2012-05-10 23:21:48 UTC (rev 6866)
+++ trunk/website/www/development/index.php     2012-05-11 10:22:26 UTC (rev 6867)
@@ -2,7 +2,7 @@
 $page = "development";
 $title = "LanguageTool";
 $title2 = "Development";
-$lastmod = "2012-05-11 21:05:00 CET";
+$lastmod = "2012-05-11 21:06:00 CET";
 include("../../include/header.php");
 include('../../include/geshi/geshi.php');
 ?>
@@ -60,7 +60,7 @@
 <h3><a name="process">Language checking process</a></h3>
 <ol>
        <li>The text to be checked is split into sentences</li>
-       <li>Each sentence is split into words</li>
+       <li>Each sentence is split into words, so called <em>tokens</em></li>
        <li>Each word is assigned its part-of-speech tag(s) (e.g. <em>cars</em>
                = plural noun, <em>talked</em> = simple past verb)</li>
        <li>The analyzed text is then matched against the built-in rules and against
@@ -76,20 +76,36 @@
 
 <ul class="largelist">
        <li><?php hl('<token>think</token>', "xmlcodeNoIndent"); ?>
-               matches the word <em>think</em></li>
+               matches the word <em>think</em>
+       </li>
+       <li><?php hl('<token>think</token>
+<token>about</token>', "xmlcodeNoIndent"); ?>
+               Matches the phrase <em>think about</em> - as the text is split into words, you need to list
+      each word separately as a token. This will not work: <span style="text-decoration: line-through"><?php hl('<token>think about</token>', "xmlcodeNoIndent"); ?></span>
+       </li>
        <li><?php hl('<token regexp="yes">think|say</token>', "xmlcodeNoIndent"); ?>
                matches the regular expression
-               <tt>think|say</tt>, i.e. the word <em>think</em> or <em>say</em></li>
-       <li><?php hl('<token postag="VB" /> <token>house</token>', "xmlcodeNoIndent"); ?>
+               <tt>think|say</tt>, i.e. the word <em>think</em> or <em>say</em>. You can write simple rules without
+        knowing regular expressions, but if you want to learn more about them you can try
+        <?=show_link("this tutorial", "http://www.regular-expressions.info/tutorialcnt.html")?>.
+    </li>
+       <li><?php hl('<token postag="VB" />
+<token>house</token>', "xmlcodeNoIndent"); ?>
                matches a base form verb followed by the word <em>house</em>.
                See <?=show_link("resource/en/tagset.txt", "http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/en/tagset.txt", 0) ?>
-        for a list of possible English part-of-speech tags.</li>
-       <li><?php hl('<token>cause</token> <token regexp="yes" negate="yes">and|to</token>', "xmlcodeNoIndent"); ?>
+        for a list of possible English part-of-speech tags.
+    </li>
+       <li><?php hl('<token>cause</token>
+<token regexp="yes" negate="yes">and|to</token>', "xmlcodeNoIndent"); ?>
                matches the word <em>cause</em> followed
-               by any word that is not <em>and</em> or <em>to</em></li>
-       <li><?php hl('<token postag="SENT_START" /> <token>foobar</token>', "xmlcodeNoIndent"); ?>
+               by any word that is not <em>and</em> or <em>to</em>
+    </li>
+       <li><?php hl('<token postag="SENT_START" />
+<token>foobar</token>', "xmlcodeNoIndent"); ?>
                matches the word <em>foobar</em> only
-               at the beginning of a sentence</li>
+               at the beginning of a sentence. The corresponding postag for the end of a sentence
+               is <tt>SENT_END</tt>.
+    </li>
 </ul>
 
 <p>A pattern's terms are matched case-insensitively by default. This can be changed
@@ -112,9 +128,9 @@
 <p>A short description of the elements and their attributes:</p>
 
 <ul class="largelist">
-       <li>element <tt>rule</tt>, attribute <tt>id</tt>: an internal identifier used to address this rule</li>
-       <li>element <tt>rule</tt>, attribute <tt>name</tt>: the text displayed in the configuration</li>
-       <li>element <tt>pattern</tt>, attributes <tt>mark_from</tt> and <tt>mark_to</tt>: what part of the original 
+       <li>element <tt>rule</tt>, attribute <tt>id</tt>: An internal identifier used to address this rule. This must be unique.</li>
+       <li>element <tt>rule</tt>, attribute <tt>name</tt>: The text displayed in the configuration.</li>
+       <li>element <tt>pattern</tt>, attributes <tt>mark_from</tt> and <tt>mark_to</tt>: What part of the original
                text should be marked. The default, <tt>mark_from="0"</tt> and <tt>mark_to="0"</tt>, means to mark
                the complete matching token. For example, if the pattern contains three token
                elements that match the input text, those three matching words will be marked in the text.
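
Editor's note, not part of the commit: the token patterns and the rule/pattern attributes documented in the diff above could be combined roughly as in the sketch below. The rule id, name, message text and example sentences are made up for illustration, and the <message> and <example> elements are assumed from LanguageTool's rule file format rather than taken from the lines shown in this commit.

```xml
<!-- Hypothetical rule, for illustration only. -->
<rule id="THINK_ABOUT_SKETCH" name="Sketch: the phrase 'think about'">
  <!-- mark_from="0" and mark_to="0" are the documented defaults:
       all matching tokens are marked. -->
  <pattern mark_from="0" mark_to="0">
    <!-- One token element per word. A single
         <token>think about</token> would not match. -->
    <token>think</token>
    <token>about</token>
  </pattern>
  <!-- Assumed elements: the text shown to the user and test sentences. -->
  <message>Hypothetical message for the matched phrase.</message>
  <example type="incorrect">I think about it.</example>
  <example type="correct">I think so.</example>
</rule>
```

Because matching is case-insensitive by default, this sketch would presumably also match "Think about" at the start of a sentence.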
