[incubator-nlpcraft-website] branch NLPCRAFT-513-intents updated: WIP.

sergeykamov Tue, 06 Dec 2022 05:08:30 -0800

This is an automated email from the ASF dual-hosted git repository.

sergeykamov pushed a commit to branch NLPCRAFT-513-intents
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git



The following commit(s) were added to refs/heads/NLPCRAFT-513-intents by this push:
     new b9497fb  WIP.
b9497fb is described below

commit b9497fb45dd51a2f5c9a268ac02bff7779d79489
Author: Sergey Khisamov <skhisa...@fitechsource.com>
AuthorDate: Tue Dec 6 17:08:26 2022 +0400

    WIP.
---
 _data/idl-fns.yml    |  41 +++++++-------
 intent-matching.html | 149 +++++++++++++++------------------------------------
 2 files changed, 64 insertions(+), 126 deletions(-)

diff --git a/_data/idl-fns.yml b/_data/idl-fns.yml
index 1f1aa6f..8cbd4bd 100644
--- a/_data/idl-fns.yml
+++ b/_data/idl-fns.yml
@@ -23,9 +23,9 @@ fn-ent:
   - name: ent_id
     sig: |
       <b>ent_id</b>(t: Entity<em><sub>opt</sub></em>) ⇒ String, # ⇒ String
-    synopsis: Returns {% scaladoc NCEntity.html#getId-0 entity ID() %}
+    synopsis: Returns <a href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getId-0">entity ID</a>
     desc: |
-      Returns <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getId()">entity ID</a>
+      Returns <a href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getId-0">entity ID</a>
       for the current entity (default) or the provided one by the optional parameter <code><b>t</b></code>. Note that this
       functions has a special shorthand <code><b>#</b></code>.
     usage: |
@@ -38,9 +38,9 @@ fn-ent:
   - name: ent_groups
     sig: |
       <b>ent_groups</b>(t: Entity<em><sub>opt</sub></em>) ⇒ List[String]
-    synopsis: Gets the list of <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">groups</a> this entity belongs to
+    synopsis: Gets the list of <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">groups</a> this entity belongs to
     desc: |
-      Gets the list of <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">groups</a>
+      Gets the list of <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">groups</a>
       the current entity (default) or the provided one by the optional parameter <code><b>t</b></code> belongs to. Note that,
       by default, if not specified explicitly, entity always belongs to one group with ID equal to entity ID.
       May return an empty list but never a <code>null</code>.
@@ -64,7 +64,7 @@ fn-ent:
       <b>ent_text</b>(t: Entity<em><sub>opt</sub></em>) ⇒ String
     synopsis: Returns entity's original text
     desc: |
-      Returns <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getOriginalText()">entity's original text</a>.
+      Returns <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCEntity.html#mkText-0">entity's original text</a>.
       If <code>t</code> is not provided the current entity is assumed.
     usage: |
       // Result: entity original input text.
@@ -75,7 +75,7 @@ fn-ent:
       <b>ent_index</b>(t: Entity<em><sub>opt</sub></em>) ⇒ Long
     synopsis: Returns entity's index in the original input
     desc: |
-      Returns <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getIndex()">entity's index</a> in the original input. Note that this is an index of the entity and not of the character.
+      Returns entity's index in the original input. Note that this is an index of the entity and not of the character.
       If <code>t</code> is not provided the current entity is assumed.
     usage: |
       // Result: 'true' if index of this entity in the original input is equal to 1.
@@ -151,11 +151,11 @@ fn-ent:
       <b>ent_is_before_group</b>(grp: String) ⇒ Boolean
     synopsis: |
       Returns <code>true</code> if there is a entity that belongs to the
-      <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+      <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
       <code>grp</code> after this entity
     desc: |
       Returns <code>true</code> if there is a entity that belongs to the
-      <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+      <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
       <code>grp</code> after this entity.
     usage: |
       // Result: 'true' if there is a entity that belongs to the group 'grp' after this entity.
@@ -166,11 +166,11 @@ fn-ent:
       <b>ent_is_after_group</b>(grp: String) ⇒ Boolean
     synopsis: |
       Returns <code>true</code> if there is a entity that belongs to the
-      <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+      <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
       <code>grp</code> before this entity
     desc: |
       Returns <code>true</code> if there is a entity that belongs to the
-      <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+      <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
       <code>grp</code> before this entity.
     usage: |
       // Result: 'true' if there is a entity that belongs to the group 'grp' before this entity.
@@ -357,19 +357,19 @@ fn-req:
   - name: req_id
     sig: |
       <b>req_id</b> ⇒ String
-    synopsis: Returns <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getServerRequestId()">server request ID</a>
+    synopsis: Returns <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getRequestId-0">request ID</a>
     desc: |
-      Returns <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getServerRequestId()">server request ID</a>.
+      Returns <a class="not-code" href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getRequestId-0">request ID</a>.
     usage: |
-      // Result: server request ID.
+      // Result: request ID.
       req_id
 
   - name: req_text
     sig: |
       <b>req_text</b> ⇒ String
-    synopsis: Returns request <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">normalied text</a>
+    synopsis: Returns request <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getText-0">text</a>
     desc: |
-      Returns request <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">normalized text</a>.
+      Returns request <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getText-0">text</a>.
     usage: |
       // Result: request text.
       req_text
@@ -377,9 +377,9 @@ fn-req:
   - name: req_tstamp
     sig: |
       <b>req_tstamp</b> ⇒ Long
-    synopsis: Gets UTC/GMT receive <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">timestamp</a>
+    synopsis: Gets UTC/GMT receive <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getReceiveTimestamp-0">timestamp</a>
     desc: |
-      Gets UTC/GMT <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">timestamp</a>
+      Gets UTC/GMT <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getReceiveTimestamp-0">timestamp</a>
       in ms when user input was received.
     usage: |
       // Result: input receive timsstamp in ms.
@@ -388,9 +388,9 @@ fn-req:
   - name: user_id
     sig: |
       <b>user_id</b> ⇒ String
-    synopsis: Returns <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCUser.html#getId()">user ID</a>
+    synopsis: Returns <code>user ID</code>
     desc: |
-      Returns <a class="not-code" target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCUser.html#getId()">user ID</a>.
+      Returns <code>user ID</code>.
     usage: |
       // Result: user ID.
       user_id      
@@ -897,8 +897,7 @@ fn-metadata:
       <b>meta_ent</b>(p: String) ⇒ Any
     synopsis: Gets entity metadata property <code><b>p</b></code>
     desc: |
-      Gets entity metadata property <code><b>p</b></code>. See
-      <a href="/data-model.html#meta">entity metadata</a> for more information.
+      Gets entity metadata property <code><b>p</b></code>.
     usage: |
       // Result: 'nlp:token:text' entity metadata property.
       meta_ent('nlp:token:text')
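
The group-position functions above are easy to misread: ent_is_before_group tests for a group member after the current entity, and ent_is_after_group tests for one before it. The JavaScript sketch below models that behavior together with the documented default that an entity without explicit groups belongs to a single group equal to its ID. It is a hypothetical illustration, not NLPCraft code: the entity objects, groupsOf, entIsBeforeGroup and entIsAfterGroup are illustrative names, entities are assumed to be a plain array in input order (so the array index plays the role of ent_index), and explicit groups are assumed to replace the default group rather than extend it.

```javascript
// Hypothetical model of the IDL semantics described in _data/idl-fns.yml.
// Not NLPCraft's implementation: entities are plain objects in input order.

// Assumption: an entity without explicit groups belongs to one group equal
// to its ID; explicit groups replace that default.
function groupsOf(entity) {
  return entity.groups ?? [entity.id];
}

// Mirrors ent_is_before_group(grp): true if some entity AFTER the one at
// position `idx` belongs to group `grp`.
function entIsBeforeGroup(entities, idx, grp) {
  return entities.slice(idx + 1).some((e) => groupsOf(e).includes(grp));
}

// Mirrors ent_is_after_group(grp): true if some entity BEFORE the one at
// position `idx` belongs to group `grp`.
function entIsAfterGroup(entities, idx, grp) {
  return entities.slice(0, idx).some((e) => groupsOf(e).includes(grp));
}

// Example input: three entities; only the second declares a group explicitly.
const entities = [
  { id: "city" },
  { id: "date", groups: ["time"] },
  { id: "weather" },
];

console.log(entIsBeforeGroup(entities, 0, "time")); // true: 'date' follows 'city'
console.log(entIsAfterGroup(entities, 2, "time")); // true: 'date' precedes 'weather'
console.log(entIsAfterGroup(entities, 1, "city")); // true: default group 'city'
console.log(entIsBeforeGroup(entities, 2, "city")); // false: nothing follows 'weather'
```

Because both helpers only scan one side of the current position, an entity never counts as being before or after itself.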
diff --git a/intent-matching.html b/intent-matching.html
index 7e5fe35..c8317cb 100644
--- a/intent-matching.html
+++ b/intent-matching.html
@@ -1144,123 +1144,62 @@ id: intent_matching
     </section>
     <section id="logic">
         <h2 class="section-title">Intent Matching Logic <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-        <p>
-            In order to understand the intent matching logic lets review the overall user request processing workflow:
-        </p>
-        <figure>
-            <img class="img-fluid" style="border: none; padding: 0;" src="/images/intent_matching1.png" alt="">
-            <figcaption><b>Fig. 1</b> User Request Workflow</figcaption>
-        </figure>
-        <ul>
-            <li>
-                <b>Step: 0</b><br>
-                <p>
-                    Server receives REST call <code>/ask</code> or <code>/ask/sync</code> that contains the text
-                    of the sentence that needs to be processed.
-                </p>
-            </li>
-            <li>
-                <b>Step: 1</b><br>
-                <p>
-                    At this step the server attempts to find additional variations of the input sentence by substituting
-                    certain words in the original text with synonyms from Google's BERT dataset. Note that server will not use the synonyms that
-                    are already defined in the model itself - it only tries to compensate for the potential incompleteness
-                    of the model. The result of this step is one or more sentences that all have the same meaning as the
-                    original text.
-                </p>
-            </li>
-            <li>
-                <b>Step: 2</b><br>
-                <p>
-                    At this step the server takes one or more sentences from the previous step and tokenizes them. This
-                    process involves converting the text into a sequence of enriched entities representing named entities.
-                    This step also performs the initial server-side enrichment and detection of the
-                    <a href="/data-model.html#builtin">built-in named entities</a>.
-                </p>
-                <p>
-                    The result of this step is a sequence of converted sentences, where each element is a sequence
-                    of entities. These sequences are send down to the data probe that has requested data model deployed.
-                </p>
-            </li>
-            <li>
-                <b>Step: 3</b><br>
-                <p>
-                    This is the first step of the probe-side processing. At this point the data probe receives one or more
-                    sequences of entities. Probe then takes each sequence and performs the final enrichment by detecting user-defined
-                    elements additionally to the built-in entities that were detected on the server during step 2 above.
-                </p>
-            </li>
-            <li>
-                <b>Step: 4</b><br>
-                <p>
-                    This is an important step for understanding intent matching logic. At this step the data probe
-                    takes sequences of entities generated at the last step and comes up with one or more parsing
-                    variants. A parsing variant is a sequence of entities that is free from entity overlapping and other parsing
-                    ambiguities. Typically, a single sequence of entities can produce one (always) or more parsing variants.
-                </p>
-                <p>
-                    Let's consider the input text <code>'A B C D'</code> and the following elements defined in our model:
-                </p>
-                <pre class="brush: js">
-                "elements": [
-                    {
-                        "id": "elm1",
-                        "synonyms": ["A B"]
-                    },
-                    {
-                        "id": "elm2",
-                        "synonyms": ["B C"]
-                    },
-                    {
-                        "id": "elm3",
-                        "synonyms": ["D"]
-                    }
-                ],
-                </pre>
-                <p>
-                    All of these elements will be detected but since two of them are overlapping (<code>elm1</code> and
-                    <code>elm2</code>) there should be <b>two</b> parsing variants at the output of this step:
-                </p>
+            <p>
+                The {% scaladoc NCPipeline NCPipeline %} processing result is a collection of {% scaladoc NCVariant NCVariant %} instances.
+                As an example, let's consider the input text <code>'A B C D'</code> and the following elements defined in our model:
+            </p>
+            <pre class="brush: js">
+            "elements": [
+                {
+                    "id": "elm1",
+                    "synonyms": ["A B"]
+                },
+                {
+                    "id": "elm2",
+                    "synonyms": ["B C"]
+                },
+                {
+                    "id": "elm3",
+                    "synonyms": ["D"]
+                }
+            ],
+            </pre>
+            <p>
+                All of these elements will be detected, but since two of them overlap (<code>elm1</code> and
+                <code>elm2</code>), the pipeline output should contain <b>two</b> parsing variants:
+            </p>
                 <ol>
                     <li><code>elm1</code>('A', 'B') <code>freeword</code>('C') <code>elm3</code>('D')</li>
                     <li><code>freeword</code>('A') <code>elm2</code>('B', 'C') <code>elm3</code>('D')</li>
                 </ol>
                 <p></p>
                 <p>
-                    Note that at this point the <em>system cannot determine which of these variants is the best one
+                    Note that initially the <em>system cannot determine which of these variants is the best one
                     for matching - there's simply not enough information at this stage</em>. It can only be determined
-                    when each variant is matched against model's intents - which happens in the next step.
-                </p>
-            </li>
-            <li>
-                <b>Step: 5</b><br>
-                <p>
-                    At this step the actual matching between intents and variants happens. Each parsing variant from the previous
-                    step is matched against each intent. Each matching pair of a variant and an intent produce a match with a
+                    when each variant is matched against the model's intents.
+                    Each parsing variant is therefore matched against each intent. Each matching pair of a variant and an intent produces a match with a
                     <em>certain weight</em>. If there are no matches at all - an error is returned. If matches were found, the match
                     with the biggest weight is selected as a winning match. If multiple matches have the same weight, their
                     respective variants' weights will be used to further sort them out. Finally, the intent's callback from the winning match is
                     called.
                 </p>
-                <p>
-                    Although details on exact algorithm on weight calculation are too complex, here's the general guidelines
-                    on what determines the weight of the match between a parsing variant and the intent. Note that these rules
-                    coalesce around the principle idea that the <b>more specific match always wins</b>:
-                </p>
-                <ul>
-                    <li>
-                        A match that captures more entities has more weight than a match with less entities. As a corollary, the match
-                        with less free words (i.e. unused words) has bigger weight than a match with more free words.
-                    </li>
-                    <li>
-                        Entities for user-defined elements are more important than built-in entities.
-                    </li>
-                    <li>
-                        A more specific match has bigger weight. In other words, a match that uses an entity from the conversation
-                        context (i.e short-term-memory) has less weight than a match that only uses entities from the current request. In the same
-                        way older entities from the conversation give less weight than the more recent ones.
-                    </li>
-                </ul>
+        <p>
+            Although the exact weight calculation algorithm is too complex to describe here, these are the general guidelines
+            on what determines the weight of a match between a parsing variant and an intent. Note that these rules
+            coalesce around the principal idea that the <b>more specific match always wins</b>:
+        </p>
+        <ul>
+            <li>
+                A match that captures more entities has more weight than a match with fewer entities. As a corollary, the match
+                with fewer free words (i.e. unused words) has a bigger weight than a match with more free words.
+            </li>
+            <li>
+                Entities for user-defined elements are more important than built-in entities.
+            </li>
+            <li>
+                A more specific match has a bigger weight. In other words, a match that uses an entity from the conversation
+                context (i.e. short-term memory) has less weight than a match that only uses entities from the current request. In the same
+                way, older entities from the conversation carry less weight than the more recent ones.
             </li>
         </ul>
     </section>
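
The parsing-variant and winner-selection logic described in the new intent-matching.html text can be sketched as follows. This is a hypothetical JavaScript model, not NLPCraft's pipeline: detect, variants and selectWinner are illustrative names, synonyms are assumed to match as exact token sequences, a variant is modeled as a maximal set of non-overlapping detected elements with every uncovered token rendered as a freeword, and match weights are plain numbers supplied by the caller because the actual weight calculation is not specified. For the 'A B C D' input and the elm1/elm2/elm3 elements it yields the same two variants as the list in the diff.

```javascript
// Hypothetical sketch of parsing variants and winner selection.
// Not NLPCraft's pipeline code; names and data shapes are illustrative.

// Find every occurrence of every element synonym as an exact token sequence.
function detect(tokens, elements) {
  const found = [];
  for (const el of elements) {
    for (const syn of el.synonyms) {
      const words = syn.split(" ");
      for (let i = 0; i + words.length <= tokens.length; i++) {
        if (words.every((w, k) => tokens[i + k] === w)) {
          found.push({ id: el.id, start: i, end: i + words.length });
        }
      }
    }
  }
  return found.sort((a, b) => a.start - b.start);
}

const overlaps = (a, b) => a.start < b.end && b.start < a.end;

// Render a variant: chosen elements in order, uncovered tokens as freewords.
function render(tokens, chosen) {
  const out = [];
  for (let p = 0; p < tokens.length; ) {
    const e = chosen.find((c) => c.start === p);
    if (e) {
      const words = tokens.slice(e.start, e.end).map((t) => `'${t}'`);
      out.push(`${e.id}(${words.join(", ")})`);
      p = e.end;
    } else {
      out.push(`freeword('${tokens[p]}')`);
      p++;
    }
  }
  return out.join(" ");
}

// A variant is a maximal set of non-overlapping detections: every detection
// left out must overlap one that was kept.
function variants(tokens, elements) {
  const found = detect(tokens, elements);
  const result = [];
  const walk = (i, chosen) => {
    if (i === found.length) {
      const maximal = found.every(
        (f) => chosen.includes(f) || chosen.some((c) => overlaps(c, f))
      );
      if (maximal) result.push(render(tokens, chosen));
      return;
    }
    const f = found[i];
    if (!chosen.some((c) => overlaps(c, f))) walk(i + 1, [...chosen, f]);
    walk(i + 1, chosen);
  };
  walk(0, []);
  return result;
}

// Highest match weight wins; ties fall back to the variant weight.
function selectWinner(matches) {
  if (matches.length === 0) throw new Error("No matching intent found.");
  return [...matches].sort(
    (a, b) => b.weight - a.weight || b.variantWeight - a.variantWeight
  )[0];
}

const elements = [
  { id: "elm1", synonyms: ["A B"] },
  { id: "elm2", synonyms: ["B C"] },
  { id: "elm3", synonyms: ["D"] },
];
for (const v of variants("A B C D".split(" "), elements)) console.log(v);
// elm1('A', 'B') freeword('C') elm3('D')
// freeword('A') elm2('B', 'C') elm3('D')

// Hypothetical weights: equal match weights, so the variant weight decides.
const winner = selectWinner([
  { intent: "intent1", weight: 2, variantWeight: 1 },
  { intent: "intent2", weight: 2, variantWeight: 3 },
]);
console.log(winner.intent); // intent2
```

The tie-break in selectWinner follows the described rule (highest match weight first, then the respective variant weight), and an empty match list throws, standing in for the error returned when nothing matches.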
