This is an automated email from the ASF dual-hosted git repository. aradzinski pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
The following commit(s) were added to refs/heads/master by this push: new 51d192c WIP. 51d192c is described below commit 51d192c1a1907373983059d5614bd2735be05049 Author: Aaron Radzinski <aradizn...@apache.org> AuthorDate: Wed Jul 28 13:28:05 2021 -0700 WIP. --- _layouts/documentation.html | 7 - basic-concepts.html | 271 --------------------------------- blogs/quick_intro_apache_nlpcraft.html | 10 +- data-model.html | 69 +++++++++ 4 files changed, 74 insertions(+), 283 deletions(-) diff --git a/_layouts/documentation.html b/_layouts/documentation.html index 9a8c8f4..74d0301 100644 --- a/_layouts/documentation.html +++ b/_layouts/documentation.html @@ -56,13 +56,6 @@ layout: interior </li> <li class="side-nav-title">Developer Guide</li> <li> - {% if page.id == "basic_concepts" %} - <a class="active" href="/basic-concepts.html">Basic Concepts</a> - {% else %} - <a href="/basic-concepts.html">Basic Concepts</a> - {% endif %} - </li> - <li> {% if page.id == "first_example" %} <a class="active" href="/first-example.html">First Example</a> {% else %} diff --git a/basic-concepts.html b/basic-concepts.html deleted file mode 100644 index 72638e7..0000000 --- a/basic-concepts.html +++ /dev/null @@ -1,271 +0,0 @@ ---- -active_crumb: Basic Concepts -layout: documentation -id: basic_concepts ---- - -<!-- - Licensed to the Apache Software Foundation (ASF) under one or more - contributor license agreements. See the NOTICE file distributed with - this work for additional information regarding copyright ownership. - The ASF licenses this file to You under the Apache License, Version 2.0 - (the "License"); you may not use this file except in compliance with - the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. ---> - -<div id="basic-concepts" class="col-md-8 second-column"> - <section id="overview"> - <h2 class="section-title">Basic Concepts <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> - <p> - Below we’ll cover some of the key concepts that are important for NLPCraft: - </p> - <ul> - <li><a href="#model">Data Model</a></li> - <li><a href="#ne">Named Entities</a></li> - <li><a href="#intent">Intent Matching</a></li> - <li><a href="#stm">Conversation <span class="amp">&</span> STM</a></li> - </ul> - </section> - <section id="model"> - <h2 class="section-sub-title">Data Model <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> - <p> - Data model is a central concept in NLPCraft. It defines natural language interface to your public or - private data sources like on-premise database or a cloud SaaS application. - NLPCraft employs a <em>model-as-a-code</em> approach where entire data model is - an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> - interface which can be developed using any JVM programming language like Java, Scala, Kotlin, or Groovy. - </p> - <p> - A data model defines: - </p> - <ul> - <li>Set of model <a href="data-model.html#elements">elements</a> (a.k.a. named entities) to be detected in the user input.</li> - <li>Zero or more intents and their callbacks.</li> - <li>Common model configuration and various life-cycle callbacks.</li> - </ul> - <p> - Note that model-as-a-code approach allows you to use any software lifecycle tools and - frameworks like various build tools, CI/SCM tools, IDEs, etc. to develop and maintain your data model. - You don't have to use additional web-based tools to manage some aspects of your - data models - your entire model and all of its components are part of your project source code. - </p> - <p> - Read more about data models <a href="data-model.html">here</a>. - </p> - </section> - <section id="ne"> - <h2 class="section-sub-title">Named Entities <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> - <p> - Named entity, also known as a model element or a token, is one of the main a components defined by the NLPCraft data model. - A named entity is one or more individual words that have a consistent semantic meaning and typically denote a - real-world object, such as persons, locations, number, date and time, organizations, products, etc. Such - object can be abstract or have a physical existence. - </p> - <p> - For example, in the following sentence: - </p> - <p> - <i class="fa fa-fw fa-angle-right"></i><code>Meeting is set for 12pm today in San Francisco.</code> - </p> - <p> - the following named entities can be detected: - </p> - <table class="gradient-table"> - <thead> - <tr> - <th>Words</th> - <th>Type</th> - <th>Normalized Value</th> - </tr> - </thead> - <tbody> - <tr> - <td><code>Meeting</code></td> - <td>CUSTOM_OBJ</td> - <td>meeting</td> - </tr> - <tr> - <td><code>set</code></td> - <td>CUSTOM_ACT</td> - <td>set</td> - </tr> - <tr> - <td><code>12pm today</code></td> - <td>DATE_TIME</td> - <td>12:00 September 1, 2019 GMT</td> - </tr> - <tr> - <td><code>San Francisco</code></td> - <td>GEO_CITY</td> - <td>San Francisco, CA USA</td> - </tr> - </tbody> - </table> - <p> - In most cases named entities will have associated <em>normalized value</em>. It is especially important for named entities that have many - different notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>, - <code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location which is a standard normalized form. - </p> - <p> - The process of detecting named entities is called Named Entity Recognition (NER). There are many - different ways of how a certain named entity can be detected: through list of synonyms, by name, rule-based or by using - statistical techniques like neural networks with large corpus of predefined data. NLPCraft natively supports synonym-based - named entities definition as well as the ability to compose compose new named entities through powerful <a href="/intent-matching.html">Intent Definition Language</a> (IDL) - combining other named entities including named entities from external projects such OpenNLP, spaCy or Stanford CoreNLP. - </p> - <p> - Named entities allow you to abstract from basic linguistic forms like nouns and verbs to deal with the higher level semantic - abstractions like geographical location or time when you are trying to understand the meaning of the sentence. - One of the main goals of named entities is to act as an input ingredients for intent matching. - </p> - <p> - Read more in-depth about named entities <a href="data-model.html">here</a>. - </p> - </section> - <section id="intent"> - <h2 class="section-sub-title">Intent Matching <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> - <p> - You can think of intent matching as regular expression matching where instead of characters you deal with detected named entities. - Intent defines a pattern in terms of detected named entities (or tokens) and a callback to call when submitted sentence - matches that pattern. - </p> - <p> - Intents can also match on the <em>dialog flow</em> additionally to the matching on the current user sentence. - Dialog flow matching means matching an intent based on what intents were matched previously for the same user - and data model, i.e. the flow of the dialog. Note that you should not confuse dialog flow intent matching with - conversational STM that is used to fill in missing tokens from memory. - </p> - <div class="bq success"> - <div class="bq-idea-container"> - <div><div>💡</div></div> - <div> - You can think of NLPCraft data model as a mechanism to define named entities and intents that use - these named entities to pattern match the user input. - </div> - </div> - </div> - <p> - Learn more details about intent matching <a href="intent-matching.html">here</a>. - </p> - </section> - <section id="stm"> - <h2 class="section-sub-title">Conversation <span class="amp">&</span> STM <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> - <div class="bq info"> - <b>Short-Term Memory</b> - <p> - Read more in-depth explanation about maintaining conversational context and - Short-Term Memory in Aaron Radzinski <a href="/blogs/short_term_memory.html">blog.</a> - </p> - </div> - - <p> - NLPCraft provides automatic conversation management right out of the box. - Conversation management is based on the idea of short-term memory (STM). STM is automatically - maintained by NLPCraft per each user and data model. Essentially, NLPCraft "remembers" - the context of the conversation and can supply the currently missing elements from its memory (i.e. from STM). - STM implementation is also fully integrated with intent matching. - </p> - <p> - Maintaining conversation state is necessary for effective context resolution, so that users - could ask, for example, the following sequence of questions using example weather model: - </p> - <dl class="stm-example"> - <dd><i class="fa fa-fw fa-angle-right"></i>What’s the weather in London today?</dd> - <dt> - <p> - User gets the current London’s weather. - STM is empty at this moment so NLPCraft expects to get all necessary information from - the user sentence. Meaningful parts of the sentence get stored in STM. - </p> - <div class="stm-state"> - <div class="stm"> - <label>STM Before:</label> - <span> </span> - </div> - <div class="stm"> - <label>STM After:</label> - <span>weather</span> - <span>London</span> - <span>today</span> - </div> - </div> - </dt> - <dd><i class="fa fa-fw fa-angle-right"></i>And what about Berlin?</dd> - <dt> - <p> - User gets the current Berlin’s weather. - The only useful data in the user sentence is name of the city <code>Berlin</code>. But since - NLPCraft now has data from the previous question in its STM it can safely deduce that we - are asking about <code>weather</code> for <code>today</code>. - <code>Berlin</code> overrides <code>London</code> in STM. - </p> - <div class="stm-state"> - <div class="stm"> - <label>STM Before:</label> - <span>weather</span> - <span>London</span> - <span>today</span> - </div> - <div class="stm"> - <label>STM After:</label> - <span>weather</span> - <span><b>Berlin</b></span> - <span>today</span> - </div> - </div> - </dt> - <dd><i class="fa fa-fw fa-angle-right"></i>Next week forecast?</dd> - <dt> - <p> - User gets the next week forecast for Berlin. - Again, the only useful data in the user sentence is <code>next week</code> and <code>forecast</code>. - STM supplies <code>Berlin</code>. <code>Next week</code> override <code>today</code>, and - <code>forecast</code> override <code>weather</code> in STM. - </p> - <div class="stm-state"> - <div class="stm"> - <label>STM Before:</label> - <span>weather</span> - <span>Berlin</span> - <span>today</span> - </div> - <div class="stm"> - <label>STM After:</label> - <span><b>forecast</b></span> - <span>Berlin</span> - <span><b>Next week</b></span> - </div> - </div> - </dt> - </dl> - <p> - Note that STM is maintained per user and per data model. - Conversation management implementation is also smart enough to clear STM after certain - period of time, i.e. it “forgets” the conversational context after few minutes of inactivity. - Note also that conversational context can also be cleared explicitly - via <a href="https://github.com/apache/incubator-nlpcraft/blob/master/openapi/nlpcraft_swagger.yml" target="github">REST API</a>. - </p> - </section> -</div> -<div class="col-md-2 third-column"> - <ul class="side-nav"> - <li class="side-nav-title">On This Page</li> - <li><a href="#model">Data Model</a></li> - <li><a href="#ne">Named Entities</a></li> - <li><a href="#intent">Intent Matching</a></li> - <li><a href="#stm">Conversation <span class="amp">&</span> STM</a></li> - {% include quick-links.html %} - </ul> -</div> - - - - diff --git a/blogs/quick_intro_apache_nlpcraft.html b/blogs/quick_intro_apache_nlpcraft.html index f3cb800..cfade05 100644 --- a/blogs/quick_intro_apache_nlpcraft.html +++ b/blogs/quick_intro_apache_nlpcraft.html @@ -98,10 +98,10 @@ publish_date: November 16, 2020 NLPCraft natively integrates with 3rd party libraries for basic NLP processing and named entity recognition: </p> <div style="display: inline-block; margin-bottom: 20px"> - <a style="margin-right: 10px" target="opennlp" href="https://opennlp.apache.org"><img src="/images/opennlp-logo.png" height="32px" alt=""></a> - <a style="margin-right: 10px" target="google" href="https://cloud.google.com/natural-language/"><img src="/images/google-cloud-logo-small.png" height="32px" alt=""></a> - <a style="margin-right: 10px" target="stanford" href="https://stanfordnlp.github.io/CoreNLP"><img src="/images/corenlp-logo.gif" height="48px" alt=""></a> - <a style="margin-right: 10px" target="spacy" href="https://spacy.io"><img src="/images/spacy-logo.png" height="32px" alt=""></a> + <a style="margin-right: 10px" target=_ href="https://opennlp.apache.org"><img src="/images/opennlp-logo-h32.png" alt=""></a> + <a style="margin-right: 10px" target=_ href="https://cloud.google.com/natural-language/"><img src="/images/google-cloud-logo-small-h32.png" alt=""></a> + <a style="margin-right: 10px" target=_ href="https://stanfordnlp.github.io/CoreNLP"><img src="/images/corenlp-logo-h48.png" alt=""></a> + <a style="margin-right: 10px" target=_ href="https://spacy.io"><img src="/images/spacy-logo-h32.png" alt=""></a> </div> </div> <div class="col-6"> @@ -366,7 +366,7 @@ publish_date: November 16, 2020 <p> We’ll leave outside of this article the details of the particular integration with HomeKit or Arduino devices. We’ll also defer to the NLPCraft <a href="/docs.html">documentation</a> to learn about other topics such as - <a href="/basic-concepts.html#stm">conversation management</a>, + <a href="/short-term-memory.html">conversation management</a>, details of <a href="/data-model.html#macros">Macro DSL</a> and <a href="/intent-matching.html">Intent Definition Language</a>, built-in <a href="/tools/test_framework.html">testing tools</a>, 3rd party NER <a href="/integrations.html">integrations</a>, etc. diff --git a/data-model.html b/data-model.html index 66a66c8..14cf1ec 100644 --- a/data-model.html +++ b/data-model.html @@ -472,6 +472,75 @@ intents: </div> </div> </section> + <section id="ne"> + <h2 class="section-sub-title">Named Entities <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> + <p> + Named entity, also known as a model element or a token, is one of the main a components defined by the NLPCraft data model. + A named entity is one or more individual words that have a consistent semantic meaning and typically denote a + real-world object, such as persons, locations, number, date and time, organizations, products, etc. Such + object can be abstract or have a physical existence. + </p> + <p> + For example, in the following sentence: + </p> + <p> + <i class="fa fa-fw fa-angle-right"></i><code>Meeting is set for 12pm today in San Francisco.</code> + </p> + <p> + the following named entities can be detected: + </p> + <table class="gradient-table"> + <thead> + <tr> + <th>Words</th> + <th>Type</th> + <th>Normalized Value</th> + </tr> + </thead> + <tbody> + <tr> + <td><code>Meeting</code></td> + <td>CUSTOM_OBJ</td> + <td>meeting</td> + </tr> + <tr> + <td><code>set</code></td> + <td>CUSTOM_ACT</td> + <td>set</td> + </tr> + <tr> + <td><code>12pm today</code></td> + <td>DATE_TIME</td> + <td>12:00 September 1, 2019 GMT</td> + </tr> + <tr> + <td><code>San Francisco</code></td> + <td>GEO_CITY</td> + <td>San Francisco, CA USA</td> + </tr> + </tbody> + </table> + <p> + In most cases named entities will have associated <em>normalized value</em>. It is especially important for named entities that have many + different notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>, + <code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location which is a standard normalized form. + </p> + <p> + The process of detecting named entities is called Named Entity Recognition (NER). There are many + different ways of how a certain named entity can be detected: through list of synonyms, by name, rule-based or by using + statistical techniques like neural networks with large corpus of predefined data. NLPCraft natively supports synonym-based + named entities definition as well as the ability to compose compose new named entities through powerful <a href="/intent-matching.html">Intent Definition Language</a> (IDL) + combining other named entities including named entities from external projects such OpenNLP, spaCy or Stanford CoreNLP. + </p> + <p> + Named entities allow you to abstract from basic linguistic forms like nouns and verbs to deal with the higher level semantic + abstractions like geographical location or time when you are trying to understand the meaning of the sentence. + One of the main goals of named entities is to act as an input ingredients for intent matching. + </p> + <p> + Read more in-depth about named entities <a href="data-model.html">here</a>. + </p> + </section> <section id="elements"> <h2 class="section-title">Model Elements <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> <p>