This is an automated email from the ASF dual-hosted git repository.
sergeykamov pushed a commit to branch NLPCRAFT-513
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
The following commit(s) were added to refs/heads/NLPCRAFT-513 by this push:
new ea2a765 WIP.
ea2a765 is described below
commit ea2a7657ef2a4b43dbda92ea9da40c979a8205c4
Author: skhdl <[email protected]>
AuthorDate: Fri Oct 14 18:29:27 2022 +0400
WIP.
---
docs.html | 174 ++++++++++++++++++++++++++++++--------------------------------
1 file changed, 85 insertions(+), 89 deletions(-)
diff --git a/docs.html b/docs.html
index 78e2e38..7100234 100644
--- a/docs.html
+++ b/docs.html
@@ -25,12 +25,10 @@ id: overview
<section id="overview">
<h2 class="section-title">Overview <a href="#"><i class="top-link fas
fa-fw fa-angle-double-up"></i></a></h2>
<p>
- Apache NLPCraft is a JVM-based <a target=_blank
href="https://www.apache.org/licenses/">open source</a> library
- for adding a natural language interface to modern applications.
It enables people to interact with your products using voice or text. NLPCraft
can connect with
- any private or public data source, and has no hardware or software
lock-ins. Its design is based on advanced
- <a href="/intent-matching.html">Intent Definition Language</a>
(IDL) for defining non-trivial intents and a fully deterministic intent matching
- algorithm for the input utterances. You can build intents for
NLPCraft using any JVM-based languages like Java, Scala, Kotlin, Groovy, etc.
NLPCraft
- exposes REST APIs for integration with end-user applications.
+ Apache NLPCraft is an <a target=_blank
href="https://www.apache.org/licenses/">open source</a> Scala library for
adding a natural language interface to modern applications.
+ It enables people to interact with your products using voice or
text.
+ Its design is based on advanced <a
href="/intent-matching.html">Intent Definition Language</a> (IDL) for defining
non-trivial intents and
+ a fully deterministic intent matching algorithm for the input
utterances.
</p>
<p>
One of the key features of NLPCraft is its use of <a
href="/intent-matching.html">IDL</a> coupled with deterministic intent matching
that are tailor made for
@@ -38,107 +36,105 @@ id: overview
approach with time-consuming corpora development and model training, resulting in a much <em>simpler <span class="amp">&</span> faster</em> implementation.
</p>
+
<p>
- Another key aspect of NLPCraft is its initial focus on processing
English language. Although it may sound
- counterintuitive, this narrower initial focus enables NLPCraft to
deliver unprecedented ease of use combined with
- unparalleled comprehension capabilities for English input
out-of-the-box. It avoids academic, watered down functionality or overly
- complicated configuration and usage - following on project's
<em>"built for engineers by engineers"</em> ethos.
- English language is spoken by more
- than a billion people on this planet and is de facto standard
global language of the business and commerce.
- </p>
- <p>
- So, how does it work in a nutshell?
- </p>
- <p>
- When using NLPCraft you will be dealing with three main components:
+ The NLPCraft library consists of two main elements: <code>Model</code> and <code>Client</code>.
</p>
+
<ul>
- <li><a href="#data-model">Data model</a></li>
- <li><a href="#data-probe">Data probe</a></li>
- <li><a href="#server">REST Server</a></li>
+ <li>
+ <code>Model</code> is a domain-specific object responsible for interpreting user input. A model contains intents defined via NLPCraft IDL, together with their code callbacks. An intent combines a user-defined callback with a rule that determines when this callback should be called. Most often the rule is a template based on the expected set of entities in the user input, but it can be more flexible.
+ </li>
+
+ <li>
+ <code>Client</code> is an object that communicates with a given model. Its main methods process user input and manage the dialog session.
+ </li>
</ul>
- <figure>
- <img class="img-fluid" src="/images/homepage-fig1.1.png" alt="">
- <figcaption><b>Fig 1.</b> NLPCraft Architecture</figcaption>
- </figure>
- </section>
- <section id="data-model">
- <h2 class="section-title">Data Model <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- NLPCraft employs a <em>model-as-a-code</em> approach where
everything you do in NLPCraft is part of your source code. Data model is simply
an implementation of
- <a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> Java
interface that
- can be developed using any JVM programming language like Java,
Scala, Kotlin or Groovy.
- Data model defines named entities, various configuration
properties as well as intents to interpret user input. Model-as-a-code natively
supports
- any software lifecycle tools and frameworks in Java ecosystem.
- </p>
- <p>
- Declarative portion of the model can be stored in a separate JSON
or YAML file
- for simpler maintenance. There are no practical limitation on how
complex or simple a model
- can be, or what other tools it can use. Data models use <a
href="/intent-matching.html">intents</a> to match the user input.
- </p>
- <p>
- To use data model it has to be deployed into a data probe.
- </p>
- </section>
- <section id="data-probe">
- <h2 class="section-title">Data Probe <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Data probe is a light-weight container designed to securely deploy
and manage user data models.
- Each probe can deploy and manage multiple models and many probes
can be connected to the REST server (or a cluster of REST servers).
- The main purpose of the data probe is to separate data model
hosting from managing REST calls from the clients.
- While you would typically have just one REST server, you may have
multiple data probes deployed
- in different geo-locations and configured differently.
- </p>
- <p>
- Data probes can be deployed and run anywhere as long as there is
an ingress connectivity from the REST server, and are
- typically deployed in DMZ or close to your target data sources:
on-premise, in the cloud, etc. Data
- probe uses strong 256-bit encryption and ingress only connectivity
for communicating with the REST server.
- </p>
- </section>
- <section id="server">
- <h2 class="section-title">REST Server <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
+
+ <p>Typical usage looks like this:</p>
+
+ <pre class="brush: scala, highlight: []">
+ import org.apache.nlpcraft.*
+
+ // Creates the domain model ("CustomNlpModel" stands for your own model implementation).
+ val mdl = new CustomNlpModel()
+
+ // Creates a client for the given model.
+ val client = new NCModelClient(mdl)
+
+ // Sends a text request to the model on behalf of the user with ID "userId".
+ val result = client.ask("Some user command", "userId")
+
+ // Clears the dialog session of the user with ID "userId".
+ client.clearDialog("userId")
+ </pre>
+
<p>
- REST server (or a cluster of REST servers behind a load balancer)
provides URL endpoint for end-user applications
- to securely query data sources using natural language via data
models deployed in data probes. Its main purpose is to
- accept REST-over-HTTP calls from end-user applications and route
these requests to and from requested data probes.
+ A model definition consists of two parts:
</p>
+ <ul>
+ <li>
+ <code>Configuration</code>. Static configuration parameters
including name, version, etc.
+ </li>
+ <li>
+ <code>Pipeline</code>. The most important component, which defines the user input processing chain.
+ A <code>Pipeline</code> can be built from both standard and custom user-defined components.
+ </li>
+ </ul>
+
<p>
- Unlike data probe that gets restarted every time the model is
changed, i.e. during development, the
- REST server is a "fire-and-forget" component that can be launched
once while various data probes can
- continuously reconnect to it. It can typically run as a Docker
image locally on premise or in the cloud.
+ Before looking at the pipeline elements more thoroughly, let's start with some terminology.
</p>
+
+ <ul>
+ <li>
+ <code>Token</code>. A simple string, i.e. a part of the user input obtained by splitting the input according to some rules, for instance by spaces plus additional conditions that depend on the language and other expectations.
+ For example, the user input "<b>Where is it?</b>" contains four tokens: "<b>Where</b>", "<b>is</b>", "<b>it</b>", "<b>?</b>".
+ </li>
+ <li>
+ <code>Entity</code>. According to Wikipedia, a named entity is a real-world object, such as a person, location, organization, product, etc., that can be denoted with a proper name. It can be abstract or have a physical existence. Each entity consists of one or more tokens.
+ </li>
+ <li>
+ <code>Variant</code>. A list of entities. Since the same tokens can potentially be recognized as different entities, user input can be processed as a set of variants. For example, the user input "Mercedes" can be processed as two variants, each containing a single-element list of entities: a car brand or a Spanish family name.
+ </li>
+ </ul>
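+
+ <p>The terms above can be illustrated with a minimal, self-contained sketch. It uses hypothetical, simplified <code>Token</code>, <code>Entity</code> and <code>Variant</code> types (not the NLPCraft API), a naive regex-based tokenizer instead of a real token parser, and made-up entity IDs <code>car:brand</code> and <code>person:surname</code>:</p>
+
+ <pre class="brush: scala, highlight: []">
+ // Hypothetical, simplified types for illustration only; NOT the NLPCraft API.
+ object TermsSketch {
+   case class Token(text: String, index: Int)
+
+   // An entity covers one or more tokens; "id" is a made-up entity type name.
+   case class Entity(id: String, tokens: List[Token])
+
+   // A variant is one possible interpretation of the input: a list of entities.
+   type Variant = List[Entity]
+
+   // Naive tokenizer: each word and each punctuation mark becomes a token.
+   // Real token parsers apply language-specific rules.
+   def tokenize(txt: String): List[Token] =
+     "\\w+|[^\\w\\s]".r.findAllIn(txt).toList.zipWithIndex.map { case (s, i) => Token(s, i) }
+
+   def main(args: Array[String]): Unit = {
+     // "Where is it?" is split into four tokens: Where, is, it, ?
+     println(tokenize("Where is it?").map(_.text).mkString(", "))
+
+     // "Mercedes" can be read as two variants, each with a single entity.
+     val toks = tokenize("Mercedes")
+     val variants: List[Variant] = List(
+       List(Entity("car:brand", toks)),
+       List(Entity("person:surname", toks))
+     )
+     variants.foreach(v => println(v.map(_.id).mkString("[", ", ", "]")))
+   }
+ }
+ </pre>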
+
<p>
- Learn more about <a href="data-model.html">data model</a>,
- <a href="server-and-probe.html#probe">data probe</a> and <a
href="server-and-probe.html#server">REST server</a>.
+ Back to the pipeline. A pipeline is built from the following components:
</p>
- </section>
- <section id="in-depth">
- <h2 class="section-title">In-Depth Look <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
+ <ul>
+ <li>
+ <code>Token parser</code>. Mandatory component that parses the plain-text user input and splits it into a list of tokens. NLPCraft provides a default English (EN) token parser implementation, and the project also contains examples for French (FR) and Russian (RU).
+ </li>
+ <li>
+ <code>Token enrichers</code>. Optional list. A token enricher is a component that adds extra properties to the prepared tokens, such as part of speech, quote or stop-word flags, or any others. NLPCraft provides a default set of English token enricher implementations.
+ </li>
+ <li>
+ <code>Token validators</code>. Optional list. A token validator is a user-defined component that inspects the tokens and can throw an exception from user code to stop user input processing.
+ </li>
+ <li>
+ <code>Entity parsers</code>. Mandatory list; at least one entity parser must be defined. Taking the prepared tokens as input, each entity parser tries to find user-defined named entities. NLPCraft provides wrappers for the named-entity recognition components of the OpenNLP and Stanford libraries.
+ </li>
+ <li>
+ <code>Entity enrichers</code>. Optional list. An entity enricher is a component that adds extra properties to the prepared entities. It can be useful for extending the functionality of existing entity parsers.
+ </li>
+ <li>
+ <code>Entity mappers</code>. Optional list. An entity mapper is a component that maps one set of entities into another after the entities have been parsed and enriched. It can be useful for building complex parsers on top of existing ones.
+ </li>
+ <li>
+ <code>Entity validators</code>. Optional list. An entity validator is a user-defined component that inspects the prepared entities and can throw an exception from user code to stop user input processing.
+ </li>
+ <li>
+ <code>Variant filter</code>. Optional component that filters the detected variants, rejecting undesirable ones.
+ </li>
+ </ul>
<p>
- Watch this full video (34:42) of the presentation from
- <a target=_ href="https://www.apachecon.com/acasia2021/">ApacheCon
Asia 2021</a> conference to get in-depth understanding of
- the reasons why NLPCraft project was developed and what are the
key principles that underlying it:
+ This flexible design allows you to create pipelines for any language. You can combine predefined NLPCraft components, write your own, and easily reuse custom components.
</p>
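+
+ <p>The pipeline stages above can be illustrated with a minimal, self-contained sketch. It uses hypothetical, simplified types (<code>Token</code>, <code>Entity</code>, <code>Pipeline</code>) rather than the actual NLPCraft interfaces, models each component as a plain function, and, for brevity, returns all detected entities as a single variant instead of building alternative variants from overlapping entities:</p>
+
+ <pre class="brush: scala, highlight: []">
+ // Hypothetical, simplified model of the pipeline stages; NOT the NLPCraft API.
+ object PipelineSketch {
+   case class Token(text: String, props: Map[String, Any] = Map.empty)
+   case class Entity(id: String, tokens: List[Token])
+   type Variant = List[Entity]
+
+   case class Pipeline(
+     tokenParser: String => List[Token],                        // Mandatory.
+     entityParsers: List[List[Token] => List[Entity]],          // Mandatory, at least one.
+     tokenEnrichers: List[List[Token] => List[Token]] = Nil,    // Optional.
+     tokenValidators: List[List[Token] => Unit] = Nil,          // Optional, may throw.
+     entityEnrichers: List[List[Entity] => List[Entity]] = Nil, // Optional.
+     entityMappers: List[List[Entity] => List[Entity]] = Nil,   // Optional.
+     entityValidators: List[List[Entity] => Unit] = Nil,        // Optional, may throw.
+     variantFilter: List[Variant] => List[Variant] = identity   // Optional.
+   ) {
+     require(entityParsers.nonEmpty, "At least one entity parser must be defined.")
+
+     def process(txt: String): List[Variant] = {
+       // Split the input into tokens, then enrich and validate them.
+       val toks = tokenEnrichers.foldLeft(tokenParser(txt))((ts, enrich) => enrich(ts))
+       tokenValidators.foreach(validate => validate(toks))
+
+       // Run every entity parser, then enrich, map and validate the entities.
+       val found = entityParsers.flatMap(parse => parse(toks))
+       val enriched = entityEnrichers.foldLeft(found)((es, enrich) => enrich(es))
+       val ents = entityMappers.foldLeft(enriched)((es, mapper) => mapper(es))
+       entityValidators.foreach(validate => validate(ents))
+
+       // Simplification: all entities form one variant, which the filter may reject.
+       variantFilter(List(ents))
+     }
+   }
+
+   def main(args: Array[String]): Unit = {
+     val pipeline = Pipeline(
+       // Naive whitespace-based token parser.
+       tokenParser = txt => txt.split("\\s+").toList.filter(_.nonEmpty).map(s => Token(s)),
+       // Hypothetical dictionary-based parser for a made-up "car:brand" entity.
+       entityParsers = List((ts: List[Token]) =>
+         ts.filter(_.text.equalsIgnoreCase("mercedes")).map(t => Entity("car:brand", List(t)))
+       ),
+       // Rejects empty input by throwing an exception.
+       tokenValidators = List((ts: List[Token]) =>
+         if (ts.isEmpty) throw new IllegalArgumentException("Empty input.")
+       )
+     )
+
+     // Prints the entity IDs of each detected variant.
+     pipeline.process("I like Mercedes").foreach(v => println(v.map(_.id)))
+   }
+ }
+ </pre>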
- <div>
- <iframe
- width="514"
- height="289"
-
src="https://www.youtube.com/embed/O7iK0AXvcJ8?modestbranding=1"
- title="NLPCraft - Breaking Years Of Dogma In NLP"
- frameborder="0"
- allow="accelerometer; autoplay; clipboard-write;
encrypted-media; gyroscope; picture-in-picture"
- allowfullscreen>
- </iframe>
- </div>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
- <li><a href="#data-model">Data Model</a></li>
- <li><a href="#data-probe">Data Probe</a></li>
- <li><a href="#server">REST Server</a></li>
{% include quick-links.html %}
</ul>
</div>