This is an automated email from the ASF dual-hosted git repository.
aradzinski pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
The following commit(s) were added to refs/heads/master by this push:
new 2c980a5 Update configuration.html
2c980a5 is described below
commit 2c980a52ede24c7c84711e9c52491dfe43059041
Author: Aaron Radzinski <[email protected]>
AuthorDate: Sun May 9 11:11:57 2021 -0700
Update configuration.html
---
configuration.html | 407 -----------------------------------------------------
1 file changed, 407 deletions(-)
diff --git a/configuration.html b/configuration.html
index 2da2328..e10d819 100644
--- a/configuration.html
+++ b/configuration.html
@@ -25,289 +25,6 @@ id: configuration
<section>
<h2 class="section-title">Overview <a href="#"><i class="top-link fas
fa-fw fa-angle-double-up"></i></a></h2>
<p>
- REST server and data probe are the two runtime components that you
need to run when using NLPCraft.
- Data probes are used to deploy and host data model, while REST
server (or a cluster of servers) is used
- to accept client REST call and route them to the data models via
data probes.
- </p>
- <h2 class="section-sub-title">REST Server vs. Data Probe <a
href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- It's important to remember why REST server is a separate component
from a data probe. While a
- typical deployment would have only one REST server (or a cluster
of REST servers behind a single
- load balancer), there are maybe multiple data probes hosting
different data models deployed in
- different physical locations, managed through different life
cycles and requiring different
- security and network configurations.
- </p>
- <p>
- Moreover, REST server is a heavy and resource consuming component
that is built around Apache
- Ignite distributing in-memory computing capabilities - while the
data probe is a lightweight
- data model container. During the development and testing of data
models, the developers need to
- frequently redeploy data models by restarting the data probe. If
the REST server and the data probe
- would be one component - this process would be very inefficient.
- </p>
- <h2 class="section-sub-title">All-Inclusive JAR <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- <a href="/download.html#zip">Binary</a> NLPCraft ZIP download
comes with a single executable JAR file that includes all
- necessary dependencies:
<code>build/<b>apache-nlpcraft-incubating-{{site.latest_version}}-all-deps.jar</b></code>.
- This single all-inclusive JAR file can be used to start any
NLPCraft runtime components as standard
- Java applications and includes binary classes for:
- </p>
- <ul>
- <li><a href="#server">REST Server</a></li>
- <li><a href="#probe">Data Probe</a></li>
- <li><a href="/tools/script.html">NLPCraft CLI</a></li>
- <li><a href="/data-model.html">Model API</a></li>
- </ul>
- <p>
- Note that if you downloaded the <a
href="/download.html#src">source</a> ZIP you need to run <code
class="script">mvn clean package</code> to
- get the
<code>apache-nlpcraft-incubating-<b>{{site.latest_version}}</b>-all-deps.jar</code>
- file. It will be located in <code>nlpcraft/target</code>
sub-folder.
- </p>
- </section>
- <section id="server">
- <h2 class="section-title">REST Server <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- REST server accepts client REST calls and routes them to the data
model hosted by data probes.
- REST server can be started in different ways:
- </p>
- <nav>
- <div class="nav nav-tabs" role="tablist">
- <a class="nav-item nav-link active" data-toggle="tab"
href="#nav-srv-script" role="tab">NLPCraft CLI</a>
- <a class="nav-item nav-link" data-toggle="tab"
href="#nav-srv-class" role="tab">Java Class <img src="/images/java2-h20.png"
alt=""></a>
- <a class="nav-item nav-link" data-toggle="tab"
href="#nav-srv-docker" role="tab"><i class="fab fa-docker"></i> Docker</a>
- </div>
- </nav>
- <div class="tab-content">
- <div class="tab-pane fade show active" id="nav-srv-script"
role="tabpanel">
- <pre class="brush: bash">
- $ bin/nlpcraft.sh start-server
- </pre>
- <p>
- <b>NOTES:</b>
- </p>
- <ul>
- <li>
- <a href="/tools/script.html">NLPCraft CLI</a> is
available as <code>nlpcraft.sh</code> for
- <i class="fab fa-fw fa-linux"></i> and
<code>nlpcraft.cmd</code>
- for <i class="fab fa-fw fa-windows"></i>.
- </li>
- <li>
- Run <code class="script">bin/nlpcraft.sh help
--cmd=start-server</code> to get a full help on this command.
- </li>
- </ul>
- </div>
- <div class="tab-pane fade show" id="nav-srv-class" role="tabpanel">
- <p></p>
- <p>
- If using executable JAR:
- </p>
- <pre class="brush: bash">
- $ java -Xms1024m -jar
apache-nlpcraft-incubating-{{site.latest_version}}-all-deps.jar -server
- </pre>
- <p>
- If specifying additional classpath components and need
<code>-cp</code> parameter:
- </p>
- <pre class="brush: bash">
- $ java -Xms1024m -cp
apache-nlpcraft-incubating-{{site.latest_version}}-all-deps.jar
org.apache.nlpcraft.NCStart -server
- </pre>
- <p>
- <b>NOTES:</b>
- </p>
- <ul>
- <li>
- Make sure to provide correct path to
<code>apache-nlpcraft-incubating-<b>{{site.latest_version}}</b>-all-deps.jar</code>
file.
- </li>
- <li>
- Class <code>org.apache.nlpcraft.NCStart</code> is a
common entry point for all NLPCraft runtime components.
- </li>
- <li>
- Class <code>org.apache.nlpcraft.NCStart</code> should
be used to star REST server from IDE.
- </li>
- </ul>
- <p>
- Parameters:
- </p>
- <dl>
- <dt>
- <code>-server</code>
- </dt>
- <dd>
- <em>Mandatory</em> parameter to indicate that you are
starting the REST server.
- </dd>
- <dt><code>-config=path</code></dt>
- <dd>
- <em>Optional</em> parameter to provide configuration
file path.
- Server will automatically look for
<code>nlpcraft.conf</code> configuration file in the same directory
- as
<code>apache-nlpcraft-incubating-<b>{{site.latest_version}}</b>-all-deps.jar</code>
file. If the configuration
- file has different name or in different location use
<code>-config=path</code> parameter
- where <code>path</code> is an absolute path to the
configuration file. Note that the server and the data
- probe can use the same file for their configuration
(like the
- default <code>nlpcraft.conf</code> contains
configuration for both the server and the data probe).
- </dd>
- <dt><code>-igniteConfig=path</code></dt>
- <dd>
- <em>Optional</em> parameter to provide <a target=_
href="https://ignite.apache.org/">Apache Ignite</a> configuration file path.
- Note that Apache Ignite is used as a cluster computing
plane and a default distributed storage.
- Server will automatically look for
<code>ignite.xml</code>
- configuration file in the same directory as
<code>apache-nlpcraft-incubating-<b>{{site.latest_version}}</b>-all-deps.jar</code>
file.
- If the configuration file has different name or in
different location use <code>-igniteConfig=path</code> parameter
- where <code>path</code> is an absolute path to the
Ignite configuration file.
- </dd>
- </dl>
- </div>
- <div class="tab-pane fade show" id="nav-srv-docker"
role="tabpanel">
- <p></p>
- <p>
- If Docker image is available for given version you can
start REST server as follows:
- </p>
- <pre class="brush: bash">
- $ docker run -m 8G -p 8081:8081 -p 8201:8201 -p 8202:8202
nlpcraftserver/server:{{site.latest_version}}
- </pre>
- </div>
- </div>
- <h2 class="section-sub-title">JVM Memory <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Make sure to allocate enough memory for server JVM using
<code>-Xms</code> JVM option, i.e. <code>-Xms1024m</code>.
- Many 3rd party NLP engines like Stanford CoreNLP are very memory
intensive and may require several GBs
- of JVM heap allocated depending on the models used. Note that when
server JVM has insufficient heap
- memory the Apache Ignite may throw the following warning logs:
- </p>
- <pre class="brush: text">
- Jul-22 13:27:56 [INFO ] ...
- Jul-22 13:28:08 [WARN ] Possible too long JVM pause: 11364
milliseconds.
- Jul-22 13:28:11 [INFO ] ...
- </pre>
- <p>
- <b>NOTES:</b>
- </p>
- <ul>
- <li>
- <<a
href="/tools/script.html"><code>nlpcraft.{sh|cmd}</code></a> script
automatically uses
- <code>-Xms1024m</code> for <code>start-server</code> command.
- </li>
- </ul>
- <p>
- The abnormally long GC pauses (over 5s) can be caused by the
excessive memory swapping performed by OS due to
- insufficient JVM heap memory.
- </p>
- <h2 class="section-sub-title">Apache Ignite 2.x and JDK 11 <a
href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- NLPCraft REST server uses Apache Ignite 2.x as its distributed
in-memory computing plane. Apache Ignite
- <a target=_new
href="https://apacheignite.readme.io/docs/getting-started#running-ignite-with-java-11-and-later-versions">recommends</a>
- the following additional JVM options to be used when running
Apache Ignite 2.x on JDK 11:
- </p>
- <pre class="brush: text">
---add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
---add-opens=java.base/sun.nio.ch=ALL-UNNAMED
---add-opens=java.base/java.nio=ALL-UNNAMED
---add-opens=java.base/java.io=ALL-UNNAMED
---add-opens=java.base/java.util=ALL-UNNAMED
---add-opens=java.base/java.lang=ALL-UNNAMED
---add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
---add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED
---add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED
---add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED
---illegal-access=permit
- </pre>
- <p>
- <b>NOTES:</b>
- </p>
- <ul>
- <li>
- <a
href="/tools/script.html"><code>nlpcraft.{sh|cmd}</code></a> script
automatically uses
- these options for <code>start-server</code> command.
- </li>
- </ul>
- </section>
- <section id="probe">
- <h2 class="section-title">Data Probe <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Data probes are used to deploy and host data mode, and can also be
started in several ways:
- </p>
- <nav>
- <div class="nav nav-tabs" role="tablist">
- <a class="nav-item nav-link active" data-toggle="tab"
href="#nav-probe-script" role="tab" >NLPCraft CLI</a>
- <a class="nav-item nav-link" data-toggle="tab"
href="#nav-probe-class" role="tab" >Java Class <img src="/images/java2-h20.png"
alt=""></a>
- </div>
- </nav>
- <div class="tab-content">
- <div class="tab-pane fade show active" id="nav-probe-script"
role="tabpanel">
- <pre class="brush: bash">
- $ bin/nlpcraft.sh start-probe
- </pre>
- <p>
- <b>NOTES:</b>
- </p>
- <ul>
- <li>
- <a href="/tools/script.html">NLPCraft CLI</a> is
available as <code>nlpcraft.sh</code> for
- <i class="fab fa-fw fa-linux"></i> and
<code>nlpcraft.cmd</code>
- for <i class="fab fa-fw fa-windows"></i>.
- </li>
- <li>
- Run <code class="script">bin/nlpcraft.sh help
--cmd=start-probe</code> to get a full help on this command.
- </li>
- </ul>
- </div>
- <div class="tab-pane fade show" id="nav-probe-class"
role="tabpanel">
- <p></p>
- <p>
- If using executable JAR:
- </p>
- <pre class="brush: bash">
- $ java -jar
apache-nlpcraft-incubating-{{site.latest_version}}-all-deps.jar -probe
- </pre>
- <p>
- If specifying additional classpath components and need
<code>-cp</code> parameter:
- </p>
- <pre class="brush: bash">
- java -cp
apache-nlpcraft-incubating-{{site.latest_version}}-all-deps.jar:/my/project/classes
org.apache.nlpcraft.NCStart -probe -config=/my/project/probe.conf
- </pre>
- <p>
- <b>NOTES:</b>
- </p>
- <ul>
- <li>
- <code>/my/project</code> directory contains
user-defined model implementation
- </li>
- <li>
- Make sure to provide correct path to
<code>apache-nlpcraft-incubating-<b>{{site.latest_version}}</b>-all-deps.jar</code>
file.
- </li>
- <li>
- Class <code>org.apache.nlpcraft.NCStart</code> is a
common entry point for all NLPCraft runtime components.
- </li>
- <li>
- Class <code>org.apache.nlpcraft.NCStart</code> should
be used to star data probe from IDE.
- </li>
- </ul>
- <p>
- Parameters:
- </p>
- <dl>
- <dt>
- <code>-probe</code>
- </dt>
- <dd>
- <em>Mandatory</em> parameter to indicate that you are
starting a data probe.
- </dd>
- <dt><code>-config=path</code></dt>
- <dd>
- <p>
- <em>Optional</em> parameter to provide probe
configuration file path.
- Data probe will automatically look for
<code>nlpcraft.conf</code> configuration file in the same directory
- as
<code>apache-nlpcraft-incubating-<b>{{site.latest_version}}</b>-all-deps.jar</code>
file. If the configuration
- file has different name or in different location
use <code>-config=path</code> parameter
- where <code>path</code> is an absolute path to the
data probe configuration file. Note that the server and the data
- probe can use the same file for their
configuration (like the
- default <code>nlpcraft.conf</code> contains
configuration for both the server and the data probe).
- </p>
- </dd>
- </dl>
- </div>
- </div>
- </section>
- <section id="config">
- <h2 class="section-title">Configuration <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
Both REST server and the data probe use <a target=_
href="https://github.com/lightbend/config/">Typesafe Config</a> for their
configuration:
</p>
<ul>
@@ -428,134 +145,10 @@ nlpcraft {
property to JVM process:
<code>-D<b>NLPCRAFT_ANSI_COLOR_DISABLED</b>=true</code>
</p>
</section>
- <section id="testing">
- <h2 class="section-title">CI Testing <a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- It is a good practice to run units tests during routine builds
using Maven (or others CI toolchains). To
- test data models you need to have a running server and then start
one or more data probes with
- models you want to test. While doing this from IDE can be trivial
enough, doing this from Maven
- can be tricky.
- </p>
- <p>
- The challenge is that from the Maven build you need to start the
server, wait til its fully started and
- initialized, and only then start issuing REST calls, start data
probes or run tests that use <a href="/tools/embedded_probe.html">embedded
probes</a>.
- When done manually (e.g. from IDE) you can visually observe when
the server finished its startup and then manually
- launch the tests. In Maven, however, you need to use a special
plugin to accomplish the same in
- automated fashion.
- </p>
- <div class="bq info">
- <b>Probe Waits For Server</b>
- <p>
- Technically, when a data probe starts up it will initialize,
load the models, and will automatically wait for the server to get online
- if it isn't yet (as well as periodically check for it). Once
server is online the data probe will automatically connect to it. However,
- if the unit tests don't use data probe and only issue REST
calls then these tests have to somehow wait for the
- server to get online.
- </p>
- <p>
- To overcome this challenge you can use
- <a target="github"
href="https://github.com/bazaarvoice/maven-process-plugin"><code>process-exec-maven-plugin</code></a>
- Maven plugin.
- </p>
- </div>
- <p>
- To get around this problem NLPCraft uses <a target="github"
href="https://github.com/bazaarvoice/maven-process-plugin"><code>process-exec-maven-plugin</code></a>
- Maven plugin in its own build. This plugin allows to start the
external process and use configured URL
- endpoint to check whether or not the external process has fully
started. This works perfect with NLPCraft server
- <a href="/using-rest.html">health check REST call</a>. The plugin
can be configured in the following way for your own project
- (taken directly from NLPCraft <a target="github"
href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft/pom.xml"><code>pom.xml</code></a>):
- </p>
- <pre class="brush: xml, highlight: [14, 16, 32]">
-<plugin>
- <groupId>com.bazaarvoice.maven.plugins</groupId>
- <artifactId>process-exec-maven-plugin</artifactId>
- <version>0.9</version>
- <executions>
- <execution>
- <id>pre-integration-test</id>
- <phase>pre-integration-test</phase>
- <goals>
- <goal>start</goal>
- </goals>
- <configuration>
- <name>server</name>
-
<healthcheckUrl>http://localhost:8081/api/v1/health</healthcheckUrl>
- <waitAfterLaunch>180</waitAfterLaunch>
-
<processLogFile>${project.build.directory}/server.log</processLogFile>
- <arguments>
- <argument>java</argument>
- <argument>-Xmx4G</argument>
- <argument>-Xms4G</argument>
-
<argument>--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED</argument>
-
<argument>--add-opens=java.base/sun.nio.ch=ALL-UNNAMED</argument>
-
<argument>--add-opens=java.base/java.nio=ALL-UNNAMED</argument>
-
<argument>--add-opens=java.base/java.io=ALL-UNNAMED</argument>
-
<argument>--add-opens=java.base/java.util=ALL-UNNAMED</argument>
-
<argument>--add-opens=java.base/java.lang=ALL-UNNAMED</argument>
-
<argument>--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED</argument>
-
<argument>--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED</argument>
-
<argument>--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED</argument>
-
<argument>--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED</argument>
- <argument>--illegal-access=permit</argument>
-
<argument>-DNLPCRAFT_ANSI_COLOR_DISABLED=true</argument>
-
<argument>-Djdk.tls.client.protocols=TLSv1.2</argument>
- <argument>-jar</argument>
-
<argument>${project.build.directory}/${nlpcraft.all.deps.jar}</argument>
- <argument>-server</argument>
- </arguments>
- </configuration>
- </execution>
- <execution>
- <id>stop-all</id>
- <phase>post-integration-test</phase>
- <goals>
- <goal>stop-all</goal>
- </goals>
- </execution>
- </executions>
-</plugin>
- </pre>
- <p>
- <b>NOTES</b>:
- </p>
- <ul>
- <li>
- On line 14 we specify the URL endpoint to check whether or not
our server is online. We use
- <code>/health</code> localhost REST call for that.
- </li>
- <li>
- On line 16 we redirect the output from server to a dedicated
file to <b>avoid interleaving</b> log
- from server and log from data probe in the same console (where
we are running the Maven build from).
- Such interleaving will make the combined log unreadable and
can cause output problem for the console
- due to mixed up ANSI escape sequences from server and data
probe.
- </li>
- <li>
- Since the server log is piped out to a separate file we
disable ANSI coloring for the server
- on the line 32.
- </li>
- </ul>
- <div class="bq info">
- <b>Avoid Interleaving Logs</b>
- <p>
- When running both server and the data probe(s) from the Maven
build it is important to avoid interleaving
- logs from the server and the probe. Such interleaving will
make the combined log in Maven unreadable and
- can cause console malfunction due to mixed up ANSI escape
codes. It is idiomatic in such cases to:
- </p>
- <ul>
- <li>
- Disable ANSI coloring for the server via
<code>-D<b>NLPCRAFT_ANSI_COLOR_DISABLED</b>=true</code>
- </li>
- <li>
- Pipe out the server log into a dedicated file using Maven
plugins like
- <a target="github"
href="https://github.com/bazaarvoice/maven-process-plugin"><code>process-exec-maven-plugin</code></a>.
- </li>
- </ul>
- </div>
- </section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
- <li><a href="#config">Configuration</a></li>
<li><a href="#ansi">ANSI Colors</a></li>
{% include quick-links.html %}
</ul>