This is an automated email from the ASF dual-hosted git repository.
hansva pushed a commit to branch release/2.11.0
in repository https://gitbox.apache.org/repos/asf/hop.git
The following commit(s) were added to refs/heads/release/2.11.0 by this push:
new f66cb5fbc0 [DOCS] cherry pick language model chat documentation
f66cb5fbc0 is described below
commit f66cb5fbc04020336307316789c7a29f34b337e6
Author: Hans Van Akelyen <[email protected]>
AuthorDate: Tue Dec 10 15:53:44 2024 +0100
[DOCS] cherry pick language model chat documentation
---
.../pipeline/transforms/languagemodelchat.adoc | 206 ++++++++++++++++++++-
1 file changed, 196 insertions(+), 10 deletions(-)
diff --git a/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc b/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc
index ead36dc1e3..a6b49ce7be 100644
--- a/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc
+++ b/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc
@@ -20,20 +20,206 @@ under the License.
= image:transforms/icons/languagemodelchat.svg[Language Model Chat transform
Icon, role="image-doc-icon"] Language Model Chat
-[%noheader,cols="3a,1a", role="table-no-borders" ]
+[%noheader,cols="3a,1a",role="table-no-borders" ]
|===
|
== Description
-The Language Model Chat transforms Allows you to interact with various
language models such as OpenAI, Anthropic, Hugging Face, Mistral and Ollama
+The Language Model Chat transform, using LangChain4j, enables seamless
interaction with a variety of language model endpoints, including OpenAI,
Anthropic, Hugging Face, Mistral, and Ollama. This transform provides a
unified interface, allowing you to integrate and configure different models
effortlessly within the same pipeline.
-|
-== Supported Engines
-[%noheader,cols="2,1a",frame=none, role="table-supported-engines"]
+Additionally, the transform supports several prompt engineering techniques
to refine model interactions and achieve more contextually relevant
responses. See ``Basic Prompt Engineering Techniques`` below for examples.
+
+| == Supported Engines
+
+[%noheader,cols="2,1a",frame=none,role="table-supported-engines"]
!===
-!Hop Engine! image:check_mark.svg[Supported, 24]
-!Spark! image:question_mark.svg[Maybe Supported, 24]
-!Flink! image:question_mark.svg[Maybe Supported, 24]
-!Dataflow! image:question_mark.svg[Maybe Supported, 24]
+!Hop Engine! image:check_mark.svg[Supported,24]
+!Spark! image:question_mark.svg[Maybe Supported,24]
+!Flink! image:question_mark.svg[Maybe Supported,24]
+!Dataflow! image:question_mark.svg[Maybe Supported,24]
!===
-|===
\ No newline at end of file
+
+|===
+
+== Options
+
+=== Common Parameters
+
+The following parameters are applicable to all language model selections.
+
+[cols="2,4",options="header"]
+|===
+| Option | Description
+
+| Transform Name
+| Name of the transform. Must be unique within a single pipeline.
+
+| Input JSON Chat
+| When enabled, uses the chat completion JSON format (system, assistant, user
messages). When disabled, treats the input as plain text. See the "Chat
Completion" section for more details.
+
+| Output JSON Chat
+| When enabled, returns the response in chat completion JSON format. When
disabled, returns the response as plain text. See the "Chat Completion"
section for more details.
+
+| Input field name
+| The name of the field holding the input text/prompt.
+
+| Output field name prefix
+| A prefix applied to all output field names.
+
+| Identifier value (optional)
+| An optional identifier to label the output records; if left blank, the
output will default to the model's name.
+
+| Parallelism
+| The number of parallel threads used for API requests. **WARNING: Be mindful
of endpoint capacity and associated costs.** It is advisable to enable `Mock`
mode or set parallelism to 1 during testing and development. Once testing is
complete, gradually increase the parallelism to assess consumption costs and
the provider's capacity to manage multiple threads, including rate limits and
other constraints.
+
+| Mock (simulate)
+| Enables a test mode that simulates responses without performing real API
calls.
+
+| Mock Output Value
+| A predefined response returned in simulation mode, used when real API calls
are skipped.
+
+| Model API
+| Defines the API used for communication with the endpoint (e.g., OpenAI,
Anthropic, Ollama, Hugging Face, Mistral). **Note:** The OpenAI API option can
be used with any OpenAI-compatible endpoint/server, for example vLLM (Kwon et
al., 2023).
+|===
+
+=== Model-Specific Options
+
+The table below details the parameters specific to the individual language
model APIs currently supported by the Language Model Chat transform plugin.
+
+[cols="2,1,1,1,1,1", options="header"]
+|===
+|Option |OpenAI |Anthropic |Hugging Face |Mistral |Ollama
+|API/Endpoint URL | ✅ | ✅ | ✅ | ✅ | ✅
+|Temperature | ✅ | ✅ | ✅ | ✅ | ✅
+|Max New Tokens | ✅ | ✅ | ✅ | ✅ | ✅
+|Timeout | ✅ | ✅ | ✅ | ✅ | ✅
+|API Key/AccessToken | ✅ | ✅ | ✅ | ✅ | ❌
+|Model Name | ✅ | ✅ | ❌ | ✅ | ✅
+|Top P | ✅ | ✅ | ❌ | ✅ | ✅
+|Max Retries | ✅ | ✅ | ❌ | ✅ | ✅
+|Response Format | ✅ | ❌ | ❌ | ✅ | ✅
+|Seed | ✅ | ❌ | ❌ | ✅ | ✅
+|Log Requests | ✅ | ✅ | ❌ | ✅ | ❌
+|Log Responses | ✅ | ✅ | ❌ | ✅ | ❌
+|Top K | ❌ | ✅ | ❌ | ❌ | ✅
+|Presence/Repeat Penalty | ✅ | ❌ | ❌ | ❌ | ✅
+|Organisation | ✅ | ❌ | ❌ | ❌ | ❌
+|Frequency Penalty | ✅ | ❌ | ❌ | ❌ | ❌
+|Safe Prompt | ❌ | ❌ | ❌ | ✅ | ❌
+|User | ✅ | ❌ | ❌ | ❌ | ❌
+|Return Full Text | ❌ | ❌ | ✅ | ❌ | ❌
+|Wait for Model | ❌ | ❌ | ✅ | ❌ | ❌
+|Use Proxy | ✅ | ❌ | ❌ | ❌ | ❌
+|Context Window Size | ❌ | ❌ | ❌ | ❌ | ✅
+|===
+
+
+[cols="2,5", options="header"]
+|===
+|Option |Description
+
+|API/Endpoint URL
+|Specifies the identifier (e.g., URL, model ID, or image ID) used to access
and interact with models provided by services like OpenAI, Anthropic, Mistral,
Ollama, or Hugging Face.
+
+|Temperature
+|Controls the randomness of the output: high values lead to more diverse text
(creative/inventive), while low values produce more focused and deterministic
responses. The value typically ranges between 0 and 1 but can exceed this
range depending on the model. Note: Unless intentional, it is generally best
to adjust only one of these parameters, either temperature or Top-P/K, rather
than several simultaneously.
+
+|Max New Tokens
+|Sets a limit on how many tokens the model may generate in one response.
+
+|Context Window Size
+|Specifies how many input tokens the model can consider at once.
+
+|Timeout
+|Determines how long to wait for a response before giving up.
+
+|API Key/AccessToken
+| A unique key or token used to authenticate and authorise access to the
endpoint service.
+
+|Model Name
+|Identifies which specific model to use.
+
+|Top P
+|Also known as nucleus sampling (Holtzman et al., 2019), controls the
diversity of output by limiting the model's choices to a dynamic subset of the
vocabulary. For example, 0.2 means only the tokens comprising the top 20%
probability mass are considered. Top-p and temperature both influence the
randomness and diversity of the model's output but operate differently:
temperature scales the probabilities of all tokens, while top-p limits token
selection to the subset of options whose cumulative probability meets or
exceeds __**p**__.
+
+|Max Retries
+|The maximum number of retry attempts for failed API requests.
+
+|Response Format
+|Defines the structure used for the model's responses. Note: While some APIs
support this option, not all models are compatible with it. Additionally, the
API may require explicitly instructing the model to produce JSON to ensure
proper functionality; see the example after this table.
+
+|Seed
+|Attempts to sample deterministically, ensuring consistent results for
repeated requests with the same seed, prompt, and parameters.
+
+|Top K
+|Controls the breadth of possible answers by limiting the number of tokens
considered. Unlike Top P, which selects tokens until their cumulative
probability meets or exceeds __**p**__, this parameter restricts the selection
to a fixed number of tokens defined by __**k**__. Note: Unless intentional, it
is generally best to adjust only one of these parameters, either temperature
or Top-P/K, rather than several simultaneously.
+
+|Repeat/Frequency Penalty
+|This setting adjusts how the model evaluates token frequency in the generated
text, encouraging the model to use less/more repetitive language. The penalty
can range from -2.0 to 2.0, depending on the endpoint model. Positive values
discourage repetition by penalising frequently used tokens, while negative
values can increase repetition by favouring them. Reasonable values for the
frequency penalty are typically between 0.1 and 1.0.
+
+|Presence Penalty
+|This setting determines how the model evaluates new tokens, aiming to
discourage reuse of previously generated tokens. The penalty ranges from -2.0
to 2.0, depending on the endpoint model. Positive values penalise tokens that
have already appeared in the text, encouraging the model to explore new
directions. Unlike the frequency penalty, which scales proportionally based on
how often a token is used, the presence penalty applies a one-time additive
penalty to any token that has appeared at least once.
+
+|Organisation
+|The organisation ID associated with the API key, for account and billing
purposes.
+
+|Safe Prompt
+|Provides a baseline prompt to guide the model towards generating safe
outputs. For example, by prepending your messages with a system prompt such
as: ``__Always assist with care, respect, and truth. Respond with utmost
utility yet securely. Avoid harmful, unethical, prejudiced, or negative
content. Ensure replies promote fairness and positivity__``.
+
+|User
+|A unique identifier for the end-user, used by the endpoint provider to
monitor activity and detect potential abuse.
+
+|Return Full Text
+|Ensures the entire generated text is returned without truncation.
+
+|Wait for Model
+|Delays response until the model is ready. If the model is cold and needs
loading, this avoids repeated requests by waiting for it to become available.
+
+|Use Proxy
+|Routes requests through a proxy server.
+
+|Log Requests
+|When enabled, logs outgoing API requests for debugging purposes.
+
+|Log Responses
+|When enabled, logs API responses for debugging purposes.
+|===
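+
+As an illustration of the ``Response Format`` note above, the following is a
minimal sketch of a JSON chat input that explicitly instructs the model to
produce JSON. The task, wording, and output schema are hypothetical examples,
and whether the instruction is honoured depends on the selected Model API and
model.
+
+[source,json]
+----
+[
+{
+  "role" : "system",
+  "content" : "Extract locations from the user's text. Respond only with a JSON object of the form {\"city\": string, \"country\": string}."
+}, {
+  "role" : "user",
+  "content" : "The Eiffel Tower is located in Paris, France."
+}
+]
+----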
+
+== Additional Information
+
+=== Token
+A token represents a unit of text, such as a word, subword, or character,
depending on the tokenisation method used during model development. The size
of a token varies across models, but a common estimate for English is
approximately 4 tokens per 3 words; at that rate, a 300-word text corresponds
to roughly 400 tokens. For more precise calculations, it is important to
consider the specific tokenisation method employed by the model, such as Byte
Pair Encoding (BPE) or WordPiece. There are online tools that provide
convenient calculators to determine the token count of a given text for a
specific model.
+
+=== Chat Completion / Prompting
+Chat completion refers to the JSON structure used to design and control the
flow of interactions in a conversational AI system (the model) via an API. This
structure defines role-based communication (e.g., "system," "user,"
"assistant") and shapes the assistant's behaviour, enabling it to capture user
intent, extend text in response to prompts, and generate relevant content even
for tasks the model hasn’t been specifically fine-tuned for.
+
+Other terms related to this concept include prompting, prompt engineering,
in-context learning and meta-learning. While these terms might sometimes be
used interchangeably, they have distinct technical meanings. Meta-learning
refers to the broader framework where a model leverages an
inner-loop/outer-loop structure to adapt to new tasks. Within this framework,
in-context learning represents the inner loop, where the model performs tasks
based on examples provided during inference, without updating its parameters.
+
+Example:
+[source,json]
+----
+[
+{
+ "role" : "system",
+ "content" : "Provides instructions, context, or guidelines to shape the
assistant's behaviour and set the overall tone or purpose of the interaction."
+}, {
+ "role" : "user",
+ "content" : "Represents the individual interacting with the system,
providing queries, requests, or instructions to guide the assistant’s
responses."
+}, {
+ "role" : "assistant",
+ "content" : "Responds to the user’s input by generating outputs based on the
system's instructions and the user's messages."
+}
+]
+----
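+
+For instance, a concrete conversation following this structure might look as
follows (the content values are purely illustrative):
+
+[source,json]
+----
+[
+{
+  "role" : "system",
+  "content" : "You are a helpful assistant that answers questions about European geography."
+}, {
+  "role" : "user",
+  "content" : "What is the capital of Belgium?"
+}, {
+  "role" : "assistant",
+  "content" : "The capital of Belgium is Brussels."
+}
+]
+----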
+
+=== Basic Prompt Engineering Techniques
+
+- *Zero-Shot:* The model generates output based solely on the given prompt and
its pre-existing knowledge, with no examples provided to guide its response
(Brown et al., 2020).
+
+- *One/Few-Shot:* A small set of examples (one or more) is provided within
the prompt to help the model generalise to new, unseen inputs (Brown et al.,
2020); a sketch of a few-shot chat prompt is shown after this list.
+
+- *Chain-of-Thought:* The model is guided through a logical, step-by-step
reasoning process, enabling it to break down complex problems and improve its
ability to reach more accurate conclusions (Wei et al., 2022). This method is
particularly effective for multistep reasoning tasks. Variants include
Zero-Shot and Few-Shot Chain-of-Thought (Kojima et al., 2023); see the
chain-of-thought sketch after this list.
+
+- *Self-Consistency:* The model generates multiple diverse reasoning paths
for the same prompt by introducing variability through techniques like
adjusting the temperature or applying nucleus sampling (top-p). The final
answer is determined by selecting the response that occurs most frequently
among these outputs (Wang et al., 2023). This method is particularly effective
for improving accuracy in reasoning tasks by leveraging diversity in outputs.
For example, in a classification task, the label predicted most often across
the sampled outputs is taken as the final answer; a schematic illustration
follows this list.
+
+- *Prompt-Chaining (Sequential Chaining):* A method where multiple related
prompts are provided in sequence, with each prompt building on the output of
the previous one. This step-by-step approach helps guide the model’s reasoning
process, enabling it to tackle complex tasks incrementally. This technique
requires the JSON chat feature to be enabled, as it relies on this
functionality to chain the conversation; a sketch appears at the end of this
list.
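+
+The following is a minimal sketch of a few-shot prompt expressed in the JSON
chat format described above; the sentiment-classification task and the example
texts are hypothetical.
+
+[source,json]
+----
+[
+{
+  "role" : "system",
+  "content" : "Classify the sentiment of the user's text as positive or negative."
+}, {
+  "role" : "user",
+  "content" : "The food was wonderful."
+}, {
+  "role" : "assistant",
+  "content" : "positive"
+}, {
+  "role" : "user",
+  "content" : "The service was terribly slow."
+}, {
+  "role" : "assistant",
+  "content" : "negative"
+}, {
+  "role" : "user",
+  "content" : "A lovely place, we will definitely come back."
+}
+]
+----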
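+
+A zero-shot chain-of-thought prompt can be as simple as appending a reasoning
instruction to the question (Kojima et al., 2023); the task below is a
hypothetical example.
+
+[source,json]
+----
+[
+{
+  "role" : "user",
+  "content" : "A pipeline processes 120 rows per minute. How many rows does it process in 2.5 hours? Let's think step by step."
+}
+]
+----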
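+
+Self-consistency is a sampling strategy rather than a single prompt: the same
prompt is submitted several times with a non-zero temperature, and the most
frequent answer is kept. The following schematic, with hypothetical values, is
an illustration of the idea and not an actual API payload.
+
+[source,json]
+----
+{
+  "prompt" : "Is this review positive or negative? 'Great value, would buy again.'",
+  "sampled_answers" : [ "positive", "positive", "negative", "positive", "positive" ],
+  "majority_answer" : "positive"
+}
+----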
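+
+Prompt-chaining relies on the JSON chat format: the assistant's previous
answer is carried into the next message so that each step builds on the last.
A hypothetical two-step chain could look like this on the second request:
+
+[source,json]
+----
+[
+{
+  "role" : "user",
+  "content" : "Summarise the following incident report in one sentence: <report text>."
+}, {
+  "role" : "assistant",
+  "content" : "A scheduled pipeline failed overnight because a source file was missing."
+}, {
+  "role" : "user",
+  "content" : "Based on that summary, draft a short status update for the operations team."
+}
+]
+----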
\ No newline at end of file