This is an automated email from the ASF dual-hosted git repository.
hansva pushed a commit to branch release/2.11.0
in repository https://gitbox.apache.org/repos/asf/hop.git
The following commit(s) were added to refs/heads/release/2.11.0 by this push:
new f66cb5fbc0 [DOCS] cherry pick language model chat documentation
f66cb5fbc0 is described below
commit f66cb5fbc04020336307316789c7a29f34b337e6
Author: Hans Van Akelyen <[email protected]>
AuthorDate: Tue Dec 10 15:53:44 2024 +0100
[DOCS] cherry pick language model chat documentation
---
.../pipeline/transforms/languagemodelchat.adoc | 206 ++++++++++++++++++++-
1 file changed, 196 insertions(+), 10 deletions(-)
diff --git a/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc b/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc
index ead36dc1e3..a6b49ce7be 100644
--- a/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc
+++ b/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/languagemodelchat.adoc
@@ -20,20 +20,206 @@ under the License.
= image:transforms/icons/languagemodelchat.svg[Language Model Chat transform
Icon, role="image-doc-icon"] Language Model Chat
-[%noheader,cols="3a,1a", role="table-no-borders" ]
+[%noheader,cols="3a,1a",role="table-no-borders" ]
|===
|
== Description
-The Language Model Chat transforms Allows you to interact with various
language models such as OpenAI, Anthropic, Hugging Face, Mistral and Ollama
+The Language Model Chat transform, using LangChain4j, enables seamless
interaction with a variety of language model endpoints, including OpenAI,
Anthropic, Hugging Face, Mistral, and Ollama. This transform provides a
unified interface, allowing you to integrate and configure different models
effortlessly within the same pipeline.
-|
-== Supported Engines
-[%noheader,cols="2,1a",frame=none, role="table-supported-engines"]
+Additionally, the transform supports several prompt engineering techniques
to refine model interactions and achieve more contextually relevant
responses. See ``Basic Prompt Engineering Techniques`` below for examples.
+
+| == Supported Engines
+
+[%noheader,cols="2,1a",frame=none,role="table-supported-engines"]
!===
-!Hop Engine! image:check_mark.svg[Supported, 24]
-!Spark! image:question_mark.svg[Maybe Supported, 24]
-!Flink! image:question_mark.svg[Maybe Supported, 24]
-!Dataflow! image:question_mark.svg[Maybe Supported, 24]
+!Hop Engine! image:check_mark.svg[Supported,24]
+!Spark! image:question_mark.svg[Maybe Supported,24]
+!Flink! image:question_mark.svg[Maybe Supported,24]
+!Dataflow! image:question_mark.svg[Maybe Supported,24]
!===
-|===
\ No newline at end of file
+
+|===
+
+== Options
+
+=== Common Parameters
+
+The following parameters are applicable to all language model selections.
+
+[cols="2,4",options="header"]
+|===
+| Option | Description
+
+| Transform Name
+| Name of the transform. Must be unique within a single pipeline.
+
+| Input JSON Chat
+| When enabled, uses the chat completion JSON format (system, assistant, user
messages). When disabled, treats the input as plain text. See the "Chat
Completion" section for more details.
+
+| Output JSON Chat
+| When enabled, returns the response in chat completion JSON format. When
disabled, returns the response as plain text. See the "Chat Completion"
section for more details.
+
+| Input field name
+| The name of the field holding the input text/prompt.
+
+| Output field name prefix
+| A prefix applied to all output field names.
+
+| Identifier value (optional)
+| An optional identifier to label the output records; if left blank, the
output will default to the model's name.
+
+| Parallelism
+| The number of parallel threads used for API requests. **WARNING: Be mindful
of endpoint capacity and associated costs.** It is advisable to enable `Mock`
mode or set parallelism to 1 during testing and development. Once testing is
complete, gradually increase the parallelism to assess consumption costs and
the provider's capacity to manage multiple threads, including rate limits and
other constraints.
+
+| Mock (simulate)
+| Enables a test mode that simulates responses without performing real API
calls.
+
+| Mock Output Value
+| A predefined response returned in simulation mode, used when real API calls
are skipped.
+
+| Model API
+| Defines the API used for communication with the endpoint (e.g., OpenAI,
Anthropic, Ollama, Hugging Face, Mistral). **Note:** The OpenAI API option can
be used with any OpenAI-compatible endpoint/server, for example vLLM (Kwon et
al., 2023).
+|===
+
+=== Model-Specific Options
+
+The table below details the parameters specific to the individual language
model APIs currently supported by the Language Model Chat transform plugin.
+
+[cols="2,1,1,1,1,1", options="header"]
+|===
+|Option |OpenAI |Anthropic |Hugging Face |Mistral |Ollama
+|API/Endpoint URL | ✅ | ✅ | ✅ | ✅ | ✅
+|Temperature | ✅ | ✅ | ✅ | ✅ | ✅
+|Max New Tokens | ✅ | ✅ | ✅ | ✅ | ✅
+|Timeout | ✅ | ✅ | ✅ | ✅ | ✅
+|API Key/AccessToken | ✅ | ✅ | ✅ | ✅ | ❌
+|Model Name | ✅ | ✅ | ❌ | ✅ | ✅
+|Top P | ✅ | ✅ | ❌ | ✅ | ✅
+|Max Retries | ✅ | ✅ | ❌ | ✅ | ✅
+|Response Format | ✅ | ❌ | ❌ | ✅ | ✅
+|Seed | ✅ | ❌ | ❌ | ✅ | ✅
+|Log Requests | ✅ | ✅ | ❌ | ✅ | ❌
+|Log Responses | ✅ | ✅ | ❌ | ✅ | ❌
+|Top K | ❌ | ✅ | ❌ | ❌ | ✅
+|Presence/Repeat Penalty | ✅ | ❌ | ❌ | ❌ | ✅
+|Organisation | ✅ | ❌ | ❌ | ❌ | ❌
+|Frequency Penalty | ✅ | ❌ | ❌ | ❌ | ❌
+|Safe Prompt | ❌ | ❌ | ❌ | ✅ | ❌
+|User | ✅ | ❌ | ❌ | ❌ | ❌
+|Return Full Text | ❌ | ❌ | ✅ | ❌ | ❌
+|Wait for Model | ❌ | ❌ | ✅ | ❌ | ❌
+|Use Proxy | ✅ | ❌ | ❌ | ❌ | ❌
+|Context Window Size | ❌ | ❌ | ❌ | ❌ | ✅
+|===
+
+
+[cols="2,5", options="header"]
+|===
+|Option |Description
+
+|API/Endpoint URL
+|Specifies the identifier (e.g., URL, model ID, or image ID) used to access
and interact with models provided by services like OpenAI, Anthropic, Mistral,
Ollama, or Hugging Face.
+
+|Temperature
+|Controls the randomness of the output: high values lead to more diverse text
(creative/inventive), while low values produce more focused and deterministic
responses. The value typically ranges between 0 and 1 but can exceed this
range depending on the model. Note: Unless intentional, it is generally best
to adjust only one of these parameters, either temperature or Top-P/K, rather
than several simultaneously.
+
+|Max New Tokens
+|Sets a limit on how many tokens the model may generate in one response.
+
+|Context Window Size
+|Specifies how many input tokens the model can consider at once.
+
+|Timeout
+|Determines how long to wait for a response before giving up.
+
+|API Key/AccessToken
+| A unique key or token used to authenticate and authorise access to the
endpoint service.
+
+|Model Name
+|Identifies which specific model to use.
+
+|Top P
+|Also known as nucleus sampling (Holtzman et al., 2019), controls the
diversity of output by limiting the model's choices to a dynamic subset of the
vocabulary. For example, 0.2 means only the tokens comprising the top 20%
probability mass are considered. Top-p and temperature both influence the
randomness and diversity of the model's output but operate differently:
temperature scales the probabilities of all tokens, while top-p limits token
selection to the subset of options whose cumulative probability meets or
exceeds __**p**__.
+
+|Max Retries
+|The maximum number of retry attempts for failed API requests.
+
+|Response Format
+|Defines the structure used for the model's responses. Note: While some APIs
support this option, not all models are compatible with it. Additionally, the
API may require explicitly instructing the model to produce JSON to ensure
proper functionality; see the example after this table.
+
+|Seed
+|Attempts to sample deterministically, ensuring consistent results for
repeated requests with the same seed, prompt, and parameters.
+
+|Top K
+|Controls the breadth of possible answers by limiting the number of tokens
considered. Unlike Top P, which selects tokens until their cumulative
probability meets or exceeds __**p**__, this parameter restricts the selection
to a fixed number of tokens defined by __**k**__. Note: Unless intentional, it
is generally best to adjust only one of these parameters, either temperature
or Top-P/K, rather than several simultaneously.
+
+|Repeat/Frequency Penalty
+|This setting adjusts how the model evaluates token frequency in the generated
text, encouraging the model to use less/more repetitive language. The penalty
can range from -2.0 to 2.0, depending on the endpoint model. Positive values
discourage repetition by penalising frequently used tokens, while negative
values can increase repetition by favouring them. Reasonable values for the
frequency penalty are typically between 0.1 and 1.0.
+
+|Presence Penalty
+|This setting determines how the model evaluates new tokens, aiming to
discourage reuse of previously generated tokens. The penalty ranges from -2.0
to 2.0, depending on the endpoint model. Positive values penalise tokens that
have already appeared in the text, encouraging the model to explore new
directions. Unlike the frequency penalty, which scales proportionally based on
how often a token is used, the presence penalty applies a one-time additive
penalty to any token that has appeared at least once.
+
+|Organisation
+|The organisation ID associated with the API key, for account and billing
purposes.
+
+|Safe Prompt
+|Provides a baseline prompt to guide the model towards generating safe
outputs. For example, by prepending your messages with a system prompt such
as: ``__Always assist with care, respect, and truth. Respond with utmost
utility yet securely. Avoid harmful, unethical, prejudiced, or negative
content. Ensure replies promote fairness and positivity__``.
+
+|User
+|A unique identifier for the end-user, used by the endpoint provider to
monitor activity and detect potential abuse.
+
+|Return Full Text
+|Ensures the entire generated text is returned without truncation.
+
+|Wait for Model
+|Delays response until the model is ready. If the model is cold and needs
loading, this avoids repeated requests by waiting for it to become available.
+
+|Use Proxy
+|Routes requests through a proxy server.
+
+|Log Requests
+|When enabled, logs outgoing API requests for debugging purposes.
+
+|Log Responses
+|When enabled, logs API responses for debugging purposes.
+|===
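+
+As an illustration of the ``Response Format`` note above, the following is a
minimal sketch of a JSON chat input that explicitly instructs the model to
produce JSON. The task, wording, and output schema are hypothetical examples,
and whether the instruction is honoured depends on the selected Model API and
model.
+
+[source,json]
+----
+[
+{
+  "role" : "system",
+  "content" : "Extract locations from the user's text. Respond only with a JSON object of the form {\"city\": string, \"country\": string}."
+}, {
+  "role" : "user",
+  "content" : "The Eiffel Tower is located in Paris, France."
+}
+]
+----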
+
+== Additional Information
+
+=== Token
+A token represents a unit of text, such as a word, subword, or character,
depending on the tokenisation method used during model development. The size
of a token varies across models, but a common estimate for English is
approximately 4 tokens per 3 words; at that rate, a 300-word text corresponds
to roughly 400 tokens. For more precise calculations, it is important to
consider the specific tokenisation method employed by the model, such as Byte
Pair Encoding (BPE) or WordPiece. There are online tools that provide
convenient calculators to determine the token count of a given text for a
specific model.
+
+=== Chat Completion / Prompting
+Chat completion refers to the JSON structure used to design and control the
flow of interactions in a conversational AI system (the model) via an API. This
structure defines role-based communication (e.g., "system," "user,"
"assistant") and shapes the assistant's behaviour, enabling it to capture user
intent, extend text in response to prompts, and generate relevant content even
for tasks the model hasn’t been specifically fine-tuned for.
+
+Other terms related to this concept include prompting, prompt engineering,
in-context learning and meta-learning. While these terms might sometimes be
used interchangeably, they have distinct technical meanings. Meta-learning
refers to the broader framework where a model leverages an
inner-loop/outer-loop structure to adapt to new tasks. Within this framework,
in-context learning represents the inner loop, where the model performs tasks
based on examples provided during inference, without updating its parameters.
+
+Example:
+[source,json]
+----
+[
+{
+ "role" : "system",
+ "content" : "Provides instructions, context, or guidelines to shape the
assistant's behaviour and set the overall tone or purpose of the interaction."
+}, {
+ "role" : "user",
+ "content" : "Represents the individual interacting with the system,
providing queries, requests, or instructions to guide the assistant’s
responses."
+}, {
+ "role" : "assistant",
+ "content" : "Responds to the user’s input by generating outputs based on the
system's instructions and the user's messages."
+}
+]
+----
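+
+For instance, a concrete conversation following this structure might look as
follows (the content values are purely illustrative):
+
+[source,json]
+----
+[
+{
+  "role" : "system",
+  "content" : "You are a helpful assistant that answers questions about European geography."
+}, {
+  "role" : "user",
+  "content" : "What is the capital of Belgium?"
+}, {
+  "role" : "assistant",
+  "content" : "The capital of Belgium is Brussels."
+}
+]
+----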
+
+=== Basic Prompt Engineering Techniques
+
+- *Zero-Shot:* The model generates output based solely on the given prompt and
its pre-existing knowledge, with no examples provided to guide its response
(Brown et al., 2020).
+
+- *One/Few-Shot:* A small set of examples (one or more) is provided within
the prompt to help the model generalise to new, unseen inputs (Brown et al.,
2020); a sketch of a few-shot chat prompt is shown after this list.
+
+- *Chain-of-Thought:* The model is guided through a logical, step-by-step
reasoning process, enabling it to break down complex problems and improve its
ability to reach more accurate conclusions (Wei et al., 2022). This method is
particularly effective for multistep reasoning tasks. Variants include
Zero-Shot and Few-Shot Chain-of-Thought (Kojima et al., 2023); see the
chain-of-thought sketch after this list.
+
+- *Self-Consistency:* The model generates multiple diverse reasoning paths
for the same prompt by introducing variability through techniques like
adjusting the temperature or applying nucleus sampling (top-p). The final
answer is determined by selecting the response that occurs most frequently
among these outputs (Wang et al., 2023). This method is particularly effective
for improving accuracy in reasoning tasks by leveraging diversity in outputs.
For example, in a classification task, the label predicted most often across
the sampled outputs is taken as the final answer; a schematic illustration
follows this list.
+
+- *Prompt-Chaining (Sequential Chaining):* A method where multiple related
prompts are provided in sequence, with each prompt building on the output of
the previous one. This step-by-step approach helps guide the model’s reasoning
process, enabling it to tackle complex tasks incrementally. This technique
requires the JSON chat feature to be enabled, as it relies on this
functionality to chain the conversation; a sketch appears at the end of this
list.
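+
+The following is a minimal sketch of a few-shot prompt expressed in the JSON
chat format described above; the sentiment-classification task and the example
texts are hypothetical.
+
+[source,json]
+----
+[
+{
+  "role" : "system",
+  "content" : "Classify the sentiment of the user's text as positive or negative."
+}, {
+  "role" : "user",
+  "content" : "The food was wonderful."
+}, {
+  "role" : "assistant",
+  "content" : "positive"
+}, {
+  "role" : "user",
+  "content" : "The service was terribly slow."
+}, {
+  "role" : "assistant",
+  "content" : "negative"
+}, {
+  "role" : "user",
+  "content" : "A lovely place, we will definitely come back."
+}
+]
+----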
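+
+A zero-shot chain-of-thought prompt can be as simple as appending a reasoning
instruction to the question (Kojima et al., 2023); the task below is a
hypothetical example.
+
+[source,json]
+----
+[
+{
+  "role" : "user",
+  "content" : "A pipeline processes 120 rows per minute. How many rows does it process in 2.5 hours? Let's think step by step."
+}
+]
+----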
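+
+Self-consistency is a sampling strategy rather than a single prompt: the same
prompt is submitted several times with a non-zero temperature, and the most
frequent answer is kept. The following schematic, with hypothetical values, is
an illustration of the idea and not an actual API payload.
+
+[source,json]
+----
+{
+  "prompt" : "Is this review positive or negative? 'Great value, would buy again.'",
+  "sampled_answers" : [ "positive", "positive", "negative", "positive", "positive" ],
+  "majority_answer" : "positive"
+}
+----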
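+
+Prompt-chaining relies on the JSON chat format: the assistant's previous
answer is carried into the next message so that each step builds on the last.
A hypothetical two-step chain could look like this on the second request:
+
+[source,json]
+----
+[
+{
+  "role" : "user",
+  "content" : "Summarise the following incident report in one sentence: <report text>."
+}, {
+  "role" : "assistant",
+  "content" : "A scheduled pipeline failed overnight because a source file was missing."
+}, {
+  "role" : "user",
+  "content" : "Based on that summary, draft a short status update for the operations team."
+}
+]
+----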
\ No newline at end of file