This is an automated email from the ASF dual-hosted git repository. shengkai pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/master by this push: new ceecdb77939 [FLINK-37799][model][docs] Add document for OpenAI Model Function (#26671) ceecdb77939 is described below commit ceecdb779399bbb08b37cd96a3633182dcbae5cd Author: yunfengzhou-hub <yuri.zhouyunf...@outlook.com> AuthorDate: Sat Jun 21 15:24:43 2025 +0800 [FLINK-37799][model][docs] Add document for OpenAI Model Function (#26671) --- docs/content.zh/docs/connectors/models/_index.md | 21 ++ docs/content.zh/docs/connectors/models/openai.md | 244 +++++++++++++++++++++++ docs/content/docs/connectors/models/_index.md | 21 ++ docs/content/docs/connectors/models/openai.md | 244 +++++++++++++++++++++++ 4 files changed, 530 insertions(+) diff --git a/docs/content.zh/docs/connectors/models/_index.md b/docs/content.zh/docs/connectors/models/_index.md new file mode 100644 index 00000000000..ec339e44725 --- /dev/null +++ b/docs/content.zh/docs/connectors/models/_index.md @@ -0,0 +1,21 @@ +--- +title: Models +bookCollapseSection: true +weight: 3 +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> diff --git a/docs/content.zh/docs/connectors/models/openai.md b/docs/content.zh/docs/connectors/models/openai.md new file mode 100644 index 00000000000..c9b9e387b57 --- /dev/null +++ b/docs/content.zh/docs/connectors/models/openai.md @@ -0,0 +1,244 @@ +--- +title: "OpenAI" +weight: 1 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# OpenAI + +The OpenAI model function lets Flink SQL call the [OpenAI API](https://platform.openai.com/docs/overview) to run inference tasks. + +## Overview + +The function calls a remote OpenAI model service from Flink SQL to run prediction/inference tasks. The following task types are currently supported: + +* [Chat Completions](https://platform.openai.com/docs/api-reference/chat): generates a model response from a list of messages that make up a conversation. +* [Embeddings](https://platform.openai.com/docs/api-reference/embeddings): returns a vector representation of a given input that machine learning models and algorithms can consume in downstream steps. + +## Usage examples + +The following example creates a chat completions model and uses it to predict sentiment labels for movie reviews. + +First, create the chat completions model with the following SQL statement: + +```sql +CREATE MODEL ai_analyze_sentiment +INPUT (`input` STRING) +OUTPUT (`content` STRING) +WITH ( + 'provider'='openai', + 'endpoint'='https://api.openai.com/v1/chat/completions', + 'api-key' = '<YOUR KEY>', + 'model'='gpt-3.5-turbo', + 'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.'
+); +``` + +Suppose the following data is stored in a table named `movie_comment`, and the prediction results should be written to a table named `print_sink`: + +```sql +CREATE TEMPORARY VIEW movie_comment(id, movie_name, user_comment, actual_label) +AS VALUES + (1, '好东西', '最爱小孩子猜声音那段,算得上看过的电影里相当浪漫的叙事了。很温和也很有爱。', 'positive'); + +CREATE TEMPORARY TABLE print_sink( + id BIGINT, + movie_name VARCHAR, + predict_label VARCHAR, + actual_label VARCHAR +) WITH ( + 'connector' = 'print' +); +``` + +Then use the following SQL statement to predict sentiment labels for the movie reviews: + +```sql +INSERT INTO print_sink +SELECT id, movie_name, content as predict_label, actual_label +FROM ML_PREDICT( + TABLE movie_comment, + MODEL ai_analyze_sentiment, + DESCRIPTOR(user_comment)); +``` + +## Model options + +### Common options + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Option</th> + <th class="text-center" style="width: 10%">Required</th> + <th class="text-center" style="width: 10%">Default</th> + <th class="text-center" style="width: 10%">Type</th> + <th class="text-center" style="width: 45%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td> + <h5>provider</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>Specifies the model provider to use; must be 'openai'.</td> + </tr> + <tr> + <td> + <h5>endpoint</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>Full URL of the OpenAI API endpoint, e.g. <code>https://api.openai.com/v1/chat/completions</code> or + <code>https://api.openai.com/v1/embeddings</code>.</td> + </tr> + <tr> + <td> + <h5>api-key</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>OpenAI API key used for authentication.</td> + </tr> + <tr> + <td> + <h5>model</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>Model name, e.g. <code>gpt-3.5-turbo</code>, <code>text-embedding-ada-002</code>.</td> + </tr> + </tbody> +</table> + +### Chat Completions + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Option</th> + <th
class="text-center" style="width: 10%">Required</th> + <th class="text-center" style="width: 10%">Default</th> + <th class="text-center" style="width: 10%">Type</th> + <th class="text-center" style="width: 45%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td> + <h5>system-prompt</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">"You are a helpful assistant."</td> + <td>String</td> + <td>System prompt message used for the chat task.</td> + </tr> + <tr> + <td> + <h5>temperature</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Double</td> + <td>Controls the randomness of the output; valid range <code>[0.0, 1.0]</code>. See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature">temperature</a></td> + </tr> + <tr> + <td> + <h5>top-p</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Double</td> + <td>Probability threshold used as an alternative to temperature. See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_p">top_p</a></td> + </tr> + <tr> + <td> + <h5>stop</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>String</td> + <td>Stop sequences, as a comma-separated list. See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-stop">stop</a></td> + </tr> + <tr> + <td> + <h5>max-tokens</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Long</td> + <td>Maximum number of tokens to generate. See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens">max tokens</a></td> + </tr> + </tbody> +</table> + +### Embeddings + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Option</th> + <th class="text-center" style="width: 10%">Required</th> + <th class="text-center" style="width: 10%">Default</th> + <th class="text-center" style="width: 10%">Type</th> + <th class="text-center" style="width: 45%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td> + <h5>dimension</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Long</td> +
<td>Dimension of the embedding vector. See <a href="https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-dimensions">dimensions</a></td> + </tr> + </tbody> +</table> + +## Schema requirements + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-center">Task</th> + <th class="text-left">Input type</th> + <th class="text-center">Output type</th> + </tr> + </thead> + <tbody> + <tr> + <td>Chat Completions</td> + <td>STRING</td> + <td>STRING</td> + </tr> + <tr> + <td>Embeddings</td> + <td>STRING</td> + <td>ARRAY&lt;FLOAT&gt;</td> + </tr> + </tbody> +</table> diff --git a/docs/content/docs/connectors/models/_index.md b/docs/content/docs/connectors/models/_index.md new file mode 100644 index 00000000000..ec339e44725 --- /dev/null +++ b/docs/content/docs/connectors/models/_index.md @@ -0,0 +1,21 @@ +--- +title: Models +bookCollapseSection: true +weight: 3 +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License.
+--> diff --git a/docs/content/docs/connectors/models/openai.md b/docs/content/docs/connectors/models/openai.md new file mode 100644 index 00000000000..316f07dd378 --- /dev/null +++ b/docs/content/docs/connectors/models/openai.md @@ -0,0 +1,244 @@ +--- +title: "OpenAI" +weight: 1 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# OpenAI + +The OpenAI Model Function allows Flink SQL to call the [OpenAI API](https://platform.openai.com/docs/overview) for inference tasks. + +## Overview + +The function calls remote OpenAI model services from Flink SQL to run prediction/inference tasks. The following tasks are currently supported: + +* [Chat Completions](https://platform.openai.com/docs/api-reference/chat): generate a model response from a list of messages comprising a conversation. +* [Embeddings](https://platform.openai.com/docs/api-reference/embeddings): get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. + +## Usage examples + +The following example creates a chat completions model and uses it to predict sentiment labels for movie reviews.
+ +First, create the chat completions model with the following SQL statement: + +```sql +CREATE MODEL ai_analyze_sentiment +INPUT (`input` STRING) +OUTPUT (`content` STRING) +WITH ( + 'provider'='openai', + 'endpoint'='https://api.openai.com/v1/chat/completions', + 'api-key' = '<YOUR KEY>', + 'model'='gpt-3.5-turbo', + 'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.' +); +``` + +Suppose the following data is stored in a table named `movie_comment`, and the prediction result is to be stored in a table named `print_sink`: + +```sql +CREATE TEMPORARY VIEW movie_comment(id, movie_name, user_comment, actual_label) +AS VALUES + (1, 'Good Stuff', 'The part where children guess the sounds is my favorite. It''s a very romantic narrative compared to other movies I''ve seen. Very gentle and full of love.', 'positive'); + +CREATE TEMPORARY TABLE print_sink( + id BIGINT, + movie_name VARCHAR, + predict_label VARCHAR, + actual_label VARCHAR +) WITH ( + 'connector' = 'print' +); +``` + +Then the following SQL statement can be used to predict sentiment labels for movie reviews: + +```sql +INSERT INTO print_sink +SELECT id, movie_name, content as predict_label, actual_label +FROM ML_PREDICT( + TABLE movie_comment, + MODEL ai_analyze_sentiment, + DESCRIPTOR(user_comment)); +``` + +## Model Options + +### Common + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Option</th> + <th class="text-center" style="width: 8%">Required</th> + <th class="text-center" style="width: 7%">Default</th> + <th class="text-center" style="width: 10%">Type</th> + <th class="text-center" style="width: 50%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td> + <h5>provider</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>Specifies the model function provider to use; must be 'openai'.</td> + </tr> + <tr> +
<td> + <h5>endpoint</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>Full URL of the OpenAI API endpoint, e.g. <code>https://api.openai.com/v1/chat/completions</code> or + <code>https://api.openai.com/v1/embeddings</code>.</td> + </tr> + <tr> + <td> + <h5>api-key</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>OpenAI API key for authentication.</td> + </tr> + <tr> + <td> + <h5>model</h5> + </td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>Model name, e.g. <code>gpt-3.5-turbo</code>, <code>text-embedding-ada-002</code>.</td> + </tr> + </tbody> +</table> + +### Chat Completions + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Option</th> + <th class="text-center" style="width: 8%">Required</th> + <th class="text-center" style="width: 7%">Default</th> + <th class="text-center" style="width: 10%">Type</th> + <th class="text-center" style="width: 50%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td> + <h5>system-prompt</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">"You are a helpful assistant."</td> + <td>String</td> + <td>The input message for the system role.</td> + </tr> + <tr> + <td> + <h5>temperature</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Double</td> + <td>Controls randomness of output, range <code>[0.0, 1.0]</code>. See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature">temperature</a></td> + </tr> + <tr> + <td> + <h5>top-p</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Double</td> + <td>Probability cutoff for token selection (used instead of temperature). 
See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_p">top_p</a></td> + </tr> + <tr> + <td> + <h5>stop</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>String</td> + <td>Stop sequences, as a comma-separated list. See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-stop">stop</a></td> + </tr> + <tr> + <td> + <h5>max-tokens</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Long</td> + <td>Maximum number of tokens to generate. See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens">max tokens</a></td> + </tr> + </tbody> +</table> + +### Embeddings + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Option</th> + <th class="text-center" style="width: 8%">Required</th> + <th class="text-center" style="width: 7%">Default</th> + <th class="text-center" style="width: 10%">Type</th> + <th class="text-center" style="width: 50%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td> + <h5>dimension</h5> + </td> + <td>optional</td> + <td style="word-wrap: break-word;">null</td> + <td>Long</td> + <td>Dimension of the embedding vector. See <a href="https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-dimensions">dimensions</a></td> + </tr> + </tbody> +</table> + +## Schema Requirement + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-center">Task</th> + <th class="text-left">Input Type</th> + <th class="text-center">Output Type</th> + </tr> + </thead> + <tbody> + <tr> + <td>Chat Completions</td> + <td>STRING</td> + <td>STRING</td> + </tr> + <tr> + <td>Embeddings</td> + <td>STRING</td> + <td>ARRAY&lt;FLOAT&gt;</td> + </tr> + </tbody> +</table>
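
The Embeddings row of the schema table can be sketched in the same way as the chat completions example. This is a minimal sketch and not part of the commit: the model name `ai_embed_comment` and the output column name `embedding` are hypothetical, while the option keys, the endpoint URL, the model name `text-embedding-ada-002`, and the STRING input / ARRAY<FLOAT> output types are taken from the tables in the diff. It assumes the `movie_comment` view from the usage example already exists.

```sql
-- Hypothetical embeddings model; the model name and output column name
-- are illustrative only and do not appear in the commit.
CREATE MODEL ai_embed_comment
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
  'provider' = 'openai',
  'endpoint' = 'https://api.openai.com/v1/embeddings',
  'api-key' = '<YOUR KEY>',
  'model' = 'text-embedding-ada-002'
);

-- Compute one embedding vector per review in the movie_comment view.
SELECT id, movie_name, embedding
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_embed_comment,
  DESCRIPTOR(user_comment));
```

The optional `dimension` setting is left out on purpose: OpenAI documents the underlying `dimensions` parameter as supported only by its newer embedding models, so whether it is accepted depends on the model chosen.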