davidradl commented on code in PR #26671:
URL: https://github.com/apache/flink/pull/26671#discussion_r2154500012
##########
docs/content/docs/connectors/models/openai.md:
##########

@@ -0,0 +1,244 @@
+---
+title: "OpenAI"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# OpenAI
+
+The OpenAI Model Function allows Flink SQL to call the [OpenAI API](https://platform.openai.com/docs/overview) for inference tasks.
+
+## Overview
+
+The function supports calling remote OpenAI model services via Flink SQL for prediction/inference tasks. Currently, the following tasks are supported:
+
+* chat completions
+* embeddings
+
+## Usage examples
+
+The following example creates a chat completions model and uses it to predict sentiment labels for movie reviews.
+
+First, create the chat completions model with the following SQL statement:
+
+```sql
+CREATE MODEL ai_analyze_sentiment
+INPUT (`input` STRING)
+OUTPUT (`content` STRING)
+WITH (
+  'provider' = 'openai',
+  'endpoint' = 'https://api.openai.com/v1/chat/completions',
+  'api-key' = '<YOUR KEY>',
+  'model' = 'gpt-3.5-turbo',
+  'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.'
+);
+```
+
+Suppose the following data is stored in a table named `movie_comment`, and the prediction result is to be stored in a table named `print_sink`:
+
+```sql
+CREATE TEMPORARY VIEW movie_comment(id, movie_name, user_comment, actual_label)
+AS VALUES
+  (1, 'Good Stuff', 'The part where children guess the sounds is my favorite. It''s a very romantic narrative compared to other movies I''ve seen. Very gentle and full of love.', 'positive');
+
+CREATE TEMPORARY TABLE print_sink(
+  id BIGINT,
+  movie_name VARCHAR,
+  predict_label VARCHAR,
+  actual_label VARCHAR
+) WITH (
+  'connector' = 'print'
+);
+```
+
+Then the following SQL statement can be used to predict sentiment labels for movie reviews:
+
+```sql
+INSERT INTO print_sink
+SELECT id, movie_name, content AS predict_label, actual_label
+FROM ML_PREDICT(
+  TABLE movie_comment,
+  MODEL ai_analyze_sentiment,
+  DESCRIPTOR(user_comment));
+```
+
+## Model Options
+
+### common
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 25%">Option</th>
+      <th class="text-center" style="width: 8%">Required</th>
+      <th class="text-center" style="width: 7%">Default</th>
+      <th class="text-center" style="width: 10%">Type</th>
+      <th class="text-center" style="width: 50%">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><h5>provider</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>Specifies the model function provider to use; must be 'openai'.</td>
+    </tr>
+    <tr>
+      <td><h5>endpoint</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>Full URL of the OpenAI API endpoint, e.g., <code>https://api.openai.com/v1/chat/completions</code> or
+        <code>https://api.openai.com/v1/embeddings</code>.</td>
+    </tr>
+    <tr>
+      <td><h5>api-key</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>OpenAI API key for authentication.</td>
+    </tr>
+    <tr>
+      <td><h5>model</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>Model name, e.g., <code>gpt-3.5-turbo</code>, <code>text-embedding-ada-002</code>.</td>
+    </tr>
+  </tbody>
+</table>
+
+### chat/completions
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 25%">Option</th>
+      <th class="text-center" style="width: 8%">Required</th>
+      <th class="text-center" style="width: 7%">Default</th>
+      <th class="text-center" style="width: 10%">Type</th>
+      <th class="text-center" style="width: 50%">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><h5>system-prompt</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">"You are a helpful assistant."</td>
+      <td>String</td>
+      <td>System message for chat tasks.</td>

Review Comment:
   It would be useful to point to the OpenAI docs for the parameter we mean here. I notice we use both the phrase `system-prompt` and "System message"; I suggest we define one term, link to it, and use only that term. When we say "system" here, do we mean the system role?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
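
The quoted diff lists embeddings as a supported task and `text-embedding-ada-002` as an example model name, but only shows a chat completions example. A minimal sketch of what an embeddings model definition might look like, assuming the option keys mirror the chat completions example (the `embedding` output column name and the `ARRAY<FLOAT>` element type are assumptions, not confirmed by the diff):

```sql
-- Sketch only: option keys follow the chat completions example in the diff;
-- the OUTPUT column name and ARRAY<FLOAT> type are assumptions.
CREATE MODEL ai_embed_text
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
  'provider' = 'openai',
  'endpoint' = 'https://api.openai.com/v1/embeddings',
  'api-key' = '<YOUR KEY>',
  'model' = 'text-embedding-ada-002'
);
```

Presumably such a model would be invoked the same way as the chat example, via `ML_PREDICT(TABLE ..., MODEL ai_embed_text, DESCRIPTOR(<text column>))`.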