papadave66 opened a new issue, #9560:
URL: https://github.com/apache/seatunnel/issues/9560

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   The output is fine when the LLM responds in English, but when the response contains Chinese, the characters come out garbled (mojibake). The sample below is from writing to Neo4j. I also tested the Console sink and the file sink, and the results are the same.
   ```
      "synonyms": "["true"]
   
   Explanation:
   - col_name: ENTRY_MEasured_TEMPERATURE
   - col_description: 入口测量温度
   
   The description accurately describes the name. ENTRY MEASURED TEMPERATURE could reasonably be described as "Entry Measured Temperature" or in Chinese, "å…¥å£æµ‹é‡æ¸©åº¦". Thus, the answer is true."
   ```
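
The garbled text in the sample has the typical shape of UTF-8 bytes decoded with a single-byte charset (each Chinese character turns into three Latin characters). The following is a minimal sketch of that hypothesis, not a confirmed diagnosis of SeaTunnel's behavior. It assumes the LLM endpoint sent valid UTF-8 and that something downstream decoded it as ISO-8859-1 (`latin-1`); the `…` and `‹` visible in the report suggest the same bytes were later displayed as Windows-1252.

```python
# Expected text, taken from the col_description in the sample above.
original = "入口测量温度"

# Assumption: the endpoint returned these UTF-8 bytes (3 bytes per character).
utf8_bytes = original.encode("utf-8")

# Decoding UTF-8 bytes with a single-byte charset yields mojibake.
garbled = utf8_bytes.decode("latin-1")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(original), len(utf8_bytes), len(garbled))  # 6 18 18
print(repaired == original)  # True
```

If the round trip recovers the original text, the bytes on the wire were fine and the wrong charset was applied when they were turned into a string.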
   
   ### SeaTunnel Version
   
   2.3.11
   
   ### SeaTunnel Config
   
   ```conf
   env {
     # You can set engine configuration here
     job.name = "LLM-process"
     parallelism = 1
     job.mode = "STREAMING"
     checkpoint.interval = 5000
   }
   
   source {
     Neo4j {
       uri = "bolt://10.151.##.##:7687"
       username = "neo4j"
       password = "#########"
       database = "neo4j"
       max_transaction_retry_time = 1
       max_connection_timeout = 1
   
       query = "MATCH (n:Column{id:'LEVEL2.LEVEL2.REP_PL_INPUT_COIL.ENTRY_MEASURED_TEMPERATURE'}) RETURN n.id, n.col_name, n.col_description"
   
       schema {
           fields {
               n.id=STRING
               n.col_name=STRING
               n.col_description=STRING
           }
       }
     }
   }
   transform {
     LLM {
       model_provider = CUSTOM
       model = "qwen2.5:14b"
       inference_columns = ["n.col_name", "n.col_description"]
       prompt = "判断col description是否符合col name"
       output_data_type = STRING
       api_path = "http://10.151.##.##:11434/v1/chat/completions"
       custom_config={
               custom_response_parse = "$.choices[0].message.content"
               custom_request_headers = {
                   Content-Type = "application/json; charset=UTF-8"
               }
               custom_request_body ={
                   model = "${model}"
                   messages = [
                   {
                       role = "system"
                       content = "${prompt}"
                   },
                   {
                       role = "user"
                       content = "${input}"
                   }]
               }
           }
       }
   }
   
   sink {
     Neo4j {
       uri = "bolt://10.151.##.##:7687"
       username = "neo4j"
       password = "########"
       database = "neo4j"
       # max_transaction_retry_time = 3
       # max_connection_timeout = 10
       # write_mode = "BATCH"
       # query = """
       # UNWIND $records AS row
       # MATCH (c:Column {id: row.id})
       # SET c.synonyms = row.llm_output
       # """
       query = """
         MATCH (c:Column {id: $id})
         SET c.synonyms = $llm_output
       """
       queryParamPosition = {
           id = 0,
           col_name = 1,
           col_description = 2,
           llm_output = 3
       }
     }
     # LocalFile {
     #   path = "output/"
     #   file_format = "json"
     # }
   }
   ```
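
The config above extracts `$.choices[0].message.content` from the HTTP response, so the charset used to decode that response body decides whether Chinese text survives. Below is a hedged sketch of the decoding step, assuming the endpoint replies with `Content-Type: application/json` and no `charset` parameter. The helper name `decode_json_body` is hypothetical and this is not SeaTunnel's actual code. It falls back to UTF-8 because RFC 8259 requires UTF-8 for JSON exchanged between systems.

```python
import json
from email.message import Message
from typing import Optional


def decode_json_body(body: bytes, content_type: Optional[str] = None) -> str:
    """Decode an HTTP JSON body, using UTF-8 when no charset is declared."""
    charset = None
    if content_type:
        # Reuse the stdlib header parser to read the charset parameter.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
    return body.decode(charset or "utf-8")


# Simulated response body shaped like an OpenAI-compatible chat completion.
body = json.dumps(
    {"choices": [{"message": {"content": "入口测量温度"}}]},
    ensure_ascii=False,
).encode("utf-8")

# Header without a charset parameter: fall back to UTF-8.
good = json.loads(decode_json_body(body, "application/json"))
good_content = good["choices"][0]["message"]["content"]

# For contrast: forcing a single-byte charset garbles the same bytes.
bad = json.loads(body.decode("latin-1"))
bad_content = bad["choices"][0]["message"]["content"]

print(good_content == "入口测量温度")  # True
print(bad_content == "入口测量温度")   # False
```

Comparing the two decodes against the raw response from the endpoint (for example, captured with a packet trace or a direct HTTP call) would show whether the garbling happens while reading the response or later in the pipeline.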
   
   ### Running Command
   
   ```shell
   /root/src/apache-seatunnel-2.3.11/bin/seatunnel.sh -c /root/src/apache-seatunnel-2.3.11/job/neo4j-llm-process.conf -m local
   ```
   
   ### Error Exception
   
   ```log
   No
   ```
   
   ### Zeta or Flink or Spark Version
   
   zeta 2.3.11
   
   ### Java or Scala Version
   
   openjdk version "11.0.6" 2020-01-14 LTS
   OpenJDK Runtime Environment 18.9 (build 11.0.6+10-LTS)
   OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10-LTS, mixed mode, sharing)
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
