Nj-kol commented on issue #325:
URL: 
https://github.com/apache/doris-spark-connector/issues/325#issuecomment-3591048191

   > [@Nj-kol](https://github.com/Nj-kol) There are two issues here: reading 
and writing.
   > 
   > 1. For writing, configuring the external BE IP address on `doris.benodes` 
will solve the problem; this parameter is currently supported.
   > 2. For reading, as shown in the error message when accessing BE:9060, you 
need to switch to ArrowFlight for reading. The default Thrift method connects 
directly to BE:9060, but ArrowFlight provides an externally accessible method, 
requiring the configuration of `public_host={nginx ip}` and 
`arrow_flight_sql_proxy_port={nginx port}`.
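
   For reference, the reading fix suggested above maps onto a config fragment along these lines (a sketch: the port values are placeholders, and whether it belongs in `fe.conf`, `be.conf`, or both depends on the Doris version and deployment):

   ```properties
   # Sketch: expose ArrowFlight SQL through an external proxy (nginx/load balancer)
   arrow_flight_sql_port = 8070                 # ArrowFlight SQL port (placeholder)
   public_host = <nginx ip>                     # externally reachable address (placeholder)
   arrow_flight_sql_proxy_port = <nginx port>   # port exposed by the proxy (placeholder)
   ```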
   
   @JNSimba I tried out your suggestions, and here are my findings:
   
   For reading, after switching to ArrowFlight and setting `public_host={load balancer ip}`, it worked, so thanks for that! : )
   
   But the issue with writing still persists; the parameter `doris.benodes` appears to have no effect. Here is what I tried:
   
   Create a table in Doris:
   
   ```sql
   CREATE TABLE IF NOT EXISTS demos.user_data (
       user_id INT,
       name STRING,
       age INT,
       update_time STRING
   )
   UNIQUE KEY(user_id)
   DISTRIBUTED BY HASH(user_id) BUCKETS 3
   PROPERTIES (
       "replication_num" = "1",
       "enable_unique_key_merge_on_write" = "true"
   );
   ```
   
   Launch the Spark shell:
   
   ```shell
   spark-shell \
   --packages org.apache.doris:spark-doris-connector-spark-3.5:25.2.0 \
   --jars /$HOME/Softwares/jars/mysql-connector-j-9.3.0.jar
   ```
   
   Code:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import spark.implicits._
   import org.apache.spark.sql.SaveMode
   
   val feNodes = "<load_balancer_ip>:8030"
   val dorisUser = "root"
   val dorisPass = ""
   val beNodes = "<load_balancer_ip>:8040"
   
   // Sample batch data
   val batchData = Seq(
     (1, "Alice", 30, "2024-01-01 10:00:00"),
     (2, "Bob", 25, "2024-01-01 11:00:00")
   )
   
   val df = batchData.toDF("user_id", "name", "age", "update_time")
   
   df.write.format("doris")
     .option("doris.table.identifier", "demos.user_data")
     .option("doris.fenodes", feNodes)
     .option("doris.benodes", beNodes)
     .option("user", dorisUser)
     .option("password", dorisPass)
     .option("doris.query.port", "9030")
     .option("doris.write.fields", "user_id,name,age,update_time")
     .option("doris.read.mode", "arrow")
     .option("doris.read.arrow-flight-sql.port", "8070")
     .mode(SaveMode.Overwrite)
     .save()
   ```
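
   For completeness, the read path that now works looks roughly like this (a sketch reusing the placeholder values from the write attempt above):

   ```scala
   // Read via ArrowFlight SQL (sketch; feNodes/dorisUser/dorisPass as defined above)
   val readDf = spark.read.format("doris")
     .option("doris.table.identifier", "demos.user_data")
     .option("doris.fenodes", feNodes)
     .option("user", dorisUser)
     .option("password", dorisPass)
     .option("doris.read.mode", "arrow")
     .option("doris.read.arrow-flight-sql.port", "8070")
     .load()
   readDf.show()
   ```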
   
   Error:
   
   ```shell
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 
0) (192.168.0.142 executor driver): java.net.UnknownHostException: 
doris-toolkit-dev-cg1-0.doris-toolkit-dev-cg1.doris.svc.cluster.local
   ```
   
   As you can see, the BE address being used is still the one fetched from FE, `doris-toolkit-dev-cg1-0.doris-toolkit-dev-cg1.doris.svc.cluster.local`, and not the one given in `doris.benodes`.
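
   As a diagnostic (a sketch; run over the MySQL protocol against FE), you can check which BE addresses FE advertises, since those appear to be what the writer resolves:

   ```sql
   -- Lists the Host each BE registered with FE; if these are in-cluster
   -- Kubernetes hostnames, external Spark executors cannot resolve them.
   SHOW BACKENDS;
   ```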


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

