r4ntix opened a new issue, #214:
URL: https://github.com/apache/arrow-ballista/issues/214

   **Describe the bug**
   I started the scheduler and executor services on localhost:
   ```shell
   ./target/debug/ballista-scheduler -s push-staged --log-level-setting INFO,ballista_scheduler=DEBUG
   
   ./target/debug/ballista-executor -c 4 -s push-staged --log-level-setting INFO,ballista_executor=DEBUG
   ```
   
   Then I ran [examples/src/bin/sql.rs](https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/examples/src/bin/sql.rs), and the executor logged an error:
   ```shell
   2022-09-14T07:18:35.502374Z ERROR tokio-runtime-worker ThreadId(08) ballista_executor::executor_server: Fail to connect to scheduler scheduler_ballista_localhost_50050 due to TonicError(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: nodename nor servname provided, or not known" }))))
   ```
   The executor can't connect to the scheduler because it treats `scheduler_ballista_localhost_50050` as the scheduler's address.
   
   **To Reproduce**
   
   When launching a task, the scheduler sends its `scheduler_id` to the executor in `LaunchTaskParams`:
   
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/scheduler/src/state/task_manager.rs#L415-L430
   
   The scheduler generates `scheduler_id` when the service starts; its value is `format!("scheduler_{}_{}_{}", namespace, external_host, port)`:
   
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/scheduler/src/main.rs#L171
   
   When the executor reports the task status, it calls `get_scheduler_client` and passes `scheduler_id`:
   
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/executor/src/executor_server.rs#L507-L519
   
   Because the `scheduler_id` value is `format!("scheduler_{}_{}_{}", namespace, external_host, port)`, it cannot be resolved to an address via DNS:
   
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/executor/src/executor_server.rs#L222-L237
   
   So the executor throws an error, and the task fails.
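
The mismatch can be sketched in plain Rust, without any Ballista code. This is only an illustration under stated assumptions: `scheduler_id` reuses the format string quoted in this report, while `split_host_port` is a hypothetical helper (not part of Ballista) that checks whether a string has the `host:port` shape a client could connect to. It does not handle IPv6 literals.

```rust
/// Builds the id with the format string quoted in this report.
fn scheduler_id(namespace: &str, external_host: &str, port: u16) -> String {
    format!("scheduler_{}_{}_{}", namespace, external_host, port)
}

/// Hypothetical helper (not Ballista API): splits "host:port" into its parts.
/// Returns None when there is no ':' separator, the host is empty,
/// or the port is not a valid u16.
fn split_host_port(addr: &str) -> Option<(&str, u16)> {
    let (host, port) = addr.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    let port = port.parse::<u16>().ok()?;
    Some((host, port))
}
```

With the defaults seen in the log above, `scheduler_id("ballista", "localhost", 50050)` yields `scheduler_ballista_localhost_50050`, for which `split_host_port` returns `None`: the whole string ends up being looked up as a host name. By contrast, `split_host_port("localhost:50050")` returns the host and port separately.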
   
   **Describe the solution you'd like**
   Change `scheduler_name` to `format!("{}:{}", opt.external_host, opt.bind_port)`, which defaults to `localhost:50050`.
   The log file name prefix remains `format!("scheduler_{}_{}_{}", namespace, external_host, port)`.
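
A rough sketch of the proposed split, assuming the two values are derived side by side at startup. The struct `SchedulerNames` and the function `scheduler_names` are hypothetical names used only for illustration; the two format strings are the ones given in this issue.

```rust
/// Hypothetical container (not Ballista API) for the two derived names.
#[derive(Debug, PartialEq)]
struct SchedulerNames {
    /// Connectable address sent to executors, e.g. "localhost:50050".
    scheduler_name: String,
    /// Prefix used only for log file names.
    log_file_prefix: String,
}

/// Derives both names from the same inputs, keeping the address
/// separate from the log file prefix.
fn scheduler_names(namespace: &str, external_host: &str, bind_port: u16) -> SchedulerNames {
    SchedulerNames {
        scheduler_name: format!("{}:{}", external_host, bind_port),
        log_file_prefix: format!("scheduler_{}_{}_{}", namespace, external_host, bind_port),
    }
}
```

Calling `scheduler_names("ballista", "localhost", 50050)` gives `localhost:50050` as the name an executor can connect to, while the log prefix stays `scheduler_ballista_localhost_50050`.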
   
   

