milenkovicm commented on PR #1388:
URL:
https://github.com/apache/datafusion-ballista/pull/1388#issuecomment-3980564362
I spent some time with this. The reason is that, with the latest changes, TPCH
(SF10) starts breaking on my machine: the process runs out of ephemeral
client ports (around 16K on macOS), and macOS keeps a port in `TIME_WAIT` for
1 min before it can be re-assigned.
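To put a rough number on that limit, here is a back-of-the-envelope sketch. It assumes exactly 16384 ephemeral ports (the "around 16K" above) and the 60 second `TIME_WAIT` mentioned above; the function name is hypothetical and only illustrates the arithmetic.

```rust
/// Rough upper bound on how many new client connections per second can be
/// opened (and closed) indefinitely before ephemeral ports run out.
/// Every closed connection parks its port in TIME_WAIT for `time_wait_secs`.
fn max_new_connections_per_sec(ephemeral_ports: u32, time_wait_secs: u32) -> f64 {
    f64::from(ephemeral_ports) / f64::from(time_wait_secs)
}
```

With those assumed figures the bound is about 273 new connections per second, which a shuffle-heavy query that opens one connection per fetch can plausibly exceed.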
So the initial approach was to cache a channel per address,
```rust
static CHANNEL_CACHE: LazyLock<DashMap<String, Channel>> =
    LazyLock::new(DashMap::new);
```
clone it and re-use it. Apparently this does not work well if there is a lot
of data to be transferred from server to client (everything gets stuck).
`initial_connection_window_size` & `initial_stream_window_size` help in some
cases, but not in all.
It looks like a single TCP connection can't be pushed to transfer as much data
as the shuffle files generate.
An alternative approach is to keep multiple connections for the same address
and then load balance across them:
```rust
static CHANNEL_CACHE: LazyLock<DashMap<String, Vec<Channel>>> =
    LazyLock::new(DashMap::new);
```
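A minimal sketch of what load balancing across such a per-address pool could look like, using only the standard library. Everything here is an assumption for illustration, not the code in this PR: `FakeChannel` stands in for a cloneable gRPC channel, `ChannelCache`, `Pool`, and `get` are hypothetical names, a `Mutex<HashMap<..>>` replaces `DashMap`, and round-robin is just one possible balancing policy.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a cheaply cloneable channel handle.
#[derive(Clone, Debug, PartialEq)]
struct FakeChannel {
    address: String,
    id: usize,
}

/// A fixed set of connections to one address plus a round-robin cursor.
struct Pool<C> {
    connections: Vec<C>,
    next: AtomicUsize,
}

impl<C: Clone> Pool<C> {
    /// Hand out the connections in rotation.
    fn pick(&self) -> C {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.connections.len();
        self.connections[i].clone()
    }
}

/// Address-keyed cache holding `pool_size` connections per address.
struct ChannelCache<C> {
    pools: Mutex<HashMap<String, Arc<Pool<C>>>>,
    pool_size: usize,
}

impl<C: Clone> ChannelCache<C> {
    fn new(pool_size: usize) -> Self {
        assert!(pool_size > 0, "pool_size must be at least 1");
        Self {
            pools: Mutex::new(HashMap::new()),
            pool_size,
        }
    }

    /// Return a connection for `address`, creating the pool on first use.
    /// `connect(address, index)` is invoked `pool_size` times, and only once
    /// per address. Note that it runs while the map lock is held, which a
    /// real async implementation would likely want to avoid.
    fn get(&self, address: &str, connect: impl Fn(&str, usize) -> C) -> C {
        let pool = {
            let mut pools = self.pools.lock().unwrap();
            pools
                .entry(address.to_string())
                .or_insert_with(|| {
                    Arc::new(Pool {
                        connections: (0..self.pool_size)
                            .map(|i| connect(address, i))
                            .collect(),
                        next: AtomicUsize::new(0),
                    })
                })
                .clone()
        };
        // The map lock is released here; picking only touches the atomic cursor.
        pool.pick()
    }
}
```

Successive `get` calls for the same address cycle through the pooled connections, so large transfers are spread over several TCP connections while the number of sockets per address stays bounded by `pool_size`.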
Apparently connection re-use is not as trivial as I initially thought.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]