github-actions[bot] commented on code in PR #64935:
URL: https://github.com/apache/doris/pull/64935#discussion_r3489919798


##########
fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/LoadAction.java:
##########
@@ -800,11 +800,17 @@ private Backend selectBackendForGroupCommit(String 
clusterName, HttpServletReque
      * @return RedirectView configured for stream load forwarding
      */
     private RedirectView redirectToStreamLoadForward(HttpServletRequest 
request, TNetworkAddress addr,
-            String forwardTarget) {
+            String forwardTarget) throws LoadException {
         // Replace _stream_load with _stream_load_forward in the path.
         String modifiedPath = request.getRequestURI().replace("/_stream_load", 
"/_stream_load_forward");
         String queryString = request.getQueryString();
-        String redirectQuery = "forward_to=" + forwardTarget;
+        String authToken;
+        try {
+            authToken = Env.getCurrentEnv().getTokenManager().acquireToken();
+        } catch (Exception e) {
+            throw new LoadException("Failed to acquire auth token for stream 
load forward", e);
+        }
+        String redirectQuery = "forward_to=" + forwardTarget + "&auth_token=" 
+ authToken;

Review Comment:
   This exposes Doris' internal cluster auth token to the stream-load client in 
the `Location` header. The token comes from `TokenManager.acquireToken()` and 
is not scoped to this redirect; the same value is accepted by FE's `token` 
header path (`streamLoad()` -> `executeWithClusterToken()`, which sets 
`UserIdentity.ADMIN`) and by BE's auth-token path, and `TokenManager` keeps 
tokens valid for days. Any user who triggers this redirect can copy the 
`auth_token` from the URL and reuse it as a cluster token for other requests, 
so the forwarding credential should not be the reusable internal token. Please 
use a per-request/per-target token with a very short lifetime, or another 
mechanism that authenticates only this single `_stream_load_forward` hop 
without disclosing the cluster-wide token.



##########
regression-test/suites/load_p0/stream_load/test_group_commit_redirect.groovy:
##########
@@ -93,6 +93,7 @@ suite('test_group_commit_redirect', 'docker') {
     options.beNum = 3
     options.cloudMode = true
     options.beConfigs.add('enable_java_support=false')
+    options.beConfigs.add('enable_group_commit_streamload_be_forward=true')

Review Comment:
   This turns on the BE forward endpoint for the docker test, but the test 
still sends its direct `_stream_load_forward` requests without the FE-generated 
`auth_token` (`groupCommitRedicetSteamLoad` builds only `?forward_to=...`). 
That means the load coverage goes through `HttpHandlerWithAuth` fallback, which 
is the already-known weak path and is a no-op when `enable_all_http_auth` is 
false, rather than proving the new token-authenticated forwarding path works 
end to end. Please either drive the production FE redirect and follow the 
returned `Location` with its token, or add explicit token and negative-token 
cases for `_stream_load_forward` so this security fix is actually covered.



##########
be/src/service/http/action/stream_load_forward_handler.cpp:
##########
@@ -31,9 +31,29 @@
 namespace doris {
 
 int StreamLoadForwardHandler::on_header(HttpRequest* req) {
-    std::ostringstream params_info;
+    // FE passes auth_token via query param in 307 redirect (can't add 
headers).
+    // Check it here to skip FE RPC; fallback to HttpHandlerWithAuth for Basic 
Auth.
     const auto* params = req->params();
+    auto token_it = params->find("auth_token");
+    if (token_it != params->end()) {
+        if (!_exec_env->check_auth_token(token_it->second)) {
+            LOG(WARNING) << "StreamLoadForward failed - invalid auth token"
+                         << ", remote: " << req->remote_host() << ", path: " 
<< req->raw_path();
+            HttpChannel::send_reply(req, HttpStatus::FORBIDDEN, "Invalid auth 
token");
+            return HttpStatus::FORBIDDEN;

Review Comment:
   This branch sends the 403 reply but returns `HttpStatus::FORBIDDEN`, which 
is a positive value. `EvHttpServer::on_header()` only treats `res < 0` as "the 
handler already replied"; otherwise it continues and installs the progressive 
chunk callback/free callback. For an invalid token this leaves the request 
alive after the 403 with no `StreamLoadForwardContext` initialized, so later 
chunk/handle callbacks can try to send a second error instead of cleanly 
stopping. Please return `-1` after sending the reply, matching the 
`HttpHandler::on_header()` contract and the other authenticated handlers.



##########
be/src/service/http_service.cpp:
##########
@@ -144,9 +144,11 @@ Status HttpService::start() {
                                       streamload_2pc_action);
 
     // register stream load forward handler
-    auto* forward_handler = _pool.add(new StreamLoadForwardHandler());
-    _ev_http_server->register_handler(HttpMethod::PUT, 
"/api/{db}/{table}/_stream_load_forward",
-                                      forward_handler);
+    if (config::enable_group_commit_streamload_be_forward) {

Review Comment:
   This BE-side flag is only consulted while `HttpService::start()` registers 
handlers. The matching FE flag is still `@ConfField(mutable = true)` and 
`LoadAction` checks it on every request, so `ADMIN SET FRONTEND CONFIG 
("enable_group_commit_streamload_be_forward"="true")` can make FE start issuing 
`_stream_load_forward` redirects while the running BEs were started with the 
default false value and never registered that route. That turns group-commit 
stream loads into 404/handler-missing failures until every BE is restarted with 
the new flag. Please make the two sides have the same activation semantics, for 
example by keeping the handler registered and checking a mutable gate in the 
handler, or by making the FE setting restart-only/otherwise preventing FE from 
enabling redirects until the BE endpoint is known to be active.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to