zclllyybb commented on issue #64514:
URL: https://github.com/apache/doris/issues/64514#issuecomment-4705588667

   Breakwater-GitHub-Analysis-Slot: slot_11db6348ed7e
   
   Initial triage: this looks like a real branch-4.1 regression.
   
   What I checked:
   - The issue currently has no labels or comments and reports version 
4.1.2-rc01.
   - In 4.1.2-rc01, `be/src/service/http/action/stream_load.cpp` logs `HTTP 
headers=` at INFO through `req->get_all_headers()`, but 
`be/src/service/http/http_request.cpp` returns each raw header value from 
`get_all_headers()`.
   - In the same tag, 
`be/src/service/http/action/stream_load_forward_handler.cpp` separately 
iterates `req->headers()` into `headers_info` and logs that raw string at INFO.
   - Current `branch-4.1` still has the same unmasked code.
   - `master` and 4.0.6-rc02 have the masking behavior: 
`HttpRequest::debug_string()` and `get_all_headers()` mask sensitive headers, 
and the forward handler logs through `req->get_all_headers()`.
   
   Conclusion:
   The report is reproducible from code inspection. This is not just a 
logging-level/configuration issue: branch-4.1's `service/http` implementation 
does not contain the header-masking changes present on `master` and 4.0, and 
the forward handler also bypasses the header rendering helper. Sensitive 
headers such as `Authorization`, `Proxy-Authorization`, `Auth-Token`, and 
`token` can be emitted in BE INFO logs for Stream Load paths.
   
   Suggested next steps:
   1. Port the `master`/4.0 masking helper into 
`be/src/service/http/http_request.cpp` on `branch-4.1`, including both 
`debug_string()` and `get_all_headers()`.
   2. Change `StreamLoadForwardHandler::on_header()` on `branch-4.1` to use 
`req->get_all_headers()` instead of directly formatting `req->headers()`.
   3. Add a regression/unit test for header rendering so `Authorization`, 
`Proxy-Authorization`, `Auth-Token`, and `token` are masked in log/debug 
strings.
   4. Audit branch-4.1 callers of `req->debug_string()` and 
`req->get_all_headers()` because several HTTP actions log those strings and 
currently inherit the same raw-header behavior.
   
   Missing information that would help release verification, without exposing 
secrets:
   - A sanitized example log line from 4.1.2-rc01 with only header names and 
masked values.
   - Whether the observed leak is from normal Stream Load, Stream Load forward, 
or both.
   - The exact build commit if it differs from the public 4.1.2-rc01 tag.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to