morningman opened a new pull request, #63817: URL: https://github.com/apache/doris/pull/63817
## Summary

Adds SSRF (Server-Side Request Forgery) hardening for `CREATE CATALOG` so that user-supplied catalog endpoints / URIs that point at internal, private, or loopback hosts are rejected at catalog creation time, before the FE makes any outbound connection to them. Follows the pattern from #58585 (which added the SSRF check to `CreateStageCommand`).

## Coverage

The check runs unconditionally on every `CREATE CATALOG` (regardless of `test_connection`) and covers:

- **HMS thrift URI**: `hive.metastore.uris` (Hive HMS, Iceberg HMS, Paimon HMS); comma-separated values are validated independently
- **HDFS**: `fs.defaultFS` and HA `dfs.namenode.rpc-address.<nameservice>.<namenode>` entries
- **S3-compatible object storage**: endpoint on `S3Properties`, `MinioProperties`, `OSSProperties`, `COSProperties`, `OBSProperties`, `GCSProperties`, `OzoneProperties`, `AzureProperties`, `OSSHdfsProperties`
- **Iceberg REST**: `iceberg.rest.uri`
- **Paimon REST**: `paimon.rest.uri`
- **AWS Glue**: `glue.endpoint`
- **Aliyun DLF**: `dlf.endpoint`

S3 load and S3 TVF are unchanged; they already wrap their endpoint connectivity test in the same SSRF hook via `S3Util.validateAndTestEndpoint()`.

## Design

Discovery is **annotation-driven** to keep the checker extensible without `instanceof` chains:

- Added a `checkSsrf` flag to `@ConnectorProperty` and marked every endpoint / URI field with `checkSsrf = true`.
- `CatalogSsrfChecker` walks the metastore + storage property object graph by reflection, picking up every annotated field automatically. Recursion is bounded to classes under `org.apache.doris.datasource.property.*` and de-duped with an identity set.
- HDFS HA namenode rpc-addresses live behind dynamic keys, not declared fields, so they are picked up separately by scanning `HdfsProperties.getBackendConfigProperties()` for the known prefix.
- Auto-fallback `HdfsProperties` instances (where `explicitlyConfigured == false`) are skipped wholesale, so SSRF errors do not fire on catalogs whose user never configured HDFS.

**Adding a new property class** only requires `checkSsrf = true` on its URI field; no `CatalogSsrfChecker` change is needed.

## Failure mode

When the underlying `SecurityChecker` rejects a URI, `CatalogSsrfChecker` propagates a `DdlException` that names the offending host. Example:

```
SSRF check failed for catalog 'my_hive', uri 'thrift://127.0.0.1:9083': <reason from rule engine>
```

Each URI is wrapped in its own try / finally so `SecurityChecker.stopSSRFChecking()` always runs.

## Test plan

- [x] Added `CatalogSsrfCheckerTest` with 10 cases: null inputs, HMS thrift, comma-separated HMS, Iceberg REST URI normalization, HDFS `fs.defaultFS`, HDFS HA rpc-addresses, implicit-HDFS skip, `SecurityChecker` exception propagation, non-URI fields not validated. Uses `Mockito.mockStatic(SecurityChecker.class)` to capture the calls.
- [x] Re-ran existing catalog test suite (`ExternalCatalogTest`, `IncludeTableListTest`, `RefreshCatalogTest`, `HMSPropertiesTest`, `IcebergRestPropertiesTest`, all S3-compatible `*PropertiesTest`, `HdfsProperties*Test`, `PaimonRestMetaStorePropertiesTest`, `AWSGlueMetaStoreBasePropertiesTest`, `AliyunDLFBasePropertiesTest`), all green.
- [ ] CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)
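
## Illustrative sketch

For readers who want a concrete picture of the annotation-driven discovery and the per-URI try / finally described under Design and Failure mode, here is a minimal, self-contained sketch. It is not the code in this PR. `SketchProperty`, `Guard`, `StorageProps`, and `MetastoreProps` are hypothetical stand-ins for `@ConnectorProperty`, `SecurityChecker`, and the real property classes. `IllegalArgumentException` stands in for `DdlException`. Recursion is bounded by enclosing class here, where the PR bounds it by package.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class SsrfDiscoverySketch {

    /** Hypothetical stand-in for @ConnectorProperty; only the checkSsrf flag is modeled. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SketchProperty {
        boolean checkSsrf() default false;
    }

    /** Hypothetical stand-in for the start/stop pair on SecurityChecker. */
    interface Guard {
        void start(String uri) throws Exception;

        void stop();
    }

    // Hypothetical property classes, not the real Doris ones.
    static class StorageProps {
        @SketchProperty(checkSsrf = true)
        String endpoint = "http://127.0.0.1:9000";
        @SketchProperty
        String region = "us-east-1"; // annotated, but not flagged for SSRF checking
        Object owner;                // back-reference used to show cycle handling
    }

    static class MetastoreProps {
        @SketchProperty(checkSsrf = true)
        String uris = "thrift://10.0.0.1:9083, thrift://10.0.0.2:9083";
        StorageProps storage = new StorageProps();
    }

    /** Collects every checkSsrf-annotated string value reachable from root. */
    static List<String> collectUris(Object root) {
        List<String> out = new ArrayList<>();
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        walk(root, seen, out);
        return out;
    }

    // Assumption: the PR bounds recursion by package; this sketch uses the enclosing class.
    private static boolean isPropertyClass(Class<?> c) {
        return c.getEnclosingClass() == SsrfDiscoverySketch.class;
    }

    private static void walk(Object node, Set<Object> seen, List<String> out) {
        if (node == null || !isPropertyClass(node.getClass()) || !seen.add(node)) {
            return; // null, out of bounds, or already visited (identity-based de-dup)
        }
        for (Class<?> c = node.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                f.setAccessible(true);
                Object value;
                try {
                    value = f.get(node);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
                SketchProperty p = f.getAnnotation(SketchProperty.class);
                if (p != null && p.checkSsrf() && value instanceof String) {
                    // Comma-separated values are split and collected one by one.
                    for (String part : ((String) value).split(",")) {
                        if (!part.trim().isEmpty()) {
                            out.add(part.trim());
                        }
                    }
                } else {
                    walk(value, seen, out); // recurse into nested property objects
                }
            }
        }
    }

    /** Validates each URI inside its own try/finally so stop() always runs. */
    static void validateAll(String catalog, List<String> uris, Guard guard) {
        for (String uri : uris) {
            try {
                guard.start(uri);
            } catch (Exception e) {
                throw new IllegalArgumentException("SSRF check failed for catalog '" + catalog
                        + "', uri '" + uri + "': " + e.getMessage(), e);
            } finally {
                guard.stop();
            }
        }
    }
}
```

In this sketch, adding a new property class means annotating its URI field; `collectUris` and `validateAll` need no change, which mirrors the extensibility claim in the Design section.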
