Sbaia opened a new issue, #14951:
URL: https://github.com/apache/iceberg/issues/14951

   ### Apache Iceberg version
   
   1.10.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We are seeing severe **outbound socket exhaustion (TIME_WAIT)** when running Iceberg maintenance operations (specifically `CALL system.rewrite_data_files`) on large tables stored on S3.
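
   For reference, the operation is essentially the following (a sketch only; the catalog, table, and option values here are placeholders, not our exact ones):

   ```sql
   -- Hypothetical repro sketch: 'lakehouse' and 'db.big_table' are placeholders.
   CALL lakehouse.system.rewrite_data_files(
     table => 'db.big_table',
     options => map(
       'target-file-size-bytes', '536870912',          -- 512 MB target files
       'max-concurrent-file-group-rewrites', '5'
     )
   );
   ```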
   
   This happens **even with the Apache HTTP client and connection pooling enabled**, and after removing any Hadoop/S3A usage.
   The issue seems to correlate strongly with **metadata / manifest (`.avro`) downloads**, not with large data file reads.
   
   ---
   
   ### **Environment**
   
   * Iceberg version: **1.10.0**
   * Spark version: **4.0.1**
   * Spark on Kubernetes (Spark Operator / SparkApplication CRD)
   * Storage: **Amazon S3**
   * FileIO: `org.apache.iceberg.aws.s3.S3FileIO`
   * HTTP client: **Apache HTTP client (Iceberg shaded)**
   * AWS SDK: via `iceberg-aws-bundle`
   * No `s3a://`, no `hadoop-aws` in use
   * REST Catalog (Lakekeeper), but REST traffic is minimal; the sockets in question clearly go to S3
   
   ---
   
   ### **Observed behavior**
   
   During `rewrite_data_files` on a table with ~26k data files:
   
   * Outbound connections to S3 explode to **40k–45k sockets in TIME_WAIT**
   * Remote IPs are public S3 endpoints (`3.x`, `52.x`)
   * Happens primarily while **reading metadata files**:
   
     * `metadata/*.json`
     * `manifest-list.avro`
     * `snap-*.avro`
   * Kernel ephemeral ports get exhausted, causing job instability
   
   Socket inspection from inside the executor pod shows:
   
   ```
   ~43k TIME_WAIT sockets
   Top destinations:
   - 3.5.x.x
   - 52.218.x.x
   ```
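
   The stats above were gathered roughly like this (a sketch; exact `ss` column layout can vary by version):

   ```
   # Top peer IPs among TIME_WAIT sockets, run inside the executor pod
   ss -tan | awk '$1 == "TIME-WAIT" {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
   ```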
   
   ---
   
   ### **Relevant configuration**
   
   ```properties
   spark.sql.catalog.lakehouse.io-impl=org.apache.iceberg.aws.s3.S3FileIO

   spark.sql.catalog.lakehouse.http-client.type=apache
   spark.sql.catalog.lakehouse.http-client.apache.max-connections=200
   spark.sql.catalog.lakehouse.http-client.apache.connection-max-idle-time-ms=300000
   spark.sql.catalog.lakehouse.http-client.apache.connection-time-to-live-ms=3600000

   # reducing to 1 helps but does not eliminate the churn
   spark.sql.iceberg.planning.max-threads=4
   ```
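
   For completeness, these are other `http-client.apache.*` knobs we are aware of but have not tuned yet (property names as we understand them from Iceberg's `HttpClientProperties`; corrections welcome):

   ```properties
   # Untested ideas, not part of our current config: keep TCP connections alive
   # and let the SDK's idle-connection reaper manage the pool.
   spark.sql.catalog.lakehouse.http-client.apache.tcp-keep-alive-enabled=true
   spark.sql.catalog.lakehouse.http-client.apache.use-idle-connection-reaper-enabled=true
   ```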
   
   ---
   
   ### **Why this looks like an Iceberg-level issue**
   
   * The connection explosion correlates with **manifest/metadata access**, not with data file I/O
   * The planning and rewrite phases appear to trigger **bursty, highly parallel small-object GETs**
   * Even with pooling, connections are frequently closed and recreated
   
   This suggests:
   
   * Metadata access patterns may be **too aggressively parallel**
   * Manifest downloads may bypass or defeat effective connection reuse
   * Planning threads / metadata splits may cause connection churn beyond what pooling can absorb (see the sketch below)
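
   One mitigation we plan to test: as far as we understand, manifest reads go through Iceberg's shared worker pool, whose size can be capped with a JVM system property (assuming `iceberg.worker.num-threads` still applies in 1.10):

   ```properties
   # Hypothetical mitigation (untested here): cap the shared Iceberg worker pool
   # used for manifest reads, on both the driver and the executors.
   spark.driver.extraJavaOptions=-Diceberg.worker.num-threads=8
   spark.executor.extraJavaOptions=-Diceberg.worker.num-threads=8
   ```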
   
   ---
   
   ### **Questions / possible directions**
   
   * Is metadata/manifest I/O intentionally parallelized at this level?
   * Are there known issues with connection reuse during manifest reads?
   * Should `planning.max-threads` or metadata split behavior be auto-throttled?
   * Are there additional cache knobs or client-reuse guarantees for metadata reads?
   * Has similar behavior been observed or addressed in newer versions?
   
   We're happy to provide:
   
   * Additional logs (with request paths)
   * Repro steps
   * Packet/socket stats
   * A minimal test case if needed
   
   Thanks; this one is pretty brutal in production environments with strict networking limits.
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

