kevinjqliu opened a new pull request, #16215:
URL: https://github.com/apache/iceberg/pull/16215

   Fix LICENSE and NOTICE compliance for all spark-runtime shadow JARs (v3.4, 
v3.5, v4.0, v4.1) to accurately represent bundled contents per [ASF licensing 
policy](https://infra.apache.org/licensing-howto.html).
   
   Audit of the shadow JAR contents revealed several Category B dependencies 
with missing full license texts, missing NOTICE propagation, and undeclared 
Apache-licensed transitive dependencies.
   
   ## Build and verify
   
   ```bash
   # Build shadow JARs (all versions)
   ./gradlew -DsparkVersions=3.4,3.5,4.0,4.1 \
             :iceberg-spark:iceberg-spark-runtime-3.4_2.12:shadowJar \
             :iceberg-spark:iceberg-spark-runtime-3.5_2.12:shadowJar \
             :iceberg-spark:iceberg-spark-runtime-4.0_2.13:shadowJar \
             :iceberg-spark:iceberg-spark-runtime-4.1_2.13:shadowJar -x test
   ```
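
Once the JARs are built, one way to begin the kind of audit described above is to list every entry whose file name looks like a license or notice file. The following is a minimal sketch, not the procedure used for this PR: `license_like_entries` is a hypothetical helper, the glob is extrapolated from the v4.1 JAR path used later in this description, and the name match is a heuristic that also catches unrelated files such as classes with "License" in their name.

```shell
# Hypothetical helper: read a `jar tf` listing on stdin and keep entries whose
# file name (the last path component) contains "license" or "notice".
license_like_entries() {
  grep -iE '(^|/)[^/]*(license|notice)[^/]*$'
}

# Assumed layout: other versions follow the same pattern as the v4.1 path.
for jar in spark/v*/spark-runtime/build/libs/iceberg-spark-runtime-*.jar; do
  [ -f "$jar" ] || continue   # skip when nothing has been built
  echo "== $jar"
  jar tf "$jar" | license_like_entries
done
```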
   
   ---
   
   ## LICENSE changes
   
   All four versions (v3.4, v3.5, v4.0, v4.1) receive the same set of additions 
unless noted.
   
   - **FastDoubleParser (MIT)** — **Required.** Shaded into Jackson Core at 
`com/fasterxml/jackson/core/io/doubleparser/`. MIT is Category A under 
[resolved.html](https://www.apache.org/legal/resolved.html), but the license 
requires its copyright and permission notice in all copies, so the full text 
is added.
     ```bash
     jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep FastDouble
     # META-INF/FastDoubleParser-LICENSE
     # META-INF/FastDoubleParser-NOTICE
     # org/apache/iceberg/shaded/com/fasterxml/jackson/core/io/doubleparser/FastDoubleMath.class
     ```
     Upstream: https://github.com/wrandelshofer/FastDoubleParser/blob/main/LICENSE
   
   - **fast_float (MIT, bundled by FastDoubleParser)** — **Required.** 
Transitively included. MIT license full text required.
     Upstream: https://github.com/fastfloat/fast_float/blob/main/LICENSE-MIT
   
   - **bigint (BSD 2-Clause, bundled by FastDoubleParser)** — **Required.** 
Transitively included. BSD license full text required.
     Upstream: https://github.com/tbuktu/bigint/blob/master/LICENSE
   
   - **JCTools (Apache 2.0, via Netty)** — Not strictly required 
(Apache-licensed) but declared for completeness, consistent with 
aws-bundle/gcp-bundle convention. 136 classes shaded at 
`io/netty/util/internal/shaded/org/jctools/`.
     ```bash
     jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c jctools  # 136
     ```
     Upstream: https://github.com/JCTools/JCTools/blob/master/LICENSE
   
   - **Mozilla Public Suffix List (MPL 2.0, via Apache HttpComponents)** — 
**Required.** Category B license requires full text. Data file embedded at 
`org/publicsuffix/list/effective_tld_names.dat`.
     ```bash
     jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep publicsuffix
     # org/publicsuffix/list/effective_tld_names.dat
     ```
     Upstream: https://mozilla.org/MPL/2.0/
   
   - **Eclipse Collections — full EPL-1.0 + EDL-1.0 text added** (all versions) 
— **Required.** Previously only referenced by URL. Category B licenses require 
full text in LICENSE. 5,664 classes at `org/eclipse/collections/`.
     ```bash
     jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "org/eclipse/collections"  # 5664
     ```
     Upstream: https://github.com/eclipse/eclipse-collections/blob/master/LICENSE-EPL-1.0.txt
   
   - **JTS Topology Suite — full EPL-2.0 text added** (all versions) — 
**Required.** Previously only referenced by URL. Category B license requires 
full text. 795 classes at `org/locationtech/jts/`.
     ```bash
     jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "org/locationtech/jts"  # 795
     ```
     Upstream: https://github.com/locationtech/jts/blob/master/LICENSE_EPL
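
A rough way to confirm that the bundled LICENSE mentions each dependency listed above is to extract it from the built JAR and search for the names. This is a sketch under stated assumptions rather than the check used for this PR: `missing_license_entries` and `LICENSE_ENTRY` are hypothetical names, the LICENSE file is assumed to sit at the JAR root, the glob is extrapolated from the v4.1 path, and the search strings are simply the names used in this description.

```shell
# Hypothetical helper: read LICENSE text on stdin and print every expected
# name (given as arguments) that does not appear in it, ignoring case.
missing_license_entries() (
  text="$(cat)"
  for name in "$@"; do
    printf '%s\n' "$text" | grep -qiF -- "$name" || printf '%s\n' "$name"
  done
)

# Assumptions: LICENSE sits at the JAR root (override LICENSE_ENTRY if not),
# and it refers to the dependencies by the names used in this description.
LICENSE_ENTRY="${LICENSE_ENTRY:-LICENSE}"
for jar in spark/v*/spark-runtime/build/libs/iceberg-spark-runtime-*.jar; do
  [ -f "$jar" ] || continue   # skip when nothing has been built
  echo "== $jar"
  unzip -p "$jar" "$LICENSE_ENTRY" | missing_license_entries \
    FastDoubleParser fast_float bigint JCTools "Public Suffix" \
    "Eclipse Collections" "JTS"
done
```

An empty list under a JAR means every name was found. A name match does not show that the full license text is present, so the LICENSE file still needs to be read by a person.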
   
   ### v4.1 only: reorder entries
   
   Existing entries for Project Nessie, Eclipse MicroProfile OpenAPI, Eclipse 
Collections, Apache DataSketches, and JTS were reordered to group 
Apache-licensed entries together, followed by Category B entries with full 
license texts.
   
   ---
   
   ## NOTICE changes
   
   All four versions receive the same additions:
   
   - **Jackson JSON Processor NOTICE** — **Required.** Propagation of upstream 
NOTICE per ASF policy. Includes FastDoubleParser copyright attribution.
     ```bash
     # Jackson is bundled in the JAR (2326 entries)
     jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "com/fasterxml/jackson"
     ```
     Upstream NOTICE: https://github.com/FasterXML/jackson-core/blob/2.x/NOTICE
   
   - **Apache DataSketches NOTICE** — **Required.** Upstream NOTICE contents 
must be reproduced. DataSketches is an Apache project and ships a NOTICE file. 
Previously missing from all versions.
     ```bash
     # DataSketches is bundled in the JAR (534 entries)
     jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "datasketches"
     ```
     Upstream NOTICE: https://github.com/apache/datasketches-java/blob/master/NOTICE
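
The individual `jar tf ... | grep -c` checks in this description can be folded into a single pass over one listing. The sketch below assumes the v4.1 JAR path and the entry counts quoted above; `count_entries` is a hypothetical helper, and the counts may differ for other Spark versions or later snapshots.

```shell
# Hypothetical helper: count lines on stdin that contain the given fixed
# string, mirroring `grep -c` with a literal pattern.
count_entries() {
  awk -v needle="$1" 'index($0, needle) { n++ } END { print n + 0 }'
}

# Path and expected counts are the v4.1 figures quoted in this description.
jar=spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar
if [ -f "$jar" ]; then
  listing="$(jar tf "$jar")"
  while read -r needle expected; do
    actual="$(printf '%s\n' "$listing" | count_entries "$needle")"
    [ "$actual" = "$expected" ] && status=ok || status=MISMATCH
    echo "$status $needle expected=$expected actual=$actual"
  done <<'EOF'
jctools 136
org/eclipse/collections 5664
org/locationtech/jts 795
com/fasterxml/jackson 2326
datasketches 534
EOF
fi
```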
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

