Hi Sean,

Thanks for the quick response! Yes, what you described about how the LICENSE
file should be distributed makes sense.

The reason I learned about this is that I was trying to build
spark-2.4.5-bin-custom.tgz and then distribute this build to multiple
machines, so that:

1. These machines can run Spark with the build.
2. On each machine, I can install pyspark by running `python setup.py
install` inside the python directory.

Step 2 would fail because the licenses directory is missing.
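
A minimal sketch of a pre-check for this, assuming an extracted distribution is expected to contain a top-level LICENSE file and a licenses directory (those two names come from this thread; the function name `check_license_files` and its messages are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: report which license artifacts are absent from an
# extracted Spark distribution directory before running setup.py there.
# Returns 0 when both are present, 1 otherwise.
check_license_files() {
    dist_dir="$1"
    missing=0
    # Top-level LICENSE file discussed in this thread.
    if [ ! -f "$dist_dir/LICENSE" ]; then
        echo "missing: LICENSE"
        missing=1
    fi
    # licenses directory whose absence made step 2 fail.
    if [ ! -d "$dist_dir/licenses" ]; then
        echo "missing: licenses/"
        missing=1
    fi
    return "$missing"
}
```

For example, assuming the tarball extracts to a directory named spark-2.4.5-bin-custom, `check_license_files spark-2.4.5-bin-custom && (cd spark-2.4.5-bin-custom/python && python setup.py install)` would only attempt the install when both are present.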

Building pyspark out of a binary distribution is a bit unconventional, but
I did this after failing to do what the official doc recommended (
https://spark.apache.org/docs/latest/building-spark.html#pyspark-pip-installable).
Taking a step back, here is what I did originally:

In the spark-2.4.5 src directory, I just did a simple:

`./build/mvn -DskipTests clean package`


And then went to the python directory and did:


`python setup.py sdist` followed by `pip install dist/pyspark-2.4.5.tar.gz`
(as mentioned in make-distribution.sh).


This failed with "error: package directory `deps/jars` does not exist".


However, directly running


`sudo python setup.py install`


worked.



On Fri, May 1, 2020 at 11:30 AM Sean Owen <sro...@gmail.com> wrote:

> The source distribution has the source LICENSE file. The binary
> distribution has the LICENSE-binary license file. The source release isn't
> supposed to have LICENSE-binary as it would not be accurate for that
> release; LICENSE is. If you're redistributing a build, you'll have your own
> process for modifying and building it, including modifying the LICENSE file
> as appropriate; these LICENSE files represent what the project delivers to
> you rather than what you deliver to others. You could get the
> LICENSE-binary file from the right hash commit from git, if desired, as
> part of your build.
>
> On Fri, May 1, 2020 at 10:19 AM Xiangyu Li <yisky...@gmail.com> wrote:
>
>> Hello,
>>
>> I downloaded spark-2.4.5 source from
>> https://mirrors.ocf.berkeley.edu/apache/spark/spark-2.4.5/spark-2.4.5.tgz
>> After extracting it and running:
>>
>> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
>> -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
>>
>>
>> It creates a Spark binary distribution named:
>> spark-2.4.5-bin-custom-spark.tgz
>>
>> So this file is supposedly a ready-to-distribute Spark binary file like
>> the one you can download from
>> http://mirror.metrocast.net/apache/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
>>
>> However, one big difference between this custom build and the official
>> build is that you do not have a LICENSE file in the custom build. I don't
>> know much about Apache license, but I would suppose a custom build
>> distribution should have one.
>>
>> The file is missing because of the following code in
>> make-distribution.sh:
>> [image: image.png]
>>
>> There is no LICENSE-binary file in the official spark-2.4.5.tgz file,
>> therefore there will be no LICENSE file in your custom build.
>>
>> I am aware of two pull requests related to this:
>>
>> https://github.com/apache/spark/pull/22436
>> started to use LICENSE-binary instead of just the LICENSE.
>>
>> And
>> https://github.com/apache/spark/pull/22840
>> To avoid failure when there is no LICENSE-binary in spark-2.4.5 source
>> directory.
>>
>> I think we need to change make-distribution.sh to make sure that the
>> LICENSE file is copied over to its corresponding custom build distribution.
>> However, I am not ready to do a pull request, so hopefully we can discuss
>> it here first.
>> --
>> Sincerely
>> Xiangyu Li
>>
>> <yisky...@gmail.com>
>>
>

-- 
Sincerely
Xiangyu Li

<yisky...@gmail.com>
