Hm, the build fails? You can see that this step is simply skipped if the file
is not present, for this reason.
I'm not clear why you need the file for its own sake, for your own internal
modification that you don't redistribute.
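
To illustrate the "skipped if not present" behaviour, here is a minimal sketch. It is not the actual make-distribution.sh code; the function name, the directory names, and the skip message are assumptions made up for this example.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper, not the actual make-distribution.sh code:
# copy the binary license into the distribution only if the source tree has one.
copy_license_if_present() {
  local src_dir="$1"
  local dist_dir="$2"
  if [ -e "$src_dir/LICENSE-binary" ]; then
    cp "$src_dir/LICENSE-binary" "$dist_dir/LICENSE"
  else
    echo "LICENSE-binary not found in $src_dir; skipping license copy"
  fi
}

# Demonstration against throwaway directories (assumed layout).
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/src" "$work_dir/dist"

# Case 1: no LICENSE-binary, as in the source release; nothing is copied
# and the build carries on.
copy_license_if_present "$work_dir/src" "$work_dir/dist"

# Case 2: LICENSE-binary is present (for example, taken from git);
# it ends up in the distribution as LICENSE.
echo "placeholder license text" > "$work_dir/src/LICENSE-binary"
copy_license_if_present "$work_dir/src" "$work_dir/dist"
```

With a guard like this, a missing LICENSE-binary produces a message rather than a build failure, which is why a custom build from the source release ends up without a LICENSE file.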



On Fri, May 1, 2020 at 11:43 AM Xiangyu Li <yisky...@gmail.com> wrote:

> Hi Sean,
>
> Thanks for the quick response! Yes, what you described about how the LICENSE
> file should be distributed makes sense.
>
> The reason I learned about this is that I was trying to build
> spark-2.4.5-bin-custom.tgz and then distribute this build to multiple
> machines, so that:
>
> 1. These machines can run Spark with the build.
> 2. On each machine, I can install pyspark by running `python setup.py
> install` inside the python directory.
>
> Step 2 fails because the licenses directory is missing.
>
> Building pyspark out of a binary distribution is a bit unconventional, but
> I did this after failing to do what the official doc recommended (
> https://spark.apache.org/docs/latest/building-spark.html#pyspark-pip-installable),
> so taking a step back to describe what I did originally:
>
> In the spark-2.4.5 src directory, I just did a simple:
>
> `./build/mvn -DskipTests clean package`
>
>
> And then went to the python directory and did:
>
>
> `python setup.py sdist` followed by `pip install
> dist/pyspark-2.4.5.tar.gz` (as mentioned in make-distribution.sh).
>
>
> This ran into "error: package directory `deps/jars` does not exist".
>
>
> However, directly running
>
>
> `sudo python setup.py install`
>
>
> worked.
>
>
>
> On Fri, May 1, 2020 at 11:30 AM Sean Owen <sro...@gmail.com> wrote:
>
>> The source distribution has the source LICENSE file. The binary
>> distribution has the LICENSE-binary license file. The source release isn't
>> supposed to have LICENSE-binary as it would not be accurate for that
>> release; LICENSE is. If you're redistributing a build, you'll have your own
>> process for modifying and building it, including modifying the LICENSE file
>> as appropriate; these LICENSE files represent what the project delivers to
>> you rather than what you deliver to others. You could get the
>> LICENSE-binary file from the right hash commit from git, if desired, as
>> part of your build.
>>
>> On Fri, May 1, 2020 at 10:19 AM Xiangyu Li <yisky...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I downloaded spark-2.4.5 source from
>>> https://mirrors.ocf.berkeley.edu/apache/spark/spark-2.4.5/spark-2.4.5.tgz
>>> After extracting it and running:
>>>
>>> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
>>> -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
>>>
>>>
>>> It creates a Spark binary distribution named:
>>> spark-2.4.5-bin-custom-spark.tgz
>>>
>>> So this file is supposedly a ready-to-distribute Spark binary file like
>>> the one you can download from
>>> http://mirror.metrocast.net/apache/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
>>>
>>> However, one big difference between this custom build and the official
>>> build is that you do not have a LICENSE file in the custom build. I don't
>>> know much about the Apache license, but I would suppose a custom build
>>> distribution should have one.
>>>
>>> The file is missing because of the following code in
>>> make-distribution.sh:
>>> [image: image.png]
>>>
>>> There is no LICENSE-binary file in the official spark-2.4.5.tgz file,
>>> therefore there will be no LICENSE file in your custom build.
>>>
>>> I am aware of two pull requests related to this:
>>>
>>> https://github.com/apache/spark/pull/22436
>>> which started to use LICENSE-binary instead of just LICENSE,
>>>
>>> and
>>> https://github.com/apache/spark/pull/22840
>>> which avoids a failure when there is no LICENSE-binary in the spark-2.4.5
>>> source directory.
>>>
>>> I think we need to change make-distribution.sh to make sure that the
>>> LICENSE file is copied over to its corresponding custom build distribution.
>>> However, I am not ready to do a pull request, so hopefully we can discuss
>>> it here first.
>>> --
>>> Sincerely
>>> Xiangyu Li
>>>
>>> <yisky...@gmail.com>
>>>
>>
>
> --
> Sincerely
> Xiangyu Li
>
> <yisky...@gmail.com>
>
