TL;DR: Denys, there are patches attached that include Ian's changes and
       many of my own explained below.  I think it's ready to be merged
       unless Ian objects to my additional patches.  If you want an easy
       place to pull this from instead, this should work:

       $ git pull git://gitorious.org/busybox/busybox.git bkuhn/spdx-initial-improvements

Full Details:

Ian, thanks for posting your initial pass on automated creation of an
SPDX file for BusyBox.  As some BusyBox developers are aware,
Conservancy has been working with the SPDX Committee to help create an
SPDX file for BusyBox.  I've been asking for years at SPDX committee
meetings for someone to post just such a patch as yours, so I
appreciate it.

Ian Wienand wrote at 18:16 (EDT) on Wednesday:
> This should be considered a good starting point.

I agree completely that your patch is a good starting point.  My main
concern is that it is indeed only a starting point, and we need to make
sure it's represented as such to those who might use the build target.
Indeed, the most important thing about SPDX is that the information
should be verified accurate and vetted by humans, and this data hasn't
been yet.

Specifically, I think there is actually a lot more work that needs to go
into the SPDX file for BusyBox before we could assert its accuracy.  We
know the overall license of BusyBox is GPLv2-only, but collecting
detailed file-by-file copyright information is a much more complex task
than your current script undertakes.  Your patch seems to mostly collect
the copyright notices from individual files, which are important and
relevant, but are not sufficient on their own to establish complete
copyright holdership information.  As with most projects, the best
information about the copyright holdership of BusyBox is of course the
Git log and the SVN log before it.
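
As a rough illustration of what mining the Git log could look like,
here is a minimal sketch.  The function names summarize_authors and
authors_for_file are hypothetical and are not part of the attached
patches; the sketch only counts commit authors per file, which is a
starting point for human review and not a determination of copyright
holdership.

```shell
# Hypothetical helper: read one "Name <email>" line per commit on stdin
# and print "<count> <author>" lines, most frequent author first.
summarize_authors() {
    sort | uniq -c | sort -k1,1nr -k2 |
        awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n, $0 }'
}

# Hypothetical wrapper: list the commit authors of one file, following
# renames.  Must be run from inside a Git working tree.
authors_for_file() {
    git log --follow --format='%aN <%aE>' -- "$1" | summarize_authors
}
```

Running "authors_for_file some/file.c" inside a checkout would print
one line per author.  The SVN-era history would still need separate
handling.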

BTW, I've done some preliminary work to improve details and parse some
of that log data, and Conservancy was actually hoping to give it more
attention in the next few months, which I hope we'll be able to do.
(Such is entirely funding-dependent, of course, since Conservancy is a
very small non-profit org.)

> If there is interest, we could do more work to tag individual source
> files and have the generated SPDX be even more descriptive.

Yes, I think Conservancy seeks to help in some of this work.

Anyway, regarding your patch itself, I have a few concerns, which I've
addressed in the attached patch-set:

 * My overarching concern is that having this accepted upstream would
   cause those finding the "spdx" target in the Makefile to think that
   meant the output they got was an SPDX file vetted by the BusyBox
   project and/or Conservancy (BusyBox's non-profit home).  Someday, I
   hope there to be an SPDX file that Conservancy and/or BusyBox as a
   project can endorse, but at the moment we're just beginning, so we
   need to be abundantly clear that this is a work-in-progress.

   My suggested solution is to call the Makefile target
   "spdx-experimental" (I'm open to something else, as long as it is
   abundantly clear that it's not official yet).  One of my attached
   patches does this.

 * COPYRIGHT.spdx should currently be gitignore'd, if we're going to be
   auto-generating it for now.  I have a patch below that does that.

 * I think it's not fitting with SPDX's spec definition of "Concluded
   License" for an automated script to add "Concluded" fields without
   human intervention.  I have a patch attached that deals with that by
   not adding those fields by default, while leaving them available as
   an option (i.e., to save typing if someone has made such
   conclusions).
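
To make the last point concrete, here is a minimal sketch of the
opt-in behaviour.  The helper name emit_license_fields is hypothetical
and is only an illustration of the idea; the attached patch implements
it inline in scripts/create-spdx with a -c option.

```shell
# Hypothetical helper: print the per-file license tags.  The
# LicenseConcluded tag is emitted only when the first argument is
# non-empty, i.e. only when a human has actually vetted the conclusion.
emit_license_fields() {
    if [ -n "$1" ]; then
        echo "LicenseConcluded: GPL-2.0"
    fi
    echo "LicenseInfoInFile: NOASSERTION"
}
```

Called with an empty argument, this prints only the LicenseInfoInFile
line, so the default output asserts no conclusion.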


So, Denys, unless Ian objects to anything above, I think we're ready to
have the patches below merged, if you'd like to apply the attached.
I've included a full patch set, which also includes Ian's original patch
as part of it, and there's git pull information at the top of this
email.
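
For anyone reviewing the first patch: its script computes the package
verification code by taking the SHA1 sums of all listed files, sorting
them, removing the newlines, and hashing the result.  A minimal
standalone sketch of that step follows.  The function name
package_verification_code is hypothetical, and the sketch assumes
sha1sum from GNU coreutils is available.

```shell
# Hypothetical helper: read one SHA1 checksum per line on stdin and
# print the SHA1 of the sorted, concatenated checksums.
package_verification_code() {
    # LC_ALL=C keeps the sort order independent of the user's locale.
    LC_ALL=C sort | tr -d '\n' | sha1sum | awk '{ print $1 }'
}

# Possible use against a generated file (path is an example only):
#   grep '^FileChecksum' COPYRIGHT.spdx | awk '{ print $3 }' |
#       package_verification_code
```

Because the checksums are sorted first, the result does not depend on
the order in which the files were processed.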

From 5a86aa76d5597e242e2b853de5ab11a1c4579323 Mon Sep 17 00:00:00 2001
From: Ian Wienand <[email protected]>
Date: Wed, 19 Sep 2012 15:16:11 -0700
Subject: [PATCH 1/5] Add SPDX generation target

Hi,

SPDX [1] is a machine-readable specification for copyright
information.  I dare say any one of us developers who have to deal
with corporate legal departments is probably very happy with the idea
of standardised and machine readable copyright information which can
be integrated into build infrastructures and verification tools.  If
the SPDX presentation is to be believed, there is broad interest in
the format [2].  Given busybox must be one of the most passed-around
GPL components, it seems to make sense to try and support the effort.

This patch adds a "make spdx" target which creates a COPYRIGHT.spdx
file in tag/value format as per the SPDX 1.1 spec [3].  By way of
validation, I have run the generated output through the tag-to-RDF
generator provided by the SPDX tools distribution [4] and it works.

This should be considered a good starting point.  For example, I know
several parts of busybox have been pulled out of other projects
originally, and the SPDX file-format has ways of specifying this
detailed information.  If there is interest, we could do more work to
tag individual source files and have the generated SPDX be even more
descriptive.

Thanks,

-i

[1] http://www.spdx.org
[2] http://www.spdx.org/system/files/spdx_slides_v2_9.ppt
[3] http://www.spdx.org/spec
[4] http://www.spdx.org/content/tools

Signed-off-by: Ian Wienand <[email protected]>
---
 Makefile.custom                 |    5 ++
 scripts/COPYRIGHT.spdx.template |   26 +++++++++++
 scripts/create-spdx             |   91 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 122 insertions(+), 0 deletions(-)
 create mode 100644 scripts/COPYRIGHT.spdx.template
 create mode 100755 scripts/create-spdx

diff --git a/Makefile.custom b/Makefile.custom
index 6da79e6..8c8abf4 100644
--- a/Makefile.custom
+++ b/Makefile.custom
@@ -109,6 +109,11 @@ bigdata: busybox_unstripped
 .PHONY: doc
 doc: docs/busybox.pod docs/BusyBox.txt docs/busybox.1 docs/BusyBox.html
 
+# Something like a SPDX file
+.PHONY: spdx
+spdx: 
+	@$(srctree)/scripts/create-spdx $(srctree) $(objtree)/COPYRIGHT.spdx
+
 # FIXME: Doesn't belong here
        cmd_doc =
  quiet_cmd_doc = $(Q)echo "  DOC     $(@F)"
diff --git a/scripts/COPYRIGHT.spdx.template b/scripts/COPYRIGHT.spdx.template
new file mode 100644
index 0000000..f36a1f7
--- /dev/null
+++ b/scripts/COPYRIGHT.spdx.template
@@ -0,0 +1,26 @@
+SPDXVersion: SPDX-1.1
+DataLicense: CC0-1.0
+
+##
+##  Busybox SPDX Copyright Info
+##
+
+## Creation Information
+Creator: Tool: Busybox
+Created: %CREATED_TIMESTAMP%
+CreatorComment: <text>Generated by Busybox build infrastructure</text>
+
+## Package Information
+PackageName: Busybox
+PackageVersion: %BUSYBOX_VERSION%
+PackageDescription: <text>BusyBox: The Swiss Army Knife of Embedded Linux</text>
+PackageDownloadLocation: git://busybox.net/busybox.git
+PackageVerificationCode: %PACKAGE_VERIFICATION_CODE%
+
+PackageCopyrightText: <text>BusyBox is copyrighted by many authors between 1998-2012
+Licensed under GPLv2. See source distribution for detailed copyright notices</text>
+
+PackageLicenseDeclared: GPL-2.0
+PackageLicenseConcluded: GPL-2.0
+
+PackageLicenseInfoFromFiles: GPL-2.0
diff --git a/scripts/create-spdx b/scripts/create-spdx
new file mode 100755
index 0000000..ffa6a97
--- /dev/null
+++ b/scripts/create-spdx
@@ -0,0 +1,91 @@
+#!/bin/bash
+
+# Generate a SPDX tag/value file for busybox source files.
+
+# For more information on the SPDX file format, including downloads
+# for the tools to turn the output into RDF, etc, see:
+#  http://www.spdx.org/
+
+if [ $# -ne 2 ]; then
+    echo "usage: create-spdx src-tree output-file"
+    echo "  tool will create template COPYRIGHT.spdx in output-file"
+    exit 1
+fi
+
+SRC_DIR="$1/"
+DEST_FILE=$2
+
+# Get the verison
+if [ ! -f "${SRC_DIR}/Makefile" ]; then
+    echo "Can't find top-level makefile"
+    exit 1
+fi
+BUSYBOX_VERSION=$(egrep '^VERSION =|^PATCHLEVEL =|^SUBLEVEL =' "${SRC_DIR}/Makefile" | sed 's/^.*= //' | tr '\n' '.')
+BUSYBOX_VERSION=${BUSYBOX_VERSION%?}
+echo "** Determined version ${BUSYBOX_VERSION}"
+
+# copy template into objdir
+if [ ! -f "${SRC_DIR}/scripts/COPYRIGHT.spdx.template" ]; then
+    echo "Can't find SPDX template"
+    exit 1
+fi
+echo "** Creating ${DEST_FILE}"
+cp "${SRC_DIR}/scripts/COPYRIGHT.spdx.template" "${DEST_FILE}"
+
+# replace template strings
+sed "s/%BUSYBOX_VERSION%/${BUSYBOX_VERSION}/" "${DEST_FILE}" > "${DEST_FILE}.tmp"
+mv "${DEST_FILE}.tmp" "${DEST_FILE}"
+
+DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+sed "s/%CREATED_TIMESTAMP%/${DATE}/" "${DEST_FILE}" > "${DEST_FILE}.tmp"
+mv "${DEST_FILE}.tmp" "${DEST_FILE}"
+
+# make a list of all likely source files
+SRC_FILES=$(find ${SRC_DIR} -type f -name '*.c' -print0 | xargs --null)
+
+# Now output some info for each source file.  There are additional
+# things we could do here; such as marking the "ArtifactOfProjectName"
+# for the various bits that have come from other open source projects,
+# or getting more copyright info, etc.  Possibly in the future source
+# files could have this info pre-tagged and we just concatenate it in
+# this step.
+
+echo -e "\n\n# autogenerated file info\n\n" >> "${DEST_FILE}"
+
+for f in ${SRC_FILES}
+do
+
+    echo "** Processing : ${f#$SRC_DIR}"
+
+    chksum=$(sha1sum ${f} | awk '{print $1}')
+
+    # usually this is in a C comment and has *'s prepended, so just
+    # strip off anything before "Copyright"
+    copyright=$(grep 'Copyright' ${f} | sed 's/^.*Copyright/Copyright/')
+
+    echo "FileName: ${f#$SRC_DIR}" >> "${DEST_FILE}"
+    echo "FileType: SOURCE" >> "${DEST_FILE}"
+    echo "FileChecksum: SHA1: ${chksum}" >> "${DEST_FILE}"
+    echo "LicenseConcluded: GPL-2.0" >> "${DEST_FILE}"
+    echo "LicenseInfoInFile: NOASSERTION" >> "${DEST_FILE}"
+    if [ -n "${copyright}" ]; then
+        echo "FileCopyrightText: <text>${copyright}</text>" >> "${DEST_FILE}"
+    else
+        echo "FileCopyrightText: NONE" >> "${DEST_FILE}"
+    fi
+    echo  >> "${DEST_FILE}"
+
+done
+
+# the algorithm in the spec for "package verification" is to take all
+# the sha1 sums of the files, sort them, remove the newlines then take
+# the sha1 hash of that
+echo "** Creating verification hash"
+VER_FILE=$(mktemp)
+grep '^FileChecksum' "${DEST_FILE}" | awk '{print $3}' | sort | tr -d '\n' > "${VER_FILE}"
+VER_HASH=$(sha1sum "${VER_FILE}" | awk '{print $1}')
+rm -f "${VER_FILE}"
+sed "s/%PACKAGE_VERIFICATION_CODE%/${VER_HASH}/" "${DEST_FILE}" > "${DEST_FILE}.tmp"
+mv "${DEST_FILE}.tmp" "${DEST_FILE}"
+
+echo "** Done!"
\ No newline at end of file
-- 
1.7.2.5

From b04a10744300ee67a3e43b7d6949a8a5e04ff784 Mon Sep 17 00:00:00 2001
From: Bradley M. Kuhn <[email protected]>
Date: Fri, 21 Sep 2012 11:43:56 -0400
Subject: [PATCH 2/5] SPDX file is currently an experimental feature & generated SPDX file is probably not accurate.

Since this feature is very much new, the SPDX file generated by this
process probably has some errors and shouldn't be relied upon by anyone
yet.  This must be made clear to those who use it in various different
ways, as done herein.

Signed-off-by: Bradley M. Kuhn <[email protected]>
---
 Makefile.custom                 |    7 ++++---
 scripts/COPYRIGHT.spdx.template |    2 +-
 scripts/create-spdx             |    3 +++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/Makefile.custom b/Makefile.custom
index 8c8abf4..d152313 100644
--- a/Makefile.custom
+++ b/Makefile.custom
@@ -109,9 +109,10 @@ bigdata: busybox_unstripped
 .PHONY: doc
 doc: docs/busybox.pod docs/BusyBox.txt docs/busybox.1 docs/BusyBox.html
 
-# Something like a SPDX file
-.PHONY: spdx
-spdx: 
+# Create the experimental SPDX file for BusyBox
+# The generated SPDX file is experimental and is *not* known to be accurate currently, and should not be relied upon yet.
+.PHONY: spdx-experimental
+spdx-experimental: 
 	@$(srctree)/scripts/create-spdx $(srctree) $(objtree)/COPYRIGHT.spdx
 
 # FIXME: Doesn't belong here
diff --git a/scripts/COPYRIGHT.spdx.template b/scripts/COPYRIGHT.spdx.template
index f36a1f7..659d5c8 100644
--- a/scripts/COPYRIGHT.spdx.template
+++ b/scripts/COPYRIGHT.spdx.template
@@ -8,7 +8,7 @@ DataLicense: CC0-1.0
 ## Creation Information
 Creator: Tool: Busybox
 Created: %CREATED_TIMESTAMP%
-CreatorComment: <text>Generated by Busybox build infrastructure</text>
+CreatorComment: <text>Generated by Busybox build infrastructure.  This auto-generated SPDX file is EXPERIMENTAL and information herein should not be relied upon by anyone.</text>
 
 ## Package Information
 PackageName: Busybox
diff --git a/scripts/create-spdx b/scripts/create-spdx
index ffa6a97..b34d72c 100755
--- a/scripts/create-spdx
+++ b/scripts/create-spdx
@@ -2,6 +2,9 @@
 
 # Generate a SPDX tag/value file for busybox source files.
 
+# NOTE: the generated SPDX file is experimental is *not* known to be
+# accurate currently, and should not be relied upon yet.
+
 # For more information on the SPDX file format, including downloads
 # for the tools to turn the output into RDF, etc, see:
 #  http://www.spdx.org/
-- 
1.7.2.5

From 415932ced6ec6a7d8eccebd7b1e0f54191a12c42 Mon Sep 17 00:00:00 2001
From: Bradley M. Kuhn <[email protected]>
Date: Fri, 21 Sep 2012 11:45:37 -0400
Subject: [PATCH 3/5] COPYRIGHT.spdx is currently a generated file and should be gitignore'd.

Note that eventually, COPYRIGHT.spdx may contain data that is composed by
hand, in which case it should be removed from the .gitignore.  However,
for the moment, we're focused on making changes to COPYRIGHT.spdx.template
and auto-generating from that.

Signed-off-by: Bradley M. Kuhn <[email protected]>
---
 .gitignore |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
index 0a0c65b..5bda738 100644
--- a/.gitignore
+++ b/.gitignore
@@ -35,3 +35,4 @@ Config.in
 core
 .gdb_history
 .gdbinit
+COPYRIGHT.spdx
-- 
1.7.2.5

From 4d256438745a2cadd58848d5df37625e655fab1e Mon Sep 17 00:00:00 2001
From: Bradley M. Kuhn <[email protected]>
Date: Fri, 21 Sep 2012 12:18:51 -0400
Subject: [PATCH 4/5] SPDX specification indicates "Concluded License" is a human decision.

To quote from the SPDX 1.0 specification:

  Here, the intent is for the SPDX file creator to analyze the license
  information in package, and other objective information ... to arrive at
  a reasonably objective conclusion as to what license governs the
  package.

This script is clearly not sophisticated enough to perform this kind of
analysis, and the other text found in the SPDX specification indicates
pretty clearly that all "Concluded" fields are supposed to involve human
interpretation.  Therefore, the script should certainly not by default
create Concluded fields.

Theoretically, if someone has done this analysis, it might be nice if the
script "tries to be helpful" by auto-generating the fields to save typing
or the like.  Therefore, generation of such fields is left in as an
option, but the option is *not* called by default from Makefile.custom.

Signed-off-by: Bradley M. Kuhn <[email protected]>
---
 scripts/COPYRIGHT.spdx.template |    3 +-
 scripts/create-spdx             |   65 +++++++++++++++++++++++++++++++++++---
 2 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/scripts/COPYRIGHT.spdx.template b/scripts/COPYRIGHT.spdx.template
index 659d5c8..1b942bd 100644
--- a/scripts/COPYRIGHT.spdx.template
+++ b/scripts/COPYRIGHT.spdx.template
@@ -21,6 +21,7 @@ PackageCopyrightText: <text>BusyBox is copyrighted by many authors between 1998-
 Licensed under GPLv2. See source distribution for detailed copyright notices</text>
 
 PackageLicenseDeclared: GPL-2.0
-PackageLicenseConcluded: GPL-2.0
+
+%PACKAGE_LICENSE_CONCLUSION%
 
 PackageLicenseInfoFromFiles: GPL-2.0
diff --git a/scripts/create-spdx b/scripts/create-spdx
index b34d72c..f0f159a 100755
--- a/scripts/create-spdx
+++ b/scripts/create-spdx
@@ -5,19 +5,61 @@
 # NOTE: the generated SPDX file is experimental is *not* known to be
 # accurate currently, and should not be relied upon yet.
 
+# Portions of this file  Copyright (C) 2012 Bradley M. Kuhn <[email protected]>
+# Kuhn's copyrights are licensed GPLv2-or-later.
+
 # For more information on the SPDX file format, including downloads
 # for the tools to turn the output into RDF, etc, see:
 #  http://www.spdx.org/
 
-if [ $# -ne 2 ]; then
-    echo "usage: create-spdx src-tree output-file"
-    echo "  tool will create template COPYRIGHT.spdx in output-file"
-    exit 1
-fi
+usage()
+{
+cat << EOF
+usage: $0 [-h] [-c] src-tree output-file
+
+automatically create an experimental SPDX file for BusyBox in output-file
+
+OPTIONS:
+   -h      Print this help and exit.
+
+   -c      Pre-add "Concluded" fields.  NOTE: USE THIS OPTION WITH CARE!
+           This asserts that the conclusions about licenses have been checked by a human! 
+EOF
+}
+
+ADD_CONCLUDE_FIELDS=
+
+while getopts "hc" OPTION
+do
+     case $OPTION in
+         h)
+             usage
+             exit 1
+             ;;
+         c)
+             ADD_CONCLUDE_FIELDS=1
+             ;;
+         ?)
+             usage
+             exit 1
+             ;;
+     esac
+done
+shift $(($OPTIND - 1))
 
 SRC_DIR="$1/"
 DEST_FILE=$2
 
+if [ -z "$SRC_DIR" ] || [ -z "$DEST_FILE" ] || [ $# -ne 2 ];
+then
+     usage
+     exit 1
+fi
+if [ $# -ne 2 ]; then
+    usage
+    exit 1
+fi
+
 # Get the verison
 if [ ! -f "${SRC_DIR}/Makefile" ]; then
     echo "Can't find top-level makefile"
@@ -43,9 +85,18 @@ DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
 sed "s/%CREATED_TIMESTAMP%/${DATE}/" "${DEST_FILE}" > "${DEST_FILE}.tmp"
 mv "${DEST_FILE}.tmp" "${DEST_FILE}"
 
+if [ ! -z "$ADD_CONCLUDE_FIELDS" ]; then
+    PACKAGE_LICENSE_CONCLUSION="PackageLicenseConcluded: GPL-2.0"
+else
+    PACKAGE_LICENSE_CONCLUSION=""
+fi
+sed "s/%PACKAGE_LICENSE_CONCLUSION%/${PACKAGE_LICENSE_CONCLUSION}/" "${DEST_FILE}" > "${DEST_FILE}.tmp"
+mv "${DEST_FILE}.tmp" "${DEST_FILE}"
+
 # make a list of all likely source files
 SRC_FILES=$(find ${SRC_DIR} -type f -name '*.c' -print0 | xargs --null)
 
+
 # Now output some info for each source file.  There are additional
 # things we could do here; such as marking the "ArtifactOfProjectName"
 # for the various bits that have come from other open source projects,
@@ -69,7 +120,9 @@ do
     echo "FileName: ${f#$SRC_DIR}" >> "${DEST_FILE}"
     echo "FileType: SOURCE" >> "${DEST_FILE}"
     echo "FileChecksum: SHA1: ${chksum}" >> "${DEST_FILE}"
-    echo "LicenseConcluded: GPL-2.0" >> "${DEST_FILE}"
+    if [ ! -z "$ADD_CONCLUDE_FIELDS" ]; then
+         echo "LicenseConcluded: GPL-2.0" >> "${DEST_FILE}"
+    fi
     echo "LicenseInfoInFile: NOASSERTION" >> "${DEST_FILE}"
     if [ -n "${copyright}" ]; then
         echo "FileCopyrightText: <text>${copyright}</text>" >> "${DEST_FILE}"
-- 
1.7.2.5

From eef3ec026579353ce65cb4bb170c06a466972529 Mon Sep 17 00:00:00 2001
From: Bradley M. Kuhn <[email protected]>
Date: Fri, 21 Sep 2012 12:22:56 -0400
Subject: [PATCH 5/5] Affix my copyright notices to the files that I've changed in previous commits.

Signed-off-by: Bradley M. Kuhn <[email protected]>
---
 Makefile.custom     |    3 +++
 scripts/create-spdx |    2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/Makefile.custom b/Makefile.custom
index d152313..427efc7 100644
--- a/Makefile.custom
+++ b/Makefile.custom
@@ -1,3 +1,6 @@
+# Some small portions of this file are Copyright (C) 2012 Bradley M. Kuhn <[email protected]>
+# Kuhn's copyrights are licensed GPLv2-or-later.
+
 # ==========================================================================
 # Build system
 # ==========================================================================
diff --git a/scripts/create-spdx b/scripts/create-spdx
index f0f159a..cb3e5b5 100755
--- a/scripts/create-spdx
+++ b/scripts/create-spdx
@@ -5,7 +5,7 @@
 # NOTE: the generated SPDX file is experimental is *not* known to be
 # accurate currently, and should not be relied upon yet.
 
-# Portions of this file  Copyright (C) 2012 Bradley M. Kuhn <[email protected]>
+# Some portions of this file are Copyright (C) 2012 Bradley M. Kuhn <[email protected]>
 # Kuhn's copyrights are licensed GPLv2-or-later.
 
 # For more information on the SPDX file format, including downloads
-- 
1.7.2.5

-- 
   -- bkuhn