> -----Original Message-----
> From: Richard Fontana <[email protected]>
> On Thu, Jun 11, 2026 at 3:37 PM Tim Bird <[email protected]> wrote:
> >
> > Almost all the files in the fs/nls directory are missing
> > SPDX-License-Identifier lines. Add the Unicode-3.0 license to
> > LICENSES/preferred, and reference that in the ID lines for
> > the pertinent files.
> >
> > Many of these source files were introduced in 1997 by
> > Gordon Chafee, who states that data tables were automatically
> > generated from materials on the www.unicode.org web site.
> > This pre-dates when that site had an explicit license, and
> > these files are missing any license text.
> >
> > Files starting with 'mac-' prefix were added in 2012 by
> > Vladimir Serbinenko. These files have an earlier Unicode license
> > with slight differences from the current license that is preferred by
> > Unicode, Inc.
> >
> > Use the current Unicode license (Unicode-3.0) (in conjunction with
> > GPL-2.0) for all files that have data that was obtained from Unicode, Inc.
> > Use 'GPL-2.0' as the license ID for other files.
>
> I might be misunderstanding but, at least where there was an explicit
> license notice, and assuming the Unicode mapping stuff even requires a
> license why not use an SPDX identifier for the legacy Unicode license
> (assuming it exists as an SPDX identifier, I haven't actually
> checked)? In other words, it seems like you're making a legal
> conclusion here that Unicode-3.0 is a valid license for what was
> previously either not explicitly licensed or was under a different
> Unicode license. I can see how that might be the case, but it doesn't
> seem to *necessarily* be the case.

There is an existing SPDX identifier for the text in the fs/nls/mac-*.c
files. It can be found here:
https://spdx.org/licenses/Unicode-DFS-2015.html

There are 4 major categories of files in the fs/nls directory:
 - files contributed in 1997, that don't have any license text, but
   do carry a notice that they were derived from www.unicode.org
 - files contributed in 2012, which have text which exactly matches
   the 2015 Unicode license. (Presumably it was in place in 2012 when
   it was copied into kernel files, and no time travel was involved.)
 - files contributed at other times, which appear to have Unicode, Inc.
   data in them (most of which carry a notice that they were derived
   from www.unicode.org)
 - files contributed at other times, that don't appear to have data
   from Unicode, Inc. (e.g. Shift-JIS tables in fs/nls/nls_euc-jp.c)

None of the source files in the kernel have changed Unicode data points
since 2012. If I were to do some 'clean-room' rebuilding of these files
from current data from the Unicode web site (www.unicode.org), I am
confident I would get the same data tables that we have in the current
code.

The '2015' Unicode license is no longer used by Unicode, Inc., and I
presume that they would be satisfied if we used their current
(Unicode-3.0) license. They removed clause (c) sometime between 2015
and 2016. See:
https://spdx.org/licenses/Unicode-DFS-2015.html,
https://spdx.org/licenses/Unicode-DFS-2016.html, and
https://spdx.org/licenses/Unicode-3.0.html

The difference between the 2016 license and the 3.0 license is
reordering some items (putting the 'COPYRIGHT AND PERMISSION NOTICE',
and the copyright line itself above the license text), and some white
space changes (putting clauses a and b inline, instead of as separate
bullets).

The current Unicode 3.0 license is more permissive than the '2015'
license, but more strict than not having a license (what they were
doing in 1997).

I'd rather not add multiple license files (under LICENSES/preferred)
for data that would be equivalent if I rebuilt (in the present) these
source files using data from the same organization that they
originally came from.

IOW, I'd rather treat the 'Unicode-3.0' license as the Unicode, Inc.
preferred license, and apply that, rather than carry multiple old
license versions around in the kernel source tree. Unicode, Inc.
doesn't even do that.

My opinion would be different if there were any other copyright holders
for the data in question who might have different preferences for the
clauses/terms they would like to see in effect (as is the case with BSD
variants). But the Unicode, Inc. preference in this case seems clear.
 -- Tim

