Dear all,

I am currently working on the web application used to browse the Software Heritage archive (https://archive.softwareheritage.org). I wanted to make this web application LibreJS compliant, but since I use webpack to generate the JavaScript assets, I was not sure whether that was feasible.

Using a tool like webpack is quite common nowadays, as it really simplifies frontend web development. Notably:

- it makes it possible to organize frontend code into logical modules
- it can generate multiple JavaScript assets, each bundling multiple source files
- it can consume any file format that gets compiled to JavaScript (for instance TypeScript or CoffeeScript)
- it can transpile JavaScript code written in ES6 to ES5

So when using a tool like webpack, each generated JavaScript asset is a file bundling multiple source files (which can be minified for production use). Those bundles usually contain JavaScript source files retrieved from the npm registry using a package manager like npm or yarn. The licenses of those source files are usually among those accepted by LibreJS (MIT/Expat being the most used in the JavaScript ecosystem).

One of the strengths of webpack is that plugins can be written for it to perform a variety of tasks related to frontend development. Notably, it is possible to add extra processing after a whole webpack compilation has been executed. Such a plugin receives the webpack compilation statistics as input. Those statistics are of interest here, as it is quite easy to extract from them the list of all source files bundled in each generated JavaScript asset.

When reading the current LibreJS documentation, Section 7.1.1 "Specifying multiple license files for a single JavaScript file" caught my attention. As it is quite easy to find the license(s) associated with each JavaScript source file retrieved through npm/yarn (by parsing the SPDX license expression in the associated `package.json` file), I thought that writing a webpack plugin that automatically generates a Web Labels page to be consumed by LibreJS could be an interesting solution here.

If I understand correctly, in order to be compliant with the LibreJS specifications, the Web Labels page must contain the following information.
For each JavaScript asset generated by webpack, every source file bundled in it needs to be referenced along with its license(s) and a link to its non-minified source code.

So I have implemented a webpack plugin that processes the statistics available after the whole webpack compilation. It currently does the following:

- For each bundled JavaScript source file, it tries to find the associated LibreJS compatible license by parsing the corresponding SPDX License Expression (https://spdx.org/licenses/), usually located in `package.json` files.
- It copies all original (non-minified) source files bundled into the generated JavaScript assets to a directory inside the webpack output folder. If a license file can be found for a source file, it is copied as well.
- It generates either a Web Labels HTML page named `jslicenses.html` or a JSON file named `jslicenses.json`, in a directory inside the webpack output folder. The JSON file should then be used with an HTML template engine to generate the Web Labels page.

You can find the details of its implementation here [1] and the resulting generated Web Labels table here [2].

The good news is that LibreJS successfully detects the licenses of the bundled source files and no longer blocks the loading of the JavaScript generated by webpack.

However, I have the feeling that the LibreJS specifications should better handle the "multiple licenses for a single JavaScript file" case. For instance, if no compatible license can be found for a particular bundled source file, LibreJS currently does not block the loading of the associated JavaScript asset as long as at least one compatible license is listed for it in the Web Labels table. In my view, it should be the opposite: if any of the bundled source files lacks license information in the generated Web Labels, the loading of the corresponding JavaScript asset should be blocked.

Maybe the following update to the Web Labels specifications could work.
In the case of a JavaScript asset bundling multiple source files, the content of the Licenses and Sources columns could be turned into tables. Both tables should have exactly the same number of rows, so that each bundled source file can be matched precisely with its corresponding license. If a license is missing, the loading of the asset should then be blocked.

Another issue I stumbled across when implementing and testing the plugin is the combination of multiple licenses. SPDX license expressions allow complex license combinations using the AND/OR keywords. For instance, the following is a valid SPDX license expression: (LGPL-2.1 OR (BSD-3-Clause AND MIT)). It is not clear how LibreJS can handle this at the moment. However, such cases are pretty rare.

Apart from the issues mentioned above, the webpack plugin should work well for the majority of projects. Currently, its implementation lives in a dedicated folder of the swh-web repository, but I intend to create a dedicated repository for it and publish it on npm, as other webpack users could benefit from it.

What do you think of the proposed approach? I will be happy to go further on the subject and keep the implementation synchronized with any update to the LibreJS specifications.

Best regards,

Antoine Lambert

[1] https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/
[2] https://archive.softwareheritage.org/jslicenses/
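
P.S. To make the two open points more concrete (AND/OR combinations in SPDX expressions, and blocking an asset as soon as one bundled source file has no usable license), here is a minimal sketch in plain JavaScript. It is not taken from the plugin or from LibreJS: the names `ACCEPTED`, `isAcceptable` and `assetMayLoad` are hypothetical, the allow-list is only a small illustrative subset, and the rule "OR is satisfied by either branch, AND requires both" is my assumption about how such expressions could be evaluated. The SPDX `WITH` and `+` operators are deliberately left out.

```javascript
// Hypothetical allow-list; the set of licenses LibreJS really accepts is larger.
const ACCEPTED = new Set(['MIT', 'BSD-3-Clause', 'LGPL-2.1', 'Apache-2.0']);

// Split an SPDX expression into tokens: "(", ")", AND, OR and license ids.
function tokenize(expr) {
  return expr.match(/\(|\)|[^\s()]+/g) || [];
}

// Recursive descent evaluation; AND binds tighter than OR, as in SPDX.
// Assumption: OR is satisfied by either branch, AND needs both branches.
function isAcceptable(expr, accepted = ACCEPTED) {
  const tokens = tokenize(expr);
  let pos = 0;

  function parseOr() {
    let ok = parseAnd();
    while (tokens[pos] === 'OR') {
      pos++;
      const right = parseAnd(); // always parse, even if ok is already true
      ok = ok || right;
    }
    return ok;
  }

  function parseAnd() {
    let ok = parseAtom();
    while (tokens[pos] === 'AND') {
      pos++;
      const right = parseAtom();
      ok = ok && right;
    }
    return ok;
  }

  function parseAtom() {
    const tok = tokens[pos++];
    if (tok === '(') {
      const ok = parseOr();
      if (tokens[pos++] !== ')') throw new SyntaxError('missing ")" in ' + expr);
      return ok;
    }
    if (tok === undefined || tok === ')' || tok === 'AND' || tok === 'OR') {
      throw new SyntaxError('unexpected token in ' + expr);
    }
    return accepted.has(tok);
  }

  const result = parseOr();
  if (pos !== tokens.length) throw new SyntaxError('trailing tokens in ' + expr);
  return result;
}

// Proposed rule: an asset may load only if every bundled source file
// carries a license expression that evaluates as acceptable.
function assetMayLoad(sources, accepted = ACCEPTED) {
  return sources.every((source) => {
    if (typeof source.license !== 'string') return false; // no license found
    try {
      return isAcceptable(source.license, accepted);
    } catch (err) {
      return false; // unparsable expression is treated as missing
    }
  });
}
```

With this sketch, `isAcceptable('(LGPL-2.1 OR (BSD-3-Clause AND MIT))')` evaluates each branch against the allow-list, and `assetMayLoad` rejects a bundle as soon as one of its entries has a missing or unparsable `license` field, which is the stricter behaviour I am suggesting for the Web Labels table.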
