Hi Charles,

On 2019-06-13 13:11, Charles Plessy wrote:
>> 1. Free datasets used to train a FreeModel are not required to be
>> uploaded to our main section, for example those Osamu mentioned and
>> the Wikipedia dump. We are not a scientific data archiving
>> organization, and these data will blow up our infra if we upload
>> too much.
>
> How about storing only the data used to train the version that is
> released in Stable, and keeping this data in a dedicated archive, to
> avoid bloating mirrors? There was a thread on debian-project on how
> to use Debian money, and I think that this could be a useful case.

This idea could be mentioned in DL-Policy for future reference.
However, I don't see the necessity for a dedicated archive in the
near future. When there are enough DL models in our archive, we can
recall this idea and discuss it again.

> For the versions in Unstable and Testing, the role of the package
> maintainer would be to ensure that the data is still available for
> download.

Plus, we can create a new tag "Failed To Train From Scratch" (FTTFS),
similar to the FTBFS tag we use. For models in the main section,
FTTFS is unacceptable.
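As a rough illustration of what that maintainer check could look
like, here is a minimal sketch in Python (the manifest format, the
checksum source, and the script itself are just my assumptions,
nothing specified in DL-Policy yet):

  #!/usr/bin/env python3
  # Hypothetical sketch, not part of DL-Policy: verify that a
  # package's upstream training data is still downloadable and
  # unmodified.
  import hashlib
  import sys
  import urllib.error
  import urllib.request

  # Illustrative manifest: data URL -> recorded SHA256 of the file.
  # A real package would record this somewhere, e.g. at upload time.
  MANIFEST = {
      "https://example.org/datasets/training-corpus.tar.gz":
          "0000000000000000000000000000"
          "000000000000000000000000000000000000",
  }

  def still_available(url, expected_sha256):
      # Stream the file and compare its SHA256 against the recorded
      # value; a vanished URL or changed content both count as
      # failures.
      digest = hashlib.sha256()
      try:
          with urllib.request.urlopen(url) as resp:
              for chunk in iter(lambda: resp.read(1 << 20), b""):
                  digest.update(chunk)
      except urllib.error.URLError:
          return False
      return digest.hexdigest() == expected_sha256

  if __name__ == "__main__":
      bad = [u for u, h in MANIFEST.items()
             if not still_available(u, h)]
      for url in bad:
          print("data unavailable or modified: " + url,
                file=sys.stderr)
      sys.exit(1 if bad else 0)

Something like this could run periodically (cron or CI), so that an
FTTFS-style bug could be filed as soon as upstream data disappears.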
>> 2. It's not required to re-train a FreeModel with our infra,
>> because the outcome/cost ratio is impractical. The outcome is
>> nearly zero compared to directly using a pre-trained FreeModel,
>> while the cost is increased carbon dioxide in our atmosphere and
>> wasted developer time. (Deep learning produces much more carbon
>> dioxide than we thought.)
>
> Optionally, we could even consider re-training the release candidate
> at the approach of the Freeze, for the sake of demonstrating that
> the training process functions well.
>
> A Stable point update might not need to be retrained, depending on
> what the patches address.

That's a good idea! I hadn't even thought about how DL-Policy
interacts with our release schedule. Thanks, and I'll merge this
point into the document soon.

Thanks,
Mo

