On 10/20/2019 05:44, Michał Górny wrote:
> On Sun, 2019-10-20 at 05:21 -0400, Joshua Kinard wrote:
>> On 10/20/2019 04:32, Michał Górny wrote:
>>> On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
>>>> Why is having a max ~24k files in a directory a bad idea?  Modern
>>>> filesystems are more than capable of handling that.
>>>>
>>>>   - ext4: unlimited files in a directory
>>>>   - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
>>>>   - ntfs: 4,294,967,295
>>>>
>>>> And 24k is a bit more than 1/3rd of all distfiles that we currently have.
>>>
>>> For the same reason having ~60k files in a directory was a problem. 
>>> There is really no point in changing anything if you change BIG_NUMBER
>>> to SMALLER_BIG_NUMBER.
>>
>> That doesn't answer my question.  Why is it a problem?  What criteria are
>> you using to decide that 24k is a "smaller big number"?  Is there some issue
>> highlighted by the mirror admins where having 24k files in a single
>> directory offers no significant relief versus the current 60k files?
> 
> IIRC Robin set the goal as:
> 
> | the number of files in a single directory should not exceed 1000, [1]
> 
> I don't recall how that number was chosen but it's probably pretty
> arbitrary.  In any case, I can notice the difference between working
> with a listing of 1k files and 24k files, on the hardware running
> masterdist.

I think it would be prudent then to get some data to help underpin why that
number was chosen and add that to the GLEP, possibly as one of the
references at the bottom.  Your personal observations of a system
(masterdist) that few of us have access to are not good enough, especially
for future developers who may revisit this topic long after you or I are gone.
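
As a starting point for gathering that kind of data, here is a rough sketch
of a timing test.  It is only an illustration: the function name, the
placeholder file names and the repeat count are all made up, and it measures
a warm-cache os.listdir() on whatever filesystem backs the temp directory,
which is not necessarily what rsync or an HTTP index does on masterdist.

```python
import os
import tempfile
import time


def time_listing(num_files, repeats=5):
    """Create num_files empty files in a fresh temp dir and time os.listdir().

    Returns (entries_seen, best_seconds).  Hypothetical helper, not part of
    any Gentoo tooling.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(num_files):
            # Empty placeholder files; the names are invented for the test.
            open(os.path.join(tmp, f"distfile-{i:06d}.tar.gz"), "w").close()

        best = float("inf")
        seen = 0
        for _ in range(repeats):
            start = time.perf_counter()
            seen = len(os.listdir(tmp))
            # Keep the fastest run to reduce noise from other system activity.
            best = min(best, time.perf_counter() - start)
    return seen, best
```

Running time_listing() with 1000, 24000 and 60000 on the actual mirror
hardware, and recording the results, would give the GLEP something concrete
to cite.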


> 
>>>> Under which scenario do you wind up with 24k files in a single directory?
>>>> I consider the tex package an outlier in this case (one package should not
>>>> be the sole dictator of policy).
>>>
>>> Three versions of TeXLive living simultaneously.  If one package falls
>>> completely out of bounds, no problem is solved by the change, so what's
>>> the point of making it?
>>
>> The problem in this case is with texlive, not our current, or future,
>> distfiles methodology.
> 
> Is it?  Are you suggesting we should ban upstream from using multiple
> distfiles with similar prefix?  What about other potential packages that
> may suffer from the same problem in the future?  Go packages have a good
> potential, given that majority of them starts with 'github.com'.

Please highlight which of my words imply in any way that I want to ban
something.  I simply said texlive's significant number of distfiles is a
problem.  That doesn't mean that I want to resolve the problem by banning
it, or future packages that employ that method.

My concern is that out of the tens of thousands of packages we have, we're
allowing ONE package to dictate how we shape a major piece of Gentoo
infrastructure, and I don't feel that the proposed solution seeks to address
it.  Rather, it seeks to band-aid it by wrapping the entire distro up like a
mummy.
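
To put a number on how much a single package skews things, a small sketch
like the one below could be run against a listing of the distfiles
directory.  It assumes files are bucketed by the first few characters of
their names; the function names and the prefix_len parameter are
hypothetical, not taken from the GLEP.

```python
from collections import Counter


def bucket_sizes(filenames, prefix_len=1):
    """Count filenames per bucket, keyed by their first prefix_len characters."""
    return Counter(name[:prefix_len].lower() for name in filenames)


def largest_bucket(filenames, prefix_len=1):
    """Return (bucket_key, file_count) for the fullest bucket, or None if empty."""
    sizes = bucket_sizes(filenames, prefix_len)
    if not sizes:
        return None
    return sizes.most_common(1)[0]
```

Feeding it the output of os.listdir() on a distfiles mirror would show
which prefixes dominate, and whether it is really just one package.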


>> Has anyone looked at how other distros deal with texlive?
> 
> Other distros don't mirror original distfiles.

Has thought been given to doing the same?  This is arguably a better approach
than mirroring original distfiles in devspace.  This would significantly
reduce the infrastructure burden on the project.


>>   Has anyone complained or filed a bug to texlive developers
>> upstream about their excessive amount of distfiles and the burden it places
>> on distro maintainers?
> 
> You believe it to be a problem.  Don't expect others to bother upstream
> with your preferences.

Hah.  So you consider texlive having 16k+ distfiles to be completely within
operating norms then?

I did a quick look, and it looks like the TeX project has a fairly
comprehensive mirroring system distributed around the world.  In fact, it
looks like they emulate Perl's CPAN system with "CTAN":

https://ctan.org/

I don't know the history of the texlive and other associated tex packages in
Gentoo, but my guess is instead of doing what our Perl packages do, someone
just decided to mirror the CTAN archive directly on the Gentoo distfiles
system.  It seems to me that what should actually happen is that we leverage
CTAN itself, much like CPAN, and use their mirroring system instead of
burdening our infrastructure as an unofficial CTAN archive.

I know we've got a ton of Perl packages for the core set of Perl modules,
but doesn't the CPAN eclass also have the capability to auto-generate an
ebuild package for virtually any Perl package distributed via CPAN?  Can
that logic be used with the CTAN system in its own eclass and then we remove
the 16k+ texlive modules off of our mirrors completely?  Or at the worst, we
might just have to generate ebuilds for texlive modules and treat them as
discrete, installed packages.

-- 
Joshua Kinard
Gentoo/MIPS
ku...@gentoo.org
rsa6144/5C63F4E3F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
