Hi,

On Tuesday, 18.04.2023 at 07:48 +0200, Jan Lahoda wrote:
> I apologize for being contrarian, but since the index download
> started for me (again) while on a bus with a very poor internet
> connection, I guess I should tell you my view.

no reason to apologize.

> Unless I am mistaken, the index gz currently has roughly 1.9GB, and
> it takes several minutes to actually create the Lucene index from
> it, consuming some more space and CPU.
>
> To be honest, it never seemed very polite to me to download and
> process so much without asking.
>
> I guess alternatives that I would see would include (combination of
> options possible):
> - explicitly ask before downloading (possibly allowing the user to
>   select auto-download)

Yes, if people are notified that they'll get the full index locally,
then I'm OK with that. I see a problem if features silently give
outdated answers or don't work at all. Otherwise we'll get "NetBeans
suggested version X, but Y is already on Central, why is this not
current?".

> - have the features that use the index do some query on a server, if
>   there isn't a downloaded index (or if it is stale/obsolete)

IMHO this highly depends on the speed of the API. If the latency is
high, the next bug report will be "It takes ages until my POM tells me
that it is outdated".
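For illustration, such a server-side query could look roughly like the
sketch below. This is only a sketch against the public
https://search.maven.org/solrsearch/select endpoint (Central's search
REST API); the class and method names are made up and this is not an
actual NetBeans implementation:

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.time.Duration;

    // Sketch only: ask the Central search service for the newest version of
    // one artifact instead of consulting a locally downloaded index.
    // Error handling and JSON parsing are reduced to the bare minimum.
    public class RemoteVersionCheck {

        public static String queryLatestVersionJson(String groupId, String artifactId)
                throws Exception {
            String query = URLEncoder.encode(
                    "g:\"" + groupId + "\" AND a:\"" + artifactId + "\"",
                    StandardCharsets.UTF_8);
            URI uri = URI.create(
                    "https://search.maven.org/solrsearch/select?q=" + query
                    + "&rows=1&wt=json");
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(5)) // keep the latency bounded
                    .build();
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();
            // The returned JSON carries a "latestVersion" field per document;
            // a real implementation would parse that instead of returning raw JSON.
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }

        public static void main(String[] args) throws Exception {
            System.out.println(queryLatestVersionJson("org.apache.commons", "commons-lang3"));
        }
    }

Whether that round trip is fast enough for editor hints is exactly the
latency question above. As far as I know the same endpoint also answers
SHA1 checksum queries (q=1:"<sha1>"), which is relevant for the
search-by-SHA1 feature mentioned further down.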
> - given that https://github.com/apache/netbeans/pull/4999 produces a
>   smaller index, we could have a download location (server) at least
>   for maven central that would serve this optimized index. If I
>   understand it properly, the smallest index under that PR is 0.8GB,
>   and if it would compress reasonably well, it might be (say) 0.5GB
>   compressed - much better than 1.9GB, and no significant CPU usage
>   after the index is downloaded. (Even if it was 0.8GB, it is still
>   much better than 1.9GB+CPU churn.)

Truncating the index needs to be done carefully. NetBeans has a search
by SHA1 (or MD5?) feature. That will break if you remove that data
from the index. A similar situation will arise if arbitrary cut-offs
are done based on time. Consider a library that implements some
interesting algorithm and just works the same even after years. If we
cut the index at 6 months, for example, that artifact won't be found
anymore.

> There was also an argument on conserving the ASF resources in another
> discussion recently. If I consider there would be (only) 10 000
> installations of NetBeans, with the default setting to download the
> index once a week, it is almost 20TB of data every week if I count
> correctly.

+ the CPU cycles to convert the index on users' machines.

> It seems there may be a way to conserve the ASF resources and provide
> better experience to the users at the same time.

The download is from Sonatype's CDN. Given that they actively
discourage Central mirrors, I don't have too much concern here. It is
also not the resources of the ASF that are used.

Greetings

Matthias

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@netbeans.apache.org
For additional commands, e-mail: dev-h...@netbeans.apache.org
For further information about the NetBeans mailing lists, visit:
https://cwiki.apache.org/confluence/display/NETBEANS/Mailing+lists