On Mon, 6 Apr 2020 at 08:58, Timothée Floure <[email protected]>
wrote:

> Hi,
>
> > How does the indexing work?
>
> You point Yacy to a domain or list of URLs
> (https://fnux.fedorapeople.org/pkgs/ in this case), and it takes care of
> everything. There is also an advanced crawler panel in the UI allowing
> you to filter content (e.g. HTML classes) from pages, which would be
> useful if we do not want to index everything (e.g. dependencies).
>
> I am not familiar with the maths used by Yacy for indexing.
>

Me neither :D

>
> > And what would it take to add more info for each package?
>
> I wrote a quick script (https://paste.gnugen.ch/raw/4JAC) fetching
> package metadata from PDC+mdapi for testing, but it is way too slow to
> scale to the whole package set.
>

Cool. The current indexing takes hours (around 4-5, I think), since there
are more than 80,000 packages and sub-packages. We can run this once a day,
so speed is not super critical.

>
> MDAPI will have to be replaced by local SQLite to increase performance.
> I think we could generate most of the content from the repositories'
> metadata (last N Fedora + EPEL) but I need to find where the SQL files
> live. A privileged endpoint to dist-git to fetch the package ->
> maintainer mapping bypassing pagination would be convenient.
>

You can look at how mdapi grabs these SQLite files (
https://pagure.io/mdapi/blob/master/f/mdapi-get_repo_md). For the
maintainer mapping you should be able to find that here -->
https://src.fedoraproject.org/extras/
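
As a rough illustration of the first step: each repository's repomd.xml
lists where its SQLite databases live. Below is a minimal Python sketch,
assuming the standard repodata layout. The function name and the sample XML
are made up for the example, and this is not necessarily how
mdapi-get_repo_md does it.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def find_db_location(repomd_xml, db_type="primary_db"):
    """Return the relative href of the requested database, or None."""
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        if data.get("type") == db_type:
            location = data.find("repo:location", REPO_NS)
            if location is not None:
                return location.get("href")
    return None


# Hypothetical, trimmed-down repomd.xml used only to exercise the function.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/abc-primary.xml.gz"/>
  </data>
  <data type="primary_db">
    <location href="repodata/abc-primary.sqlite.bz2"/>
  </data>
</repomd>
"""

if __name__ == "__main__":
    print(find_db_location(SAMPLE_REPOMD))
```

The returned href is relative to the repository root, so it still has to be
joined with the mirror URL, downloaded and decompressed before it can be
opened with the sqlite3 module.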


> We can use Yacy's JSON API to build a sexy Fedora-branded search page
> but I think it's a late-stage optimization.
>

+1, would be interesting to see with msuchy if that can be easily
integrated with the work he was doing.
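
For whoever picks up the search page, here is a sketch of what talking to
the JSON API might look like in Python. It assumes the yacysearch.json
endpoint with its query and maximumRecords parameters, and a response
shaped as channels -> items. The host, port, helper names and sample
response are invented for illustration, so check them against a real
instance before relying on any of it.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, query, max_records=10):
    """Build a search URL for a Yacy instance (endpoint name is an assumption)."""
    params = urlencode({"query": query, "maximumRecords": max_records})
    return f"{base_url.rstrip('/')}/yacysearch.json?{params}"


def extract_results(response_text):
    """Flatten the assumed channels -> items layout into (title, link) pairs."""
    payload = json.loads(response_text)
    results = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            results.append((item.get("title", ""), item.get("link", "")))
    return results


# Hypothetical response body, used only to exercise extract_results().
SAMPLE_RESPONSE = json.dumps({
    "channels": [{
        "items": [
            {"title": "python-requests",
             "link": "https://fnux.fedorapeople.org/pkgs/python-requests"},
        ]
    }]
})

if __name__ == "__main__":
    print(build_search_url("http://localhost:8090", "python requests"))
    for title, link in extract_results(SAMPLE_RESPONSE):
        print(title, link)
```

Keeping the URL building and the response parsing separate means the
branded page only depends on the (title, link) pairs, whatever ends up
serving them.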


> --
> Timothée
> _______________________________________________
> infrastructure mailing list -- [email protected]
> To unsubscribe send an email to
> [email protected]
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/[email protected]
>
