On Mon, 6 Apr 2020 at 08:58, Timothée Floure <[email protected]> wrote:
> Hi,
>
> > How does the indexing work?
>
> You point Yacy to a domain or list of URLs
> (https://fnux.fedorapeople.org/pkgs/ in this case), and it takes care of
> everything. There is also an advanced crawler panel in the UI allowing
> you to filter content (e.g. HTML classes) from pages, which would be
> useful if we do not want to index everything (e.g. dependencies).
>
> I am not familiar with the maths used by Yacy for indexing.

Me neither :D

> > And what would it take to add more info for each package?
>
> I wrote a quick script (https://paste.gnugen.ch/raw/4JAC) fetching
> package metadata from PDC+mdapi for testing, but it is way too slow to
> scale to the whole package set.

Cool. Yeah, the current indexing takes hours (around 4-5 hours, I think);
there are more than 80,000 packages and sub-packages. I think we can run
this once a day, so speed is not super critical.

> MDAPI will have to be replaced by local SQLite to increase performance.
> I think we could generate most of the content from the repositories'
> metadata (last N Fedora + EPEL) but I need to find where the SQL files
> live. A privileged endpoint to dist-git to fetch the package ->
> maintainer mapping bypassing pagination would be convenient.

You can look at how mdapi grabs these SQLite files
(https://pagure.io/mdapi/blob/master/f/mdapi-get_repo_md). For the
maintainer mapping, you should be able to find that here:
https://src.fedoraproject.org/extras/

> We can use Yacy's JSON API to build a sexy Fedora-branded search page
> but I think it's a late-stage optimization.

+1. It would be interesting to check with msuchy whether that can be
easily integrated with the work he was doing.
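To make the local SQLite idea a bit more concrete, here is a minimal sketch of what a per-package lookup could look like. It is only an illustration under assumptions: the `packages` table and its columns are modelled on what I understand the repositories' primary SQLite metadata to contain (the real files carry many more columns), and `package_info`, `demo_connection` and the sample rows are hypothetical names and made-up data, not anything taken from mdapi.

```python
import sqlite3


def package_info(conn, name):
    """Return one dict per build of `name` found in the metadata, or []."""
    rows = conn.execute(
        "SELECT name, version, release, arch, summary, url "
        "FROM packages WHERE name = ? ORDER BY arch",
        (name,),
    ).fetchall()
    keys = ("name", "version", "release", "arch", "summary", "url")
    return [dict(zip(keys, row)) for row in rows]


def demo_connection():
    """Build a tiny in-memory stand-in for a repository metadata file.

    Assumption: the schema below is a simplified guess at the real one,
    and the rows are invented purely for the example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, name TEXT, "
        "arch TEXT, version TEXT, release TEXT, summary TEXT, url TEXT)"
    )
    conn.executemany(
        "INSERT INTO packages (name, arch, version, release, summary, url) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("example-pkg", "x86_64", "1.0", "1.fc32",
             "An example package", "https://example.org/"),
            ("example-pkg", "aarch64", "1.0", "1.fc32",
             "An example package", "https://example.org/"),
            ("other-pkg", "noarch", "2.3", "4.fc32",
             "Another example", "https://example.org/other"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for info in package_info(conn, "example-pkg"):
        print(info)
```

With a real metadata file, `sqlite3.connect()` would point at the downloaded file instead of `:memory:`. Since everything is a local query, generating one page per package should avoid the per-request HTTP round trips that make the PDC+mdapi script slow.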
> --
> Timothée
_______________________________________________
infrastructure mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/[email protected]
