Hello Sebastian! For now, I have decided that I am going to:
1. "Crawl" the repos using git clone via the GitHub/GitLab API. I could also use zoekt for cloning, but I am still confused by the git clone syntax and the indexing syntax.
2. Index them into Solr. For step 2, I fear I will have to write my own schema.

Would you welcome a PR to Nutch adding support for a git-clone protocol? It would help me if I want to clone only the repositories matching a regex.

Best regards.

On Sun, 9 Mar 2025 at 02:21, anon anon <anonimoussech...@gmail.com> wrote:
> great idea!!!
>
> Do you mean I could index with protocol-file? I am confused about whether Nutch is a crawler or an indexer, since Solr is an indexer but Nutch indexes as well.
>
>
> On Sun, 12 Jan 2025 at 19:13, Sebastian Nagel <sna...@apache.org> wrote:
>
>> Hi,
>>
>> assuming you have your Git web server running [1], it just means crawling
>> all URLs on this server. Cloning repositories cannot be done by Nutch
>> because it's not done by sending an HTTP GET request.
>>
>> Alternatively, you might "crawl" the local filesystem containing the
>> cloned repositories. Nutch has a protocol implementation "protocol-file"
>> for this task.
>>
>> Best,
>> Sebastian
>>
>> [1] https://git-scm.com/book/en/v2/Git-on-the-Server-GitWeb
>>
>> On 1/10/25 07:07, anon anon wrote:
>> > Hello,
>> >
>> > I want to clone and index repos with a Nutch config.
>> >
>> > Do you know how to set up such a config, please?
>> >
>> > Best regards!
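Step 1 of the plan above could be sketched roughly like this, as a standalone script that runs outside Nutch. This matches Sebastian's point that cloning is not an HTTP GET that Nutch can perform. The sketch makes several assumptions. It uses GitHub's public REST endpoint `GET /users/{user}/repos` and the `full_name`, `name` and `clone_url` fields that endpoint returns. The helper names (`list_user_repos`, `filter_repos`, `clone_command`, `clone_all`) are hypothetical. GitLab's projects API would need a different URL and field names, such as `http_url_to_repo`. The regex filter is one possible way to "clone from a regex" without writing a new Nutch protocol plugin.

```python
import json
import re
import subprocess
import urllib.request
from pathlib import Path

# Assumption: public GitHub REST API; unauthenticated requests are rate-limited.
GITHUB_API = "https://api.github.com"


def list_user_repos(user, per_page=100):
    """Yield repository metadata dicts for a GitHub user, page by page."""
    page = 1
    while True:
        url = f"{GITHUB_API}/users/{user}/repos?per_page={per_page}&page={page}"
        req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means there are no more repositories
            return
        yield from batch
        page += 1


def filter_repos(repos, pattern):
    """Keep only repositories whose full_name ("owner/name") matches the regex."""
    rx = re.compile(pattern)
    return [r for r in repos if rx.search(r["full_name"])]


def clone_command(clone_url, dest):
    """Build a shallow-clone command; only the latest commit is needed for indexing."""
    return ["git", "clone", "--depth", "1", clone_url, str(dest)]


def clone_all(repos, target_dir):
    """Clone each repository into target_dir/<name>, skipping existing clones."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for r in repos:
        dest = target / r["name"]
        if dest.exists():
            continue  # already cloned; a `git pull` could refresh it instead
        subprocess.run(clone_command(r["clone_url"], dest), check=True)
```

Example usage: `clone_all(filter_repos(list_user_repos("apache"), r"^apache/nutch"), "clones")`. Shallow clones keep disk usage low. The trade-off is that the history is not available if you later want to index commits as well.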
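Sebastian suggested "crawling" the local clones with Nutch's protocol-file. That approach needs a seed list of `file:` URLs. Below is a minimal sketch that builds one, assuming every repository was cloned as a direct subdirectory of a single folder. The helper names are hypothetical.

As I understand it, protocol-file fetches a directory URL as a listing whose entries Nutch can follow, so each seed ends with a trailing slash. Some configuration details are also assumptions to check against your Nutch version:

* `protocol-file` has to be listed in `plugin.includes`.
* The default `regex-urlfilter.txt` skips `file:` URLs, so that rule has to be relaxed.
* You would probably also want a filter rule that skips `.git/` directories.

```python
from pathlib import Path


def repo_dirs(root):
    """Return the direct subdirectories of root that contain a .git folder."""
    return sorted(p for p in Path(root).iterdir() if p.is_dir() and (p / ".git").exists())


def seed_urls(root):
    """One absolute file: URL per cloned repository, ending in '/' (a directory)."""
    return [p.resolve().as_uri() + "/" for p in repo_dirs(root)]


def write_seed_file(root, seed_file):
    """Write the seed URLs one per line; return how many were written."""
    urls = seed_urls(root)
    Path(seed_file).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

Example usage: `write_seed_file("clones", "urls/seed.txt")`, followed by `bin/nutch inject crawl/crawldb urls/` and the usual generate/fetch/parse/index cycle with an indexer pointed at Solr.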