Re: clone repos with nutch

anon anon Mon, 10 Mar 2025 17:27:50 -0700

Hello Sebastian!

At this moment I have decided that I am going to:

1. "crawl" the repos with git clone, taking the repository list from the
GitHub/GitLab API. I could also use zoekt for the cloning, but I am still
confused by the git clone syntax and the index syntax.
2. index them in Solr
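
To make step 1 concrete, here is a minimal sketch of what I have in mind, not
a finished tool. It assumes the GitHub REST endpoint /orgs/{org}/repos and its
"name" and "clone_url" fields; the function names, the organization and the
destination directory are hypothetical, and it only reads the first page of
results. GitLab would need a different URL and different field names.

```python
import json
import re
import subprocess
import urllib.request
from pathlib import Path


def list_org_repos(org: str) -> list[dict]:
    """Fetch the first page (up to 100) of an organization's public repos."""
    url = f"https://api.github.com/orgs/{org}/repos?per_page=100"
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def select_clone_urls(repos: list[dict], pattern: str) -> list[str]:
    """Keep the clone URL of every repo whose name matches the regex."""
    rx = re.compile(pattern)
    return [r["clone_url"] for r in repos if rx.search(r["name"])]


def clone_command(clone_url: str, dest_root: Path) -> list[str]:
    """Build the argv for a shallow clone into dest_root/<repo name>."""
    name = clone_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    return ["git", "clone", "--depth", "1", clone_url, str(dest_root / name)]


def clone_matching(org: str, pattern: str, dest_root: Path) -> None:
    """Clone every repo of the organization whose name matches the regex."""
    dest_root.mkdir(parents=True, exist_ok=True)
    for url in select_clone_urls(list_org_repos(org), pattern):
        subprocess.run(clone_command(url, dest_root), check=True)
```

Calling clone_matching("some-org", r"^nutch", Path("repos")) would then leave
the matching repositories under repos/, ready to be crawled from the local
filesystem or indexed directly.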

For step 2, I fear I will have to write my own schema.
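
One possible way around a custom schema, as a rough sketch only: send each
file as a JSON document to Solr's update handler and rely on dynamic fields.
The field names below assume the *_s and *_txt dynamic fields from Solr's
default configset; the function names and the core name "repos" in the usage
note are hypothetical.

```python
import json
import urllib.request
from pathlib import Path


def file_to_doc(repo_root: Path, path: Path) -> dict:
    """Turn one file of a cloned repo into a flat Solr document."""
    rel = path.relative_to(repo_root).as_posix()
    return {
        "id": f"{repo_root.name}/{rel}",  # unique key: <repo>/<relative path>
        "repo_s": repo_root.name,         # assumed dynamic string field
        "path_s": rel,                    # assumed dynamic string field
        # assumed dynamic text field; undecodable bytes are replaced
        "content_txt": path.read_text(encoding="utf-8", errors="replace"),
    }


def post_docs(solr_update_url: str, docs: list[dict]) -> int:
    """POST a JSON array of documents to Solr and return the HTTP status."""
    req = urllib.request.Request(
        solr_update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a local Solr, the URL would be something like
http://localhost:8983/solr/repos/update?commit=true. If the dynamic fields
turn out not to be enough, I would still need my own schema.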

Would you welcome a PR to Nutch adding support for a git-clone protocol? It
would help me if I want to select the repositories to clone with a regex.

Best regards.

On Sun, 9 Mar 2025 at 02:21, anon anon <[email protected]> wrote:

> great idea!!!
>
> Do you mean I could index with protocol-file? I am confused about whether
> Nutch is a crawler or an indexer, since Solr is the indexer but Nutch
> indexes as well.
>
>
> On Sun, 12 Jan 2025 at 19:13, Sebastian Nagel <[email protected]> wrote:
>
>> Hi,
>>
>> Assuming you have your Git web server running [1], it just means crawling
>> all URLs on this server. Cloning repositories cannot be done by Nutch
>> because it is not done by sending an HTTP GET request.
>>
>> Alternatively, you might "crawl" the local filesystem containing the
>> cloned repositories. Nutch has a protocol implementation, "protocol-file",
>> for this task.
>>
>> Best,
>> Sebastian
>>
>> [1] https://git-scm.com/book/en/v2/Git-on-the-Server-GitWeb
>>
>> On 1/10/25 07:07, anon anon wrote:
>> > Hello,
>> >
>> > I want to clone and index repos with a Nutch config.
>> >
>> > Do you know how to set up such a config, please?
>> >
>> > Best regards!
>>
>>
