Re: [dev] Suckless web crawlers

2023-09-26 Thread Страхиња Радић
On 23/09/26 02:13PM, Sagar Acharya wrote:
> Which web crawlers and indexing tools does suckless suggest?

One can answer this and similar questions about particular niches of software 
by carefully reading and understanding what suckless is about:

https://suckless.org/philosophy/
(emphasis mine)
> We are the home of _quality software_ [...] with a _focus on simplicity,
> clarity and frugality_. Our philosophy is about _keeping things simple,
> minimal and usable_. We believe this should become the mainstream
> philosophy in the IT sector.


***


What is immediately noticeable about xapian is that it is written in C++, 
which is itself an overengineered language, far from simple and frugal, and 
which too often leads to overengineered software being written in it.

Going merely by SLoC, the latest xapian-core sub-project alone currently has 
around 92k lines of C++ source code. To put things into perspective, the 
latest version of the mksh shell has around 29k lines of C code, and bash has 
around 118k lines of C code.
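
A rough way to reproduce this kind of comparison is sketched below. It is only 
an approximation: the message does not say which tool produced the figures 
above, so the function name `count_sloc` and the rule "every non-blank line 
counts, comments included" are assumptions, and the totals will not match a 
dedicated SLoC counter exactly.

```python
from pathlib import Path


def count_sloc(root, suffixes):
    """Count non-blank lines in files under root whose suffix is in suffixes.

    Assumption: comment lines are counted too, so this overestimates
    compared to tools that strip comments.
    """
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            with path.open(errors="replace") as source:
                total += sum(1 for line in source if line.strip())
    return total
```

For a C++ tree one might call `count_sloc("xapian-core", {".cc", ".h"})`; the 
directory name and the suffix set here are placeholders to adjust to the 
actual checkout.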




Re: [dev] Suckless web crawlers

2023-09-26 Thread Sagar Acharya
It would not be as easy as that. One would also have to rank the pages and 
search them for keywords, so that the relevant page is returned when words 
are typed into a search box.
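
To make the indexing and ranking step concrete, here is a minimal sketch of an 
inverted index over already-fetched page text. The names `tokenize`, 
`build_index` and `search` are made up for illustration, and ranking by raw 
word counts is an assumption chosen for brevity, not how xapian or any real 
search engine scores pages.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index


def search(index, query):
    """Return URLs ordered by the summed counts of the query words.

    Ties are broken alphabetically by URL so the order is deterministic.
    """
    scores = Counter()
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _score in ranked]
```

Here `pages` is assumed to be a dict mapping each URL to its plain text. A 
page that repeats a query word more often ranks higher, which is easy to game 
and is the main reason real engines use more elaborate scoring.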


Thanking you
Sagar Acharya
https://humaaraartha.in/selfdost/selfdost.html



26 Sept 2023, 22:39 by d.toni...@gmail.com:

> I don't know exactly what you expect from your web crawler but let's say you 
> want to index every link there is on a page.
>
> You can just curl the page, then grep for any links, and repeat the 
> operation for each link...
>
> This can easily be done with a small bash script (or a C program if you 
> want it to be [insert here why you would want that])
>
> I can't personally recommend any crawler as I would clearly do it that way.
>
> Regards.
>
> Debucquoy Anthony (tonitch)
>
> On 9/26/23 14:13, Sagar Acharya wrote:
>
>> Which web crawlers and indexing tools does suckless suggest?
>>
>> Of the ones I searched for, the best I could find was xapian, and I guess 
>> it requires targeted indexing, i.e. for HTML, documents, etc.
>>
>> Which crawlers and indexers do you suggest?
>>
>>
>> Thanking you
>> Sagar Acharya
>> https://humaaraartha.in/selfdost/selfdost.html
>>
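
The fetch-then-extract loop described in the quoted message above can be 
sketched roughly as follows. This is written in Python rather than the bash 
or C mentioned there, purely to keep it short and self-contained. The names 
`LinkParser`, `extract_links`, `fetch` and `crawl`, as well as the `max_pages` 
limit, are illustrative assumptions. The sketch ignores robots.txt, rate 
limiting and non-HTML content, all of which a real crawler would need to 
handle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag (the "grep for any link" step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return unique absolute http(s) links in html, fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.links:
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url.startswith(("http://", "https://")) and url not in links:
            links.append(url)
    return links


def fetch(url):
    """The "curl the page" step; assumes UTF-8, replacing bad bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch, max_pages=50):
    """Breadth-first crawl; returns {url: [links found on that page]}."""
    seen = {start_url}
    queue = deque([start_url])
    graph = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable page or unsupported URL: skip it
        graph[url] = extract_links(url, html)
        for link in graph[url]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph
```

Passing `fetch` as a parameter keeps the network access replaceable, so the 
traversal logic can be tried against canned pages before pointing it at a 
real site.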



[dev] Suckless web crawlers

2023-09-26 Thread Sagar Acharya
Which web crawlers and indexing tools does suckless suggest?

Of the ones I searched for, the best I could find was xapian, and I guess it 
requires targeted indexing, i.e. for HTML, documents, etc.

Which crawlers and indexers do you suggest?


Thanking you
Sagar Acharya
https://humaaraartha.in/selfdost/selfdost.html