[General] Webboard: Indexer : How to get all H2 tags on the page

2016-05-04 Thread bar
Author: rafikCyc Email: Message: Hello, Section h2 23 256 "<h2([^>]*)>([^<]+)(</h2>)" $2 This works well, but the indexer only stores the first H2 it finds in the database and ignores all the others. Is there a way to get all the tags, like the preg_match_all function does in PHP?
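The first-match-only vs. all-matches behaviour the poster is asking about can be sketched in Python, where re.findall plays the role of PHP's preg_match_all. The html string and the simplified single-group pattern are invented for illustration only; they are not part of the thread:

```python
import re

# Hypothetical sample page with several H2 headings
html = "<h2>First</h2><p>text</p><h2 class='x'>Second</h2><h2>Third</h2>"

# Simplified version of the Section regex: one group for the heading text
pattern = r"<h2[^>]*>([^<]+)</h2>"

# Single-match behaviour: only the first H2 is captured
first = re.search(pattern, html).group(1)

# preg_match_all-style behaviour: every H2 is captured
all_h2 = re.findall(pattern, html)

print(first)   # First
print(all_h2)  # ['First', 'Second', 'Third']
```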

[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: Alexander Barkov Email: b...@mnogosearch.org Message: > Hello, > > I've tried this: > > ./indexer -p 0 > > but it doesn't work :( > The indexer sleeps for at least one second after each URL. With -p0 it does not do any delays between URLs. I guess the bottleneck is in the connection,

[General] Webboard: Indexer : How to get all H2 tags on the page

2016-05-04 Thread bar
Author: Alexander Barkov Email: b...@mnogosearch.org Message: > Hello, > > Section h2 23 256 "<h2([^>]*)>([^<]+)(</h2>)" $2 > > This works well, but the indexer only stores the first H2 it finds in the > database and ignores all the others. > > Is there a way to get all tags like the

[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: rafikCyc Email: Message: Thank you for the reply. Well, you're right... With -p0 it does not have the 1 s limit, but it remains very slow. -- I just did a quick speed test on a small site (500 documents): Mnogosearch vs Screaming Frog. The results: Mnogosearch: 3.2 urls /

[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: Alexander Barkov Email: b...@mnogosearch.org Message: > Here is the site : http://www.asbuers.com/ After crawling this site with mnoGoSearch, I did the following: # Extracted the list of all documents found (478 documents) mysql -uroot -N --database=tmp --execute="SELECT url FROM url"
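Barkov's benchmark above (dump the url table, fetch every document, time it) can be sketched in Python. The crawl_rate helper and the urls.txt file in the commented usage are hypothetical names invented for this sketch; only the SELECT url FROM url step comes from the thread:

```python
import time
from urllib.request import urlopen

def crawl_rate(urls, fetch):
    """Fetch each URL sequentially and return the rate in URLs per second."""
    start = time.monotonic()
    for url in urls:
        fetch(url)
    elapsed = time.monotonic() - start
    return len(urls) / elapsed if elapsed > 0 else float("inf")

# Real use (network access assumed), with a URL list like the mysql dump above:
#   urls = open("urls.txt").read().split()
#   rate = crawl_rate(urls, lambda u: urlopen(u, timeout=10).read())
#   print(f"{rate:.1f} urls/s")
```

Sequential fetching like this is exactly where a per-URL delay or a slow connection dominates; a parallel crawler hides that latency.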

[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: rafikCyc Email: Message: Hello, I've tried this: ./indexer -p 0 but it doesn't work :( The indexer sleeps for at least one second after each URL. It seems impossible to index faster than 1 s per URL. To index 300 000 documents on my website, for example, the crawl takes 2 full
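A back-of-envelope calculation shows why the poster's 300 000-document site makes the delay painful. This counts the delay cost alone, ignoring actual download time:

```python
docs = 300_000
delay_s = 1                     # one-second sleep after each URL
total_s = docs * delay_s        # total time spent only sleeping
hours = total_s / 3600
days = hours / 24
print(f"{hours:.1f} hours ~= {days:.1f} days")  # 83.3 hours ~= 3.5 days
```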