Re: [General] Crawling order

2014-08-06 Thread d...@hodei.net

ok, thanks a lot

On 05/08/2014 18:36, Alexander Barkov wrote:

Hi,


On 08/05/2014 12:12 PM, d...@hodei.net wrote:

Hi

I have 1000 websites in my indexer.conf, in the 'Server method' section.

In what order does the crawler go through the list of websites: random,
alphabetical, or something else?


The crawler selects targets in random order.

There are some related command line options:


  -e  Visit 'most expired' (oldest) documents first
  -o  Visit documents with less depth (hops value) first
  -r  Do not try to reduce remote servers load by randomising
      crawler queue order (faster, but less polite)
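
To make the three orderings concrete, here is a minimal Python sketch. It is not mnoGoSearch code: the `Target` record, its field names, and the `crawl_order` helper are hypothetical and only illustrate what "random", "most expired first" (-e), and "less depth first" (-o) mean for a small queue.

```python
import random
from dataclasses import dataclass


@dataclass
class Target:
    url: str
    hops: int             # link depth from a start URL (assumed field)
    next_index_time: int  # smaller value = more expired (assumed field)


def crawl_order(queue, mode="random", seed=None):
    """Return the targets in the order a crawler might visit them."""
    if mode == "expired":   # analogous to -e: oldest documents first
        return sorted(queue, key=lambda t: t.next_index_time)
    if mode == "hops":      # analogous to -o: smallest depth first
        return sorted(queue, key=lambda t: t.hops)
    shuffled = list(queue)  # default: random order
    random.Random(seed).shuffle(shuffled)
    return shuffled


queue = [
    Target("http://a.example/", 0, 300),
    Target("http://b.example/page", 2, 100),
    Target("http://c.example/dir/", 1, 200),
]

for mode in ("random", "expired", "hops"):
    print(mode, [t.url for t in crawl_order(queue, mode)])
```

In the random case the order of entries in indexer.conf has no influence on the result, which matches the answer above.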






thanks for your help


_
my config :
* Debian 3.2.51-1 x86_64 GNU/Linux
* mnogosearch 3.3.15
* contents of indexer.conf :
 ..
 DBAddr mysql://root:password@localhost/mnogosearch/?dbmode=blob
 ..
_



---
This email contains no viruses or malware
because avast! Antivirus protection is active.
http://www.avast.com

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general





[General] Webboard: Creating custom sections from server headers

2014-08-06 Thread bar
Author: Oliver
Email: 
Message:
Hello,

first, thank you for the great search engine you have created!

For indexing a custom site, I want to create a custom section which should
contain the filename extracted from the Content-Disposition HTTP header. This
header is sent for downloadable files, and its value might look like this:

inline; filename=comments.doc

For this, I've added new sections in the indexer.conf file:

Section header.content-disposition 30 128
Section content_filename 31 128 cdoff ${header.content-disposition} ^\w+; filename=(.+)$ $1

This indeed adds a header.content-disposition variable which I can use in the 
search.htm file, and which contains the entire Content-Disposition header value.
However, the content_filename section is not created correctly; it is always 
empty.

Through experimenting I found that ${header.content-disposition} is apparently
not recognized as a variable in the Section command. Is there a way to access
the Content-Disposition value anyway when defining a new section? Also, is
there an overview of variables available in these Section commands?
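
As a sanity check outside mnoGoSearch, the pattern itself can be tried against the sample header value. The sketch below uses Python's `re` module, whose syntax may differ in details from the regex engine used by indexer; the `extract_filename` helper is hypothetical. If the pattern matches here, that is consistent with the problem being in how the source variable is expanded rather than in the expression.

```python
import re

# Same pattern as in the Section command above.
PATTERN = re.compile(r"^\w+; filename=(.+)$")


def extract_filename(header_value):
    """Return the filename part of a Content-Disposition value, or None."""
    match = PATTERN.match(header_value)
    return match.group(1) if match else None


print(extract_filename("inline; filename=comments.doc"))
print(extract_filename("attachment"))
```

Note that a quoted value such as `filename="comments.doc"` would be captured with its quotes, since the group takes everything after the equals sign.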

As a workaround I now use the EREG command in search.htm to extract the filename
when the results are displayed. However, this is probably less efficient (it's
done whenever the results are displayed, instead of only once during indexing).
Also, it adds the entire Content-Disposition header to the index, so searching
for "inline" or for "filename" finds all documents which have a
Content-Disposition header - not very desirable.

Can you give me some hints on the variables available in Section commands in 
indexer.conf?

Thanks,
Oliver 

Reply: http://www.mnogosearch.org/board/message.php?id=21653
