Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi Guillaume,

title, body and meta.description do not need to be stored in urlinfo for 
search purposes in 3.4.x. Search and search result presentation should work 
fine without them.

But you might of course need them for other, external purposes, e.g. site 
analysis. The intention of the latest changes in 3.4.x
was not to store sections in urlinfo by default, but they should be
stored if the "length" parameter is set to a non-zero value.
It seems something went wrong. I'll check it after the weekend
(I'm currently away from my development box).
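
To narrow down which sections are affected, the rule described above (a section should be stored in urlinfo when its "length" is non-zero) can be checked against a given indexer.conf. The following is only a minimal sketch in Python, not part of mnoGoSearch: the helper names parse_sections and missing_from_urlinfo are hypothetical, and it assumes that the fourth field of a Section line is the length, that the command name is matched case-insensitively, and that section names in urlinfo may differ in case from those in indexer.conf.

```python
def parse_sections(conf_text):
    """Return {section_name: length} for every uncommented Section line.

    Assumption: the fields are "Section <name> <id> [<length>] ...";
    a missing or non-numeric length is treated as 0.
    """
    sections = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out sections
        parts = line.split()
        if len(parts) < 3 or parts[0].lower() != "section":
            continue  # not a Section command
        name = parts[1]
        has_length = len(parts) > 3 and parts[3].isdigit()
        sections[name] = int(parts[3]) if has_length else 0
    return sections


def missing_from_urlinfo(sections, stored_names):
    """List sections with a non-zero length that are absent from urlinfo."""
    stored = {name.lower() for name in stored_names}
    return sorted(
        name
        for name, length in sections.items()
        if length > 0 and name.lower() not in stored
    )


if __name__ == "__main__":
    conf = """
    Section body                    1       1024
    Section title                   2       256
    Section meta.keywords           3
    Section meta.description        4       256
    Section url.file                6       0
    Section Charset                 10      32
    """
    # Names as they might be read from the urlinfo table (sample values).
    stored = ["Charset", "h1"]
    print(missing_from_urlinfo(parse_sections(conf), stored))
    # prints ['body', 'meta.description', 'title']
```

Feeding it the full indexer.conf (with the quoting "> " prefixes removed) and the section names actually found in urlinfo gives the list of sections that were expected to be stored but were not.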


> Hi again,
> 
> I'm having a problem with some Section lines in indexer.conf with 
> mnogosearch 3.4.1.
> 
> Here is an extract of my indexer.conf:
> 
> Section ResponseTime            0       32
> # Standard sections: body, title
> Section body                    1       1024
> Section title                   2       256
> 
> # HTML meta tags, e.g. <META NAME="KEYWORDS" CONTENT="xxxx">
> Section meta.keywords           3
> Section meta.description        4       256
> 
> # Incoming link text
> Section ilinktext               5       128
> 
> # Document's URL part
> Section url.file                6       0
> Section url.path                7       0
> Section url.host                8       0
> Section url.proto               9       0
> 
> # Useful meta information
> Section Charset                 10      32
> Section Content-Type            11      64
> Section Content-Language        12      16
> 
> # Message/rfc822 headers
> #Section msg.from               15
> #Section msg.to                 16
> #Section msg.subject            17
> 
> # A user defined section example.
> # Extract text between <h1> and </h1> tags:
> #Section h1                     20 128 "<h1>(.*)</h1>" $1
> Section h1                      26      256     "<h1[^>]*>(.*)</h1>" $1
> Section h2                      26      256 "<h2[^>]*>(.*)</h2>" $1
> Section h3                      26      256 "<h3[^>]*>(.*)</h3>" $1
> Section canonical               33      1024 '<link rel="canonical" href="([^"]*)"' $1
> Section ogdescription           33      300  '<meta property="og:description" content="([^"]*")' $1
> Section ogtitle                 34      128  '<meta property="og:title" content="([^"]*")' $1
> 
> # Uncomment the following lines if you want to index MP3 tags.
> #Section MP3.Song               25
> #Section MP3.Album              26
> #Section MP3.Artist             27
> #Section MP3.Year               28
> 
> # HTTP headers, e.g. "Server" HTTP header
> #Section header.server          30
> Section header                  30      128
> Section header.server           30      128
> Section header.Date             30      128
> Section header.Last-Modified    30      128
> Section header.Etag             30      128
> Section header.X-Robots-Tag     30      128
> # HTML tag attributes
> Section attribute.alt           35      128
> Section attribute.label         36      128
> Section attribute.summary       37      128
> Section attribute.title         38      128
> 
> ----
> 
> After the crawl, the only sections saved in the urlinfo table are:
> Canonical
> Charset
> Content-language
> Content-type
> h1
> h2
> h3
> ogdescription
> ogtitle
> ResponseTime
> 
> As we can see, various sections are missing, including some important ones such as 
> title and meta.description, which I've checked exist on my server.
> The results are the same for various documents and various servers.
> 
> I've also tried not setting a length for title, body and meta.description, as in 
> the 3.4 documentation example, but it doesn't work any better.
> 
> Did I miss something?
> 
> Thanks for the help, mnogosearch is a great tool!
> 


Reply: <http://www.mnogosearch.org/board/message.php?id=21747>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
