Author: Guillaume
Email: inscript...@atlza.com
Message:
Hi Again,

I'm having a problem with some Section lines in indexer.conf with
mnogosearch 3.4.1.

Here is an extract of my indexer.conf:

Section ResponseTime            0       32
# Standard sections: body, title
Section body                    1       1024
Section title                   2       256

# HTML meta tags, e.g. <META NAME="KEYWORDS" CONTENT="xxxx">
Section meta.keywords           3
Section meta.description        4       256

# Incoming link text
Section ilinktext               5       128

# Document's URL part
Section url.file                6       0
Section url.path                7       0
Section url.host                8       0
Section url.proto               9       0

# Useful meta information
Section Charset                 10      32
Section Content-Type            11      64
Section Content-Language        12      16

# Message/rfc822 headers
#Section msg.from               15
#Section msg.to                 16
#Section msg.subject            17

# A user defined section example.
# Extract text between <h1> and </h1> tags:
#Section h1                     20 128 "<h1>(.*)</h1>" $1
Section h1                      26      256     "<h1[^>]*>(.*)</h1>" $1
Section h2                      26      256 "<h2[^>]*>(.*)</h2>" $1
Section h3                      26      256 "<h3[^>]*>(.*)</h3>" $1
Section canonical               33      1024 '<link rel="canonical" href="([^"]*)"' $1
Section ogdescription           33      300  '<meta property="og:description" content="([^"]*")' $1
Section ogtitle                 34      128  '<meta property="og:title" content="([^"]*")' $1

# Uncomment the following lines if you want to index MP3 tags.
#Section MP3.Song               25
#Section MP3.Album              26
#Section MP3.Artist             27
#Section MP3.Year               28

# HTTP headers, e.g. "Server" HTTP header
#Section header.server          30
Section header                  30      128
Section header.server           30      128
Section header.Date             30      128
Section header.Last-Modified    30      128
Section header.Etag             30      128
Section header.X-Robots-Tag     30      128
# HTML tag attributes
Section attribute.alt           35      128
Section attribute.label         36      128
Section attribute.summary       37      128
Section attribute.title         38      128

----

After the crawl, the only sections saved in the urlinfo table are:
Canonical
Charset
Content-language
Content-type
h1
h2
h3
ogdescription
ogtitle
ResponseTime

As you can see, several sections are missing, including some important ones
such as title and meta.description, which I've checked do exist in the pages
on my server.
The results are the same for various documents and various servers.
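
To make the gap explicit, here is a small Python sketch that compares the section names enabled in the extract above with the names found in urlinfo. Both lists are copied by hand from this message, and the case-insensitive comparison is an assumption on my part, since the stored names differ in case from the configured ones (e.g. Canonical vs. canonical). It only lists what is missing; it does not explain why.

```python
# Section names enabled (uncommented) in the indexer.conf extract above.
configured = [
    "ResponseTime", "body", "title", "meta.keywords", "meta.description",
    "ilinktext", "url.file", "url.path", "url.host", "url.proto",
    "Charset", "Content-Type", "Content-Language",
    "h1", "h2", "h3", "canonical", "ogdescription", "ogtitle",
    "header", "header.server", "header.Date", "header.Last-Modified",
    "header.Etag", "header.X-Robots-Tag",
    "attribute.alt", "attribute.label", "attribute.summary", "attribute.title",
]

# Section names actually found in the urlinfo table after the crawl.
stored = [
    "Canonical", "Charset", "Content-language", "Content-type",
    "h1", "h2", "h3", "ogdescription", "ogtitle", "ResponseTime",
]


def missing_sections(configured, stored):
    """Return configured names absent from stored (case-insensitive match)."""
    stored_lower = {name.lower() for name in stored}
    return [name for name in configured if name.lower() not in stored_lower]


missing = missing_sections(configured, stored)
print("\n".join(missing))
```

The printed list includes body, title, meta.keywords and meta.description, as well as all the url.*, header.* and attribute.* sections.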

I've also tried not setting a length for title, body and meta.description, as
in the 3.4 documentation example, but it doesn't work any better.

Did I miss something?

Thanks for the help, mnogosearch is a great tool!



Reply: <http://www.mnogosearch.org/board/message.php?id=21746>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
