[General] Webboard: Indexer not working for name-based https site

2017-06-27 Thread bar
Author: Stan
Email: 
Message:
Hi,

I switched a development site to https, using name-based host matching in
Apache (same IP, multiple sites), and the mnogosearch indexer no longer works:

[10072]{01} URL: https://test.somesite.com/
[10072]{01} ROBOTS: https://test.somesite.com/robots.txt
[10072]{01} No HTTP response status

If I specify http in indexer.conf, the indexing completes, but adding a redirect
to https in Apache (which will be the case in production) breaks it.

The certificate is valid and signed by a CA (the site opens fine in a browser).

Here's the full output with verbose = 5:

https://pastebin.com/qQqWQjxN



[General] Webboard: index only new pages

2017-04-12 Thread bar
Author: Alexander Barkov
Email: 
Message:
> Unfortunately it did not work, but I found a working method.
> 
> I added 'Period 30y' before my 'Server' command in config file and 
> did
>  indexer --drop
>  indexer --create
>  indexer -a
> It ran forever. I killed it (ctrl-C) and it reported crawling over 
> 50 pages - there are about 16000 pages on the site.

It seems 30y causes an integer overflow.
It should work with "Period 1y".
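
For reference, a minimal indexer.conf sketch with this fix applied (the Server
line is the one from your message):

Period 1y
Server https://domain.com/msgs/ file:///var/www/domain/msgs/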

> 
> I removed the 'Period' command and reindexed the site. I then 
> added a new directory with the newest pages and did:
> 
>  indexer -ai -u 'https://domain.com/msgs/v117n014/%.html'
>  indexer --index

The above command will insert 'https://domain.com/msgs/v117n014/%.html' into 
the database. This is probably not what you need.


It should be:

indexer -ai -u 'https://domain.com/msgs/v117n014/'
indexer --index


> 
> This processed only the new pages and correctly added them to the 
> index.
> 
> Thanks,
> Jeff




[General] Webboard: index only new pages

2017-04-12 Thread bar
Author: Jeff Dwork
Email: jeffdwor...@gmail.com
Message:
Unfortunately it did not work, but I found a working method.

I added 'Period 30y' before my 'Server' command in config file and 
did
 indexer --drop
 indexer --create
 indexer -a
It ran forever. I killed it (ctrl-C) and it reported crawling over 
50 pages - there are about 16000 pages on the site.

I removed the 'Period' command and reindexed the site. I then 
added a new directory with the newest pages and did:

 indexer -ai -u 'https://domain.com/msgs/v117n014/%.html'
 indexer --index

This processed only the new pages and correctly added them to the 
index.

Thanks,
Jeff



[General] Webboard: index only new pages

2017-04-11 Thread bar
Author: Alexander Barkov
Email: 
Message:
> I'm indexing a mailing list archive. Pages never change. Every 
> week a few pages are added in a new directory. The archive is on 
> the same machine as the index, so my server directive is
> 
> Server https://domain.com/msgs/ file:///var/www/domain/msgs/
> 
> I ran a full index (indexer --drop; indexer --create; indexer -a) 
> after creating the archive. The next week I add new messages in a 
> new directory (for example: /var/www/domain/msgs/v117n013/). I 
> cannot get the new pages indexed. I tried 'indexer' with no 
> options and several variations on
>  indexer -a -u '%/v117n013/%'
> all report 0 documents indexed.
> So I have to run another full index.
> 
> How can I get only the new pages indexed?

You need to re-crawl the index page:

indexer -am -u https://domain.com/msgs/

Then you can run it like this:

indexer -u '%/v117n013/%'


Btw, don't forget to set Period to some huge value.
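
Putting this together, the weekly update cycle would look roughly like this
(a sketch, using v117n013 as the example directory):

# re-crawl the index page so the new links are discovered
indexer -am -u https://domain.com/msgs/
# crawl the newly added directory
indexer -u '%/v117n013/%'
# rebuild the search index
indexer --index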


> 
> Thanks,
> Jeff



[General] Webboard: index only new pages

2017-04-10 Thread bar
Author: Jeff Dwork
Email: jeffdwor...@gmail.com
Message:
I'm indexing a mailing list archive. Pages never change. Every 
week a few pages are added in a new directory. The archive is on 
the same machine as the index, so my server directive is

Server https://domain.com/msgs/ file:///var/www/domain/msgs/

I ran a full index (indexer --drop; indexer --create; indexer -a) 
after creating the archive. The next week I add new messages in a 
new directory (for example: /var/www/domain/msgs/v117n013/). I 
cannot get the new pages indexed. I tried 'indexer' with no 
options and several variations on
 indexer -a -u '%/v117n013/%'
all report 0 documents indexed.
So I have to run another full index.

How can I get only the new pages indexed?

Thanks,
Jeff



[General] Webboard: Links without specific protocol

2017-02-06 Thread bar
Author: Alexander Barkov
Email: 
Message:
Hello Julien,

> Hello Alexander,
> 
> Thanks for the answer.
> However, the problem occurs in the indexing phase: the crawler tries to index
> http://www.example.com/www.example.com/page-b.html (which does not exist)
> instead of http://www.example.com/page-b.html
> 
> Can I prevent those 404 errors ?
> 
> Thanks !

I have added support for protocol-relative URLs into the next release 3.4.2. I 
hope to make it available for download this week.

Note, the database structure is slightly different in 3.4.2 vs 3.4.1,
so full re-crawling will be needed. Hope it won't be a serious problem.




[General] Webboard: Links without specific protocol

2017-02-05 Thread bar
Author: Alexander Barkov
Email: 
Message:

> 
> Hello Alexander,
> 
> I currently use 3.4.1.
> 
> Is there a new release I am not aware of ?
> 
> Thank you for your quick answers !

No, 3.4.1 is the latest.




[General] Webboard: commands in search.htm

2017-02-05 Thread bar
Author: Alexander Barkov
Email: 
Message:
Hi Jeff,

> Version 3.4.1, using search.cgi, Debian 8.6
> Where do I place commands in search.htm?
> Specifically, I want to use 'Section' command. If I put
>  Section title 1 128
> inside , I get 
>  13: ERROR: Unknown identifier 'Section'
> If it is outside, I get errors from apache2.
> 
> How do I make this work?
> 
> Thanks,
> Jeff

Please use env.addline(). Find examples here:

http://www.mnogosearch.org/doc34/msearch-templates.html#template-class-env
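
For example, a minimal sketch (the exact surrounding processing-instruction
syntax is shown in the manual page above):

env.addline("Section title 1 128");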





[General] Webboard: commands in search.htm

2017-02-01 Thread bar
Author: Jeff Dwork
Email: jeffdwor...@gmail.com
Message:
Version 3.4.1, using search.cgi, Debian 8.6
Where do I place commands in search.htm?
Specifically, I want to use 'Section' command. If I put
 Section title 1 128
inside , I get 
 13: ERROR: Unknown identifier 'Section'
If it is outside, I get errors from apache2.

How do I make this work?

Thanks,
Jeff



[General] Webboard: Links without specific protocol

2017-02-01 Thread bar
Author: Julien D.
Email: jul...@clustaar.com
Message:
> Hello Julien,
> 
> > > Hello,
> > > 
> > > > Hello,
> > > > 
> > > > I couldn't find any information on this subject.
> > > > As people start using HTTPS, I get more and more problems when crawling
> > > > with links that don't use a specific protocol.
> > > > 
> > > > Let's take this example of a link from http://www.example.com/page-a.html :
> > > > <a href="//www.example.com/page-b.html">text</a>
> > > > 
> > > > Will be seen as : http://www.example.com/www.example.com/page-b.html
> > > > And of course will cause a 404 error.
> > > > 
> > > > Any idea on how to get the right links ?
> > > > 
> > > > Thanks.
> > > 
> > > The crawler stores full URLs in the database.
> > > But you can remove the protocol at search time,
> > > using the search template language functionality.
> > > 
> > > In 3.4.x use regex_substr:
> > > http://www.mnogosearch.org/doc34/msearch-templates.html#template-functions
> > > 
> > > In 3.3.x use the EREG template operator:
> > > http://www.mnogosearch.org/doc33/msearch-templates-oper.html#templates-oper-misc
> > > 
> > 
> > Hello Alexander,
> > 
> > Thanks for the answer.
> > However, the problem occurs in the indexing phase: the crawler tries to index
> > http://www.example.com/www.example.com/page-b.html (which does not exist)
> > instead of http://www.example.com/page-b.html
> > 
> > Can I prevent those 404 errors ?
> > 
> > Thanks !
> 
> Oops. This is not supported yet, indeed. I thought it was.
> It should be easy to add this. Which version are you using?
> 

Hello Alexander,

I currently use 3.4.1.

Is there a new release I am not aware of ?

Thank you for your quick answers !



[General] Webboard: Links without specific protocol

2017-01-31 Thread bar
Author: Julien D.
Email: jul...@clustaar.com
Message:
> Hello,
> 
> > Hello,
> > 
> > I couldn't find any information on this subject.
> > As people start using HTTPS, I get more and more problems when crawling
> > with links that don't use a specific protocol.
> > 
> > Let's take this example of a link from http://www.example.com/page-a.html :
> > <a href="//www.example.com/page-b.html">text</a>
> > 
> > Will be seen as : http://www.example.com/www.example.com/page-b.html
> > And of course will cause a 404 error.
> > 
> > Any idea on how to get the right links ?
> > 
> > Thanks.
> 
> The crawler stores full URLs in the database.
> But you can remove the protocol at search time,
> using the search template language functionality.
> 
> In 3.4.x use regex_substr:
> http://www.mnogosearch.org/doc34/msearch-templates.html#template-functions
> 
> In 3.3.x use the EREG template operator:
> http://www.mnogosearch.org/doc33/msearch-templates-oper.html#templates-oper-misc
> 

Hello Alexander,

Thanks for the answer.
However, the problem occurs in the indexing phase: the crawler tries to index
http://www.example.com/www.example.com/page-b.html (which does not exist)
instead of http://www.example.com/page-b.html

Can I prevent those 404 errors ?

Thanks !



[General] Webboard: Links without specific protocol

2017-01-25 Thread bar
Author: Alexander Barkov
Email: 
Message:
Hello,

> Hello,
> 
> I couldn't find any information on this subject.
> As people start using HTTPS, I get more and more problems when crawling with 
> links that don't use a specific protocol.
> 
> Let's take this example of a link from http://www.example.com/page-a.html :
> <a href="//www.example.com/page-b.html">text</a>
> 
> Will be seen as : http://www.example.com/www.example.com/page-b.html
> And of course will cause a 404 error.
> 
> Any idea on how to get the right links ?
> 
> Thanks.

The crawler stores full URLs in the database.
But you can remove the protocol at search time,
using the search template language functionality.

In 3.4.x use regex_substr:
http://www.mnogosearch.org/doc34/msearch-templates.html#template-functions

In 3.3.x use the EREG template operator:
http://www.mnogosearch.org/doc33/msearch-templates-oper.html#templates-oper-misc




[General] Webboard: Links without specific protocol

2017-01-23 Thread bar
Author: Julien D.
Email: jul...@clustaar.com
Message:
Hello,

I couldn't find any information on this subject.
As people start using HTTPS, I get more and more problems when crawling with 
links that don't use a specific protocol.

Let's take this example of a link from http://www.example.com/page-a.html :
<a href="//www.example.com/page-b.html">text</a>

Will be seen as : http://www.example.com/www.example.com/page-b.html
And of course will cause a 404 error.

Any idea on how to get the right links ?

Thanks.



[General] Webboard: Make error - proto.c

2016-12-02 Thread bar
Author: Alexander Barkov
Email: 
Message:
> The version of Make is 3.82-21. And I am trying to build mnogosearch 3.4.1.
> 
> The configure command used is:
> ./configure --disable-mp3 --disable-news --without-debug --with-pgsql=no --with-freetds=no --with-oracle8=no --with-oracle8i=no --with-iodbc=no --with-unixODBC=no --with-db2=no --with-solid=no --with-openlink=no --with-easysoft=no --with-sapdb=no --with-ibase=no --with-ctlib=no --with-zlib --with-mysql --disable-syslog

It seems that UDM_REQUEST_INFO is defined in a wrong place.

Will fix in the next release.

Please don't use the --disable-news parameter in the meantime.

Thanks for reporting!



> 
> Thanks for your help.
> 
> Tom
>  



[General] Webboard: Make error - proto.c

2016-12-01 Thread bar
Author: tom
Email: 
Message:
The version of Make is 3.82-21. And I am trying to build mnogosearch 3.4.1.

The configure command used is:
./configure --disable-mp3 --disable-news --without-debug --with-pgsql=no --with-freetds=no --with-oracle8=no --with-oracle8i=no --with-iodbc=no --with-unixODBC=no --with-db2=no --with-solid=no --with-openlink=no --with-easysoft=no --with-sapdb=no --with-ibase=no --with-ctlib=no --with-zlib --with-mysql --disable-syslog

Thanks for your help.

Tom
 



[General] Webboard: Make error - proto.c

2016-11-30 Thread bar
Author: tom
Email: 
Message:
Hello,

I have this error when I use the make command :

proto.c: In function 'UdmFILEGet':
proto.c:1502:3: error: unknown type name 'UDM_REQUEST_INFO'
   UDM_REQUEST_INFO Request;
   ^
proto.c:1693:16: error: request for member 'if_modified_since' in something not 
a structure or union
 if (Request.if_modified_since >= sb.st_mtime)
^
make[2]: *** [proto.lo] Erreur 1

Do you know how I can solve this?

Many thanks for your help,

Regards,

Tom





[General] Webboard: Crash of indexation process

2016-10-23 Thread bar
Author: Alexander Barkov
Email: 
Message:
Hi,

> Hi,
> This new topic is just for your information, Alexander.
> As you said that it is now possible to set the number of threads when 
> indexing data, I did a few tests.
> 
> indexer -N10 --index --> works
> With 20, 30,... 64 threads --> works
> 
> With 65 threads or more --> segmentation fault.
> I did try on two machines and got the same behaviour,  using 3.4.1
> 
> Regards,
> Fabien.

Thanks for reporting the problem. It's now fixed.
Now an attempt to use a -N value >64 makes indexer fall back to single-thread
mode instead of crashing.







[General] Webboard: Horizontal scalability

2016-10-20 Thread bar
Author: Alexander Barkov
Email: 
Message:
> Thanks a lot for all your good advice.
> When do you plan to deliver the next release?
> 

I hope it will be available within two weeks.

> 
> > > > 1. indexer can run parallel threads for crawling, and starting from 
> > > > 3.4.0 for indexing:
> > > > 
> > > > # Run 10 crawling threads
> > > > indexer -N10
> > > > 
> > > > # Run 10 indexing threads
> > > > indexer -N10 --index
> > > > 
> > > 
> > > A related thing:
> > > 
> > > 3.4.1 will use a modified database structure, better optimized for faster 
> > > crawling.
> > > 
> > 
> > Oops, sorry. It will be 3.4.2 actually.
> > 



[General] Webboard: Horizontal scalability

2016-10-20 Thread bar
Author: fabien
Email: fabien.lahau...@gmail.com
Message:
Thanks a lot for all your good advice.
When do you plan to deliver the next release?


> > > 1. indexer can run parallel threads for crawling, and starting from 3.4.0 
> > > for indexing:
> > > 
> > > # Run 10 crawling threads
> > > indexer -N10
> > > 
> > > # Run 10 indexing threads
> > > indexer -N10 --index
> > > 
> > 
> > A related thing:
> > 
> > 3.4.1 will use a modified database structure, better optimized for faster 
> > crawling.
> > 
> 
> Oops, sorry. It will be 3.4.2 actually.
> 



[General] Webboard: Horizontal scalability

2016-10-20 Thread bar
Author: Alexander Barkov
Email: 
Message:
> > 
> > 1. indexer can run parallel threads for crawling, and starting from 3.4.0 
> > for indexing:
> > 
> > # Run 10 crawling threads
> > indexer -N10
> > 
> > # Run 10 indexing threads
> > indexer -N10 --index
> > 
> 
> A related thing:
> 
> 3.4.1 will use a modified database structure, better optimized for faster 
> crawling.
> 

Oops, sorry. It will be 3.4.2 actually.




[General] Webboard: Horizontal scalability

2016-10-20 Thread bar
Author: Alexander Barkov
Email: 
Message:
> 
> 1. indexer can run parallel threads for crawling, and starting from 3.4.0 for 
> indexing:
> 
> # Run 10 crawling threads
> indexer -N10
> 
> # Run 10 indexing threads
> indexer -N10 --index
> 

A related thing:

3.4.1 will use a modified database structure, better optimized for faster 
crawling.




[General] Webboard: Horizontal scalability

2016-10-20 Thread bar
Author: Alexander Barkov
Email: 
Message:
Hi Fabien,

> Hi all,
> 
> Is it possible to parallelize the indexing work on multiple machines ?
> I mean for example one global data server with the mnogosearch database, and 
> a few server instances all running the indexer process, pointing to the same 
> db server.
> 
> The idea behind that question would be to create server instances working in 
> parallel and therefore speed up the whole indexing work, and scale up with 
> more servers when needed.
> 
> Fabien.

mnoGoSearch supports multiple levels of parallelism.

1. indexer can run parallel threads for crawling, and starting from 3.4.0 for 
indexing:

# Run 10 crawling threads
indexer -N10

# Run 10 indexing threads
indexer -N10 --index

2. It's possible to run multiple crawling processes on the same machine. Just 
start "indexer" multiple times.
This is very similar to "indexer -N10", but if one process crashes
for some reason (e.g. a bug), the other parallel processes will safely
continue to crawl.

Note, this works only for crawling! It's not possible to run
multiple indexing processes ("indexer --index") on the same
database at the same time.

3. For crawling purposes, it's possible to use #1 and #2 at the same time. Just 
start "indexer -Nxxx" multiple times.
For example, if you start "indexer -N10" ten times,
you'll effectively get 100 crawling threads.
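
A shell sketch of this (any number of processes can be started like this):

# ten 10-thread crawlers working in parallel against the same database
for i in 1 2 3 4 5 6 7 8 9 10; do
  indexer -N10 &
done
wait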

4. It is possible to run indexer in crawling mode on
multiple machines at the same time. This is very similar to #2,
but you start indexer on different machines.

I think this is exactly what you're asking for.

To start using this, just copy indexer.conf to multiple machines
and make sure to fix DBAddr to point to the same database machine
(e.g. change localhost to the actual IP address of the database machine).
No other actions are needed.
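
For example, a sketch of the DBAddr line shared by all crawler machines
(user, password, database name and the 192.0.2.10 address are placeholders):

DBAddr mysql://user:password@192.0.2.10/mnogosearch/?dbmode=blob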

Note, you can run multiple crawling processes on multiple machines,
and every process can use multiple threads.

For example, you can start "indexer -N10" ten times on each of ten machines
and you'll effectively get 1000 crawling threads.

Note, you can use combinations of the above ways.
For example:
- Machine A can run an individual single thread crawler
- Machine B can run multiple single thread crawlers
- Machine C can run an individual multi-thread crawler
- Machine D can run multiple multi-thread crawlers

At the same time with the same database!
Just make sure to have a very fast database server.
Consider using faster (e.g. SSD and/or RAID) disks and more
RAM to help the database server cache as many index pages
as possible.

At some point (when running a few dozen or a few hundred threads in total)
you'll reach heavy thread contention, and the crawler threads
will be waiting for the database to serve them. But there is still
a workaround. See #5.


5. And finally, it's possible to distribute data between multiple
databases, for even more parallelism. This mode needs some extra
configuration. Please see here for details:
http://www.mnogosearch.org/doc34/msearch-cluster.html

Note, the cluster nodes can reside:
- on the same physical machine, with multiple database servers each using its
own physical hard disk
- or on different physical machines.




[General] Webboard: Horizontal scalability

2016-10-20 Thread bar
Author: fabien
Email: fabien.lahau...@gmail.com
Message:
Hi all,

Is it possible to parallelize the indexing work on multiple machines ?
I mean for example one global data server with the mnogosearch database, and a 
few server instances all running the indexer process, pointing to the same db 
server.

The idea behind that question would be to create server instances working in 
parallel and therefore speed up the whole indexing work, and scale up with more 
servers when needed.

Fabien.



[General] Webboard: exclude mime types

2016-10-13 Thread bar
Author: fabien
Email: fabien.lahau...@gmail.com
Message:
Hi,

I tried the Disallow statements today, and they work like a charm! :)
I can now exclude typically useless URLs before they get downloaded by the
indexer.

Thanks for your help and for your work !




[General] Webboard: exclude mime types

2016-10-12 Thread bar
Author: Alexander Barkov
Email: 
Message:
> And to be more precise, I ultimately want to index only HTML pages and not all
> other types of data (css/js/pictures/pdf/rss/...).
> 

Something like this should do the trick:

NoIndexIf NoMatch Content-Type text/html*


Additionally, try to use the Disallow command to reduce the number of URLs
that indexer actually has to download.
See here for details:
http://www.mnogosearch.org/board/message.php?id=21793
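
For example, a sketch that skips common non-HTML resources by URL pattern
(extend the list to your needs):

Disallow *.css *.js *.png *.jpg *.gif *.pdf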


> Fabien.
> 





[General] Webboard: exclude mime types

2016-10-12 Thread bar
Author: Alexander Barkov
Email: 
Message:
> > Thanks for your quick answer.
> > 
> > I tried to add the NoIndexIf but i cannot get it to work.
> > 
> > I used the indexer.conf default file, and added the two following lines at 
> > the end of that file : 
> > Server http://www.wearethelous.com/feed/
> > NoIndexIf Content-Type application/rss+xml
> 
> I tried the same thing, and it seems to work fine.
> This page is not returned in search results.
> 
> If I remove the NoIndexIf command, this page IS returned by search results.
> 
> 
> Note, indexer shows the URL in its log, because it still must
> download this URL to know its content type.
> But the fact that you can see the "SectionFilter:..." line in the log
> tells that indexer marks it as "not for indexing" and thus stores no data
> into the underlying tables cachedcopy and bdicti, so "indexer --index" later
> does not see it when creating the search index.
> 


Note, if you know that documents under certain location return 
application/rss+xml or some other not desired content type,
then consider using Disallow instead. In this case indexer will
not even download these documents.

NoIndexIf is rather for the cases when it's not possible to describe "bad" 
documents by their URL pattern.
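
In your case the feed location is known by its URL, so a sketch would be simply:

Disallow */feed/*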







[General] Webboard: exclude mime types

2016-10-12 Thread bar
Author: Alexander Barkov
Email: 
Message:
> Thanks for your quick answer.
> 
> I tried to add the NoIndexIf but i cannot get it to work.
> 
> I used the indexer.conf default file, and added the two following lines at 
> the end of that file : 
> Server http://www.wearethelous.com/feed/
> NoIndexIf Content-Type application/rss+xml

I tried the same thing, and it seems to work fine.
This page is not returned in search results.

If I remove the NoIndexIf command, this page IS returned by search results.


Note, indexer shows the URL in its log, because it still must
download this URL to know its content type.
But the fact that you can see the "SectionFilter:..." line in the log
tells that indexer marks it as "not for indexing" and thus stores no data into
the underlying tables cachedcopy and bdicti, so "indexer --index" later does
not see it when creating the search index.

> 
> I got the following log : 
> 
> [71598]{--} Clearing
> [71598]{--} Clearing done   0.01
> [71600]{--} indexer from mnogosearch-3.4.1-mysql-pqsql started with 
> '/etc/mnogosearch/indexer.conf'
> [71600]{01} URL: http://www.wearethelous.com/feed/
> [71600]{01} Server Path Allow 'http://www.wearethelous.com/feed/'
> [71600]{01} Allow by default
> [71600]{01} ROBOTS: http://www.wearethelous.com/robots.txt
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Length: 67
> [71600]{01} Response.Content-Type: text/plain
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:46 GMT
> [71600]{01} Response.Link: ; 
> rel="https://api.w.org/;
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 475
> [71600]{01} Response.ResponseTime: 2261
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 
> OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Status: 200
> [71600]{01} Response.URL: http://www.wearethelous.com/robots.txt
> [71600]{01} Response.URL_ID: 1928115922
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.body: 
> [71600]{01} Response.Charset: 
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Language: 
> [71600]{01} Response.Content-Length: 2337
> [71600]{01} Response.Content-Type: application/rss+xml
> [71600]{01} Response.crc32: 0
> [71600]{01} Response.crc32old: 0
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:48 GMT
> [71600]{01} Response.ETag: "7059155a990290887650add31475f88e"
> [71600]{01} Response.Hops: 0
> [71600]{01} Response.ID: 5
> [71600]{01} Response.ilinktext: 
> [71600]{01} Response.Last-Modified: Thu, 29 Sep 2016 12:48:50 GMT
> [71600]{01} Response.Link: ; 
> rel="https://api.w.org/;
> [71600]{01} Response.MaxDocPerSite: 0
> [71600]{01} Response.MaxHops: 256
> [71600]{01} Response.meta.description: 
> [71600]{01} Response.meta.keywords: 
> [71600]{01} Response.msg.from: 
> [71600]{01} Response.msg.subject: 
> [71600]{01} Response.msg.to: 
> [71600]{01} Response.PrevStatus: 0
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 2842
> [71600]{01} Response.ResponseTime: 1455
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 
> OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Server_id: -2050898686
> [71600]{01} Response.Status: 200
> [71600]{01} Response.title: 
> [71600]{01} Response.URL: http://www.wearethelous.com/feed/
> [71600]{01} Response.url.file: 
> [71600]{01} Response.url.host: 
> [71600]{01} Response.url.path: 
> [71600]{01} Response.url.proto: 
> [71600]{01} Response.URL_ID: -2050898686
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Status: 200 OK
> [71600]{01} Guesser: Lang: , Charset: utf-8
> [71600]{01} SectionFilter: NoIndexIf Match Wild Insensitive 'Content-Type' 
> 'application/rss+xml'
> [71600]{01} Flushing word cache
> [71600]{01} Flushing word cache done0.00
> [71600]{01} Done (4 seconds, 1 documents, 2842 bytes,  0.69 Kbytes/sec.)
> 
> I see that the section filter mentions the NoIndexIf filter that I added,
> but the URL is still indexed.
> So what can be wrong ?
> 
> Thanks in advance for your help.
> Fabien.
> 
> 
> > Hi,
> > 
> > > Hi all,
> > > 
> > > Is it possible to exclude certain mime types such as rss feeds ?

[General] Webboard: exclude mime types

2016-10-12 Thread bar
Author: fabien
Email: fabien.lahau...@gmail.com
Message:
And to be more precise, I ultimately want to index only HTML pages and not all
other types of data (css/js/pictures/pdf/rss/...).

Fabien.

> Thanks for your quick answer.
> 
> I tried to add the NoIndexIf but i cannot get it to work.
> 
> I used the indexer.conf default file, and added the two following lines at 
> the end of that file : 
> Server http://www.wearethelous.com/feed/
> NoIndexIf Content-Type application/rss+xml
> 
> I got the following log : 
> 
> [71598]{--} Clearing
> [71598]{--} Clearing done   0.01
> [71600]{--} indexer from mnogosearch-3.4.1-mysql-pqsql started with 
> '/etc/mnogosearch/indexer.conf'
> [71600]{01} URL: http://www.wearethelous.com/feed/
> [71600]{01} Server Path Allow 'http://www.wearethelous.com/feed/'
> [71600]{01} Allow by default
> [71600]{01} ROBOTS: http://www.wearethelous.com/robots.txt
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Length: 67
> [71600]{01} Response.Content-Type: text/plain
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:46 GMT
> [71600]{01} Response.Link: ; 
> rel="https://api.w.org/;
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 475
> [71600]{01} Response.ResponseTime: 2261
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 
> OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Status: 200
> [71600]{01} Response.URL: http://www.wearethelous.com/robots.txt
> [71600]{01} Response.URL_ID: 1928115922
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.body: 
> [71600]{01} Response.Charset: 
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Language: 
> [71600]{01} Response.Content-Length: 2337
> [71600]{01} Response.Content-Type: application/rss+xml
> [71600]{01} Response.crc32: 0
> [71600]{01} Response.crc32old: 0
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:48 GMT
> [71600]{01} Response.ETag: "7059155a990290887650add31475f88e"
> [71600]{01} Response.Hops: 0
> [71600]{01} Response.ID: 5
> [71600]{01} Response.ilinktext: 
> [71600]{01} Response.Last-Modified: Thu, 29 Sep 2016 12:48:50 GMT
> [71600]{01} Response.Link: ; 
> rel="https://api.w.org/;
> [71600]{01} Response.MaxDocPerSite: 0
> [71600]{01} Response.MaxHops: 256
> [71600]{01} Response.meta.description: 
> [71600]{01} Response.meta.keywords: 
> [71600]{01} Response.msg.from: 
> [71600]{01} Response.msg.subject: 
> [71600]{01} Response.msg.to: 
> [71600]{01} Response.PrevStatus: 0
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 2842
> [71600]{01} Response.ResponseTime: 1455
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 
> OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Server_id: -2050898686
> [71600]{01} Response.Status: 200
> [71600]{01} Response.title: 
> [71600]{01} Response.URL: http://www.wearethelous.com/feed/
> [71600]{01} Response.url.file: 
> [71600]{01} Response.url.host: 
> [71600]{01} Response.url.path: 
> [71600]{01} Response.url.proto: 
> [71600]{01} Response.URL_ID: -2050898686
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Status: 200 OK
> [71600]{01} Guesser: Lang: , Charset: utf-8
> [71600]{01} SectionFilter: NoIndexIf Match Wild Insensitive 'Content-Type' 
> 'application/rss+xml'
> [71600]{01} Flushing word cache
> [71600]{01} Flushing word cache done0.00
> [71600]{01} Done (4 seconds, 1 documents, 2842 bytes,  0.69 Kbytes/sec.)
> 
> I see that the section filter mentions the NoIndexIf filter that I added,
> but the URL is still indexed.
> So what can be wrong ?
> 
> Thanks in advance for your help.
> Fabien.
> 
> 
> > Hi,
> > 
> > > Hi all,
> > > 
> > > Is it possible to exclude certain mime types such as rss feeds ?
> > > 
> > 
> > This can be done using the NoIndexIf command:
> > 
> > http://www.mnogosearch.org/doc34/msearch-cmdref-noindexif.html
> > 
> > Put this command into indexer.conf to disallow a certain Content-Type:
> > 
> > NoIndexIf Content-Type application/rss+xml
> > 
> > 
> > Another option is to use NoIndexIf in combination with a user-defined
> > section, to check raw content fragments.

[General] Webboard: exclude mime types

2016-10-12 Thread bar
Author: Alexander Barkov
Email: 
Message:
Hi,

> Hi all,
> 
> Is it possible to exclude certain mime types such as rss feeds ?
> 

This can be done using the NoIndexIf command:

http://www.mnogosearch.org/doc34/msearch-cmdref-noindexif.html

Put this command into indexer.conf to disallow a certain Content-Type:

NoIndexIf Content-Type application/rss+xml


Another option is to use NoIndexIf in combination with a user-defined
section, to check raw content fragments:

http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-user-defined

The idea is to define a user section using a regex pattern to catch some known 
RSS text fragments, and then use NoIndexIf with this section.
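
A hypothetical sketch of that idea (the section name, numbers and regex are
made up for illustration and would need tuning against real feeds):

Section rssmarker 30 32 "(<rss[^>]*>)" $1
NoIndexIf rssmarker *rss*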


> Thanks in advance,
> Fabien.




[General] Webboard: exclude mime types

2016-10-12 Thread bar
Author: fabien
Email: fabien.lahau...@gmail.com
Message:
Hi all,

Is it possible to exclude certain mime types such as rss feeds ?

Thanks in advance,
Fabien.



[General] Webboard: MySQL driver: #1054

2016-09-20 Thread bar
Author: Alexander Barkov
Email: 
Message:
  Hello,

The database structure in 3.4.x is not compatible with 3.3.x.
The easiest way is just to re-crawl all documents.
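
A sketch of the full rebuild, using the same commands quoted elsewhere in this
archive:

indexer --drop     # drop the old tables
indexer --create   # create tables with the 3.4.x structure
indexer -a         # re-crawl all documents
indexer --index    # build the search index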




[General] Webboard: MySQL driver: #1054

2016-09-15 Thread bar
Author: Martin
Email: 
Message:
I recreated the tables and it works, but I have to reindex all my
URLs, right?

The URLs are stored in binary form, so how can I get all my server URLs and
restart indexing?



[General] Webboard: MySQL driver: #1054

2016-09-13 Thread bar
Author: Martin
Email: 
Message:
I updated to 3.4.1 and get 
DB: MySQL driver: #1054: Unknown column 'coords' in 'field list'



[General] Webboard: Links table and seed value

2016-07-07 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hello,

> Hello,
> 
> I couldn't find any information on how the "seed" value in the links table is 
> calculated.

Seed is calculated based on the normalized URL, as follows:

indexer.c:udmhash32_t seed= UdmStrHash32(Href->url) & 0xFF;
sql.c:url_seed= UdmStrHash32(H->url) & 0xFF;
sql.c:url_seed= UdmStrHash32(H->url) & 0xFF;
sql.c:  url_seed = UdmStrHash32(url) & 0xFF;



> How does it work ?
> Can we use our own rules ?

You'll need to replace all of the above lines with your own function.


> 
> Thanks !





[General] Webboard: Links table and seed value

2016-07-06 Thread bar
Author: Julien D.
Email: jul...@1-clic.info
Message:
Hello,

I couldn't find any information on how the "seed" value in the links table is 
calculated.
How does it work ?
Can we use our own rules ?

Thanks !



[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-06-02 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
Thank you very much, it works!

Words in Cyrillic are now correctly found in the database.
The first lines of each found result are also now in the correct encoding.
That's fine!

навигатор : 610 
Results 1-10 of 89 ( 0.009 seconds) 
1   Главная   [ 11.193% Popularity: 0.89705 ] 


I insert in indexer.conf:
DBAddr 
mysql://mnogosearch_new:@localhost/mnogosearch_new/?SetNames=utf8?dbmode=blob=/tmp/mysql.sock=yes

And I set in search.htm:
  string BrowserCharset= "windows-1251";
  string LocalCharset= "UTF-8";


My MySQL client used default settings (we really still don't use UTF-8
databases); I changed it to UTF-8 just now.
| 30летних   | 3330D0BBD0B5D182D0BDD0B8D185 |
| 3летний| 33D0BBD0B5D182D0BDD0B8D0B9   |




[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-06-02 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hmm. It seems your MySQL client is not configured well.
It's using latin1 as the connection character set, while the
display is obviously utf8. So it prints garbage instead of
Cyrillic letters.

You can check this using "show variables like 'character_set%';".
It seems character_set_connection is latin1.

In order to see Cyrillic letters, you can try:

- mysql --default-character-set=utf8
- or put default-character-set=utf8 into my.cnf
- or run "SET NAMES utf8" immediately after connecting

Note, this does not affect the way indexer works.
It's only for the "mysql" client.
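
For example, a quick check from the shell (a sketch combining the options
above):

mysql --default-character-set=utf8 -e "show variables like 'character_set%'"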


> The results are the same for both bases.

They are not. Hex codes are different.
The old database contains Cyrillic codes,
the new database contains something different for the same
strings:


This is wrong:

| 30летних   | 
3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6 |

This is correct:
| 30летних   | 3330D0BBD0B5D182D0BDD0B8D185 |



Try adding "SetNames=utf8" to the DBAddr string in indexer.conf for the
new database, like this:

DBAddr mysql://root@localhost/test/?SetNames=utf8

then clean the database and crawl and index again.
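
A sketch of that cycle (-Cw, as in your crawler script, clears the database
without asking for confirmation):

indexer -Cw        # clean the database
indexer            # crawl
indexer --index    # index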


> 
> mysql> use mnogosearch_new;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
> Database changed
> mysql> SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE 
> '^[a-z0-9?#_]*$' LIMIT 30;
> +--+--+
> | word | hex(word)
> |
> +--+--+
> | 000в| 303030C390C2B2   
> |
> | 099в| 303939C390C2B2   
> |
> | 107рѕ  | 313037C391E282ACC391E280A2   
> |
> | 10млн | 3130C390C2BCC390C2BBC390C2BD 
> |
> | 11в | 3131C390C2B2 
> |
> | 18в | 3138C390C2B2 
> |
> | 1970Ñ…   | 31393730C391E280A6   
> |
> | 1980г   | 31393830C390C2B3 
> |
> | 1в  | 31C390C2B2   
> |
> | 1Ñ€  | 31C391E282AC 
> |
> | 2001г   | 32303031C390C2B3 
> |
> | 2002рі | 32303032C391E282ACC391E28093 
> |
> | 2004г   | 32303034C390C2B3 
> |
> | 2006г   | 32303036C390C2B3 
> |
> | 2008г   | 32303038C390C2B3 
> |
> | 2009г   | 32303039C390C2B3 
> |
> | 2009рі | 32303039C391E282ACC391E28093 
> |
> | 2011г   | 32303131C390C2B3 
> |
> | 2012рі | 32303132C391E282ACC391E28093 
> |
> | 20Ñ | 3230C391C281  
>|
> | 30летних   | 
> 3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6 |
> | 3летний| 
> 33C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C390C2B9 |
> | 40в | 3430C390C2B2 
> |
> | 41в | 3431C390C2B2 
> |
> | 48в | 3438C390C2B2 
> |
> | 599в| 353939C390C2B2   
> |
> | 59в | 3539C390C2B2 
> |
> | 600в| 363030C390C2B2   
> |
> | 60в | 3630C390C2B2 
> |
> | 90Ñ… | 3930C391E280A6   
> |
> +--+--+
> 30 rows in set (0,00 sec)
> 
> 
> 
> mysql> use mnogosearch;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup 

[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-06-01 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
The results are the same for both bases.

mysql> use mnogosearch_new;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' 
LIMIT 30;
+--+--+
| word | hex(word)  
  |
+--+--+
| 000в| 303030C390C2B2 
  |
| 099в| 303939C390C2B2 
  |
| 107рѕ  | 313037C391E282ACC391E280A2 
  |
| 10млн | 3130C390C2BCC390C2BBC390C2BD   
  |
| 11в | 3131C390C2B2   
  |
| 18в | 3138C390C2B2   
  |
| 1970Ñ…   | 31393730C391E280A6 
  |
| 1980г   | 31393830C390C2B3   
  |
| 1в  | 31C390C2B2 
  |
| 1Ñ€  | 31C391E282AC   
  |
| 2001г   | 32303031C390C2B3   
  |
| 2002рі | 32303032C391E282ACC391E28093   
  |
| 2004г   | 32303034C390C2B3   
  |
| 2006г   | 32303036C390C2B3   
  |
| 2008г   | 32303038C390C2B3   
  |
| 2009г   | 32303039C390C2B3   
  |
| 2009рі | 32303039C391E282ACC391E28093   
  |
| 2011г   | 32303131C390C2B3   
  |
| 2012рі | 32303132C391E282ACC391E28093   
  |
| 20Ñ | 3230C391C281
 |
| 30летних   | 
3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6 |
| 3летний| 
33C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C390C2B9 |
| 40в | 3430C390C2B2   
  |
| 41в | 3431C390C2B2   
  |
| 48в | 3438C390C2B2   
  |
| 599в| 353939C390C2B2 
  |
| 59в | 3539C390C2B2   
  |
| 600в| 363030C390C2B2 
  |
| 60в | 3630C390C2B2   
  |
| 90Ñ… | 3930C391E280A6 
  |
+--+--+
30 rows in set (0,00 sec)



mysql> use mnogosearch;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' 
LIMIT 30;
+--+--+
| word | hex(word)|
+--+--+
| 000в| 303030D0B2   |
| 099в| 303939D0B2   |
| 107рѕ  | 313037D180D195   |
| 10млн | 3130D0BCD0BBD0BD |
| 11в | 3131D0B2 |
| 18в | 3138D0B2 |
| 1970Ñ…   | 31393730D185 |
| 1980г   | 31393830D0B3 |
| 1в  | 31D0B2   |
| 1Ñ€  | 31D180   |
| 2001г   | 32303031D0B3 |
| 2002рі | 32303032D180D196 |
| 2004г   | 32303034D0B3 |
| 2006г   | 32303036D0B3 |
| 2008г   | 32303038D0B3 |
| 2009г   | 32303039D0B3  

[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-06-01 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Can you try this one:

SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' LIMIT 30;

The idea is to get words with Cyrillic letters and see
their HEX representation.



> I got "Empty set" for both databases.
> 
> mysql> use mnogosearch_new;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
> Database changed
> mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
> Empty set (0,02 sec)
> 
> 
> mysql> use mnogosearch;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
> Database changed
> mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
> Empty set (0,02 sec)
> 



[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-06-01 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
I got "Empty set" for both databases.

mysql> use mnogosearch_new;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
Empty set (0,02 sec)


mysql> use mnogosearch;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
Empty set (0,02 sec)




[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-06-01 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
Thank you!

The problem with search.cgi was indeed because of the changed format of search.htm.
But I have problems with encodings (e.g. Cyrillic windows-1251 or UTF-8).
I installed both versions of mnogosearch with separate databases, but with the
same settings.
The old version works fine, but the new one has problems.

Encoding settings:
indexer.conf
  RemoteCharset windows-1251
  LocalCharset UTF-8

search.htm
  string BrowserCharset= "windows-1251";
  string LocalCharset= "UTF-8";


1) The new version requires that the database default encoding coincide with
LocalCharset:
ALTER DATABASE `mnogosearch_new` DEFAULT CHARACTER SET utf8 COLLATE 
utf8_unicode_ci;

Otherwise, you get the message in stderr:
An error occurred!
DB: MySQL driver: #1267: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) 
and (utf8_general_ci,COERCIBLE) for operation '='


2) With the same settings in indexer.conf and search.htm, searching in
Cyrillic does not work in the new version of mnogosearch.
Setting BrowserCharset= "UTF-8" does not change anything.

Your search - "агент" - did not match any documents.

Debug log:
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmFind
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Prepare
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  Prepare
 0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWords
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWordsDB for 
mysql://mnogosearch_new:***@localhost/mnogosearch_new/?dbmode=blob=UTF-8
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start loading limits
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} WHERE limit loaded. 149 URLs 
found
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  loading limits 
 0.01 (149 URLs found)
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching words
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start search for 
'агенСM-^B'
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  FindWordsDB:   
 0.01
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmQueryConvert
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  UdmQueryConvert:   
 0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Excerpts
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  Excerpts:  
 0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start WordInfo
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  WordInfo:  
 0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  UdmFind:   
 0.01


3) When searching for Latin words, the database returns the text fragments in
correct Cyrillic, but the title of each retrieved document is always
returned in the wrong encoding:
navigator : 405 
Results 1-10 of 99 ( 0.021 seconds)
?“?»?°?°??   [ 15.095% Popularity: 0.89705 ]
... сети Интернет по адресу: http://navigator***.ru Прежде чем приобрести ...



I would be very grateful for help with solving the last two problems.

Generally, programs can issue various warning messages during installation.
It would be nice if a new version of mnogosearch warned about such serious
changes. I set up our old CMS on a new server, where experiments are possible.
But if a new version of mnogosearch were installed as one of the updates on a
server under a working load, it would be a complete disaster.



Regarding the long hang of mnogosearch indexing:
I found that this is due to very slow network retrieval of large PDF
documents.
I tried to set minimal timeout limits, but it does not help.
MaxNetErrors 10
ReadTimeOut 10s
DocTimeOut 30s

For example, I tried to set a 300s time limit for indexing, but indexing took
1360s. Moreover, the document was not indexed.
/usr/local/bin/indexer -ob -v6 -N 1 -c 300 
/usr/local/etc/mnogosearch/indexer.conf 2> /var/log/mnogosearch.log
--
Done (1360 seconds, 1 documents, 11049522 bytes,  7.93 Kbytes/sec.)

I sent you the log of attempt of indexing this one document.

When I set: 
Disallow *.pdf
indexing is fast.

Why doesn't setting time limits help? How can I avoid such lockups of the
indexing process?




[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-05-31 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> I just tried unsuccessfully to install and configure the mnogosearch-3.4.1 on 
> FreeBSD 10.3
> I lost a lot of time, because it turned out that "search.cgi" fundamentally 
> does not work, and without any diagnostic information.
> I checked it many times in different ways. The search database is created
> successfully, but it is impossible to use it.
> There is no difference whether the program is built from the ports or from the
> archive on your website.
> 
> 
> The test script, recommended by you, gives an empty output when run in the 
> console. 
> --
> #!/bin/sh
> 
> echo Content-Type: text/plain
> echo
> /usr/local/bin/search.cgi navigator 2>&1
> --

What does your search.htm look like?

It should start with a processing instruction, like this:


 
> 
> The cgi script log contains only:
> --
> %% [Fri May 27 18:54:09 2016] GET /cgi-bin/test_search.cgi HTTP/1.1
> %% 500 /data/sites/cgi-bin/test_search.cgi
> %request
> Host: www.***
> User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:46.0) Gecko/20100101 Firefox/46.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> DNT: 1
> Cookie: user_city=1
> X-Compress: 1
> Proxy-Authorization: 
> d05a8777e173c2b13a81d919589dd9b2b9bf9911f681b2c82b1d3c9db748cfb33b300d2021dac648
> Connection: keep-alive
> %response
> 
> 
> When I try to use the search, the log contains this:
> --
> %% [Fri May 27 18:54:07 2016] GET 
> /cgi-bin/search.cgi?ul=http://www.***/=%EA%EE%EC%EF%E0%ED%E8%FF=10=2221=all=0=1=1=wrd
>  HTTP/1.1
> %% 500 /data/sites/cgi-bin/search.cgi
> %request
> Host: www.***
> Accept: */*
> %response
> 
> 
> Apache24 error log contains only:
> End of script output before headers: search.cgi
> 
> 
> I was forced to install and use the old version of the program from your 
> website.
> Can you report the problem to the package maintainer of this FreeBSD port,
> or must I do it?
> 
> 
> 
> Additional question.
> I noticed that the program hangs for a very long time without consuming 
> system resources.
> When you start indexing, the system load is slightly increased, but it 
> decreases rapidly to zero, although the indexing process lasts a long time.
> For example, I set an indexing limit of 10 min, but the program runs for
> about 12 min, moreover without consuming system resources.
> --
> #!/bin/sh
> 
> /usr/local/mnogosearch/sbin/indexer -l -Cw 
> /usr/local/mnogosearch/etc/indexer.conf > /dev/null 2>&1
> /usr/local/mnogosearch/sbin/indexer -ob -v5 -N 1 -c 600 
> /usr/local/mnogosearch/etc/indexer.conf 2> /var/log/mnogosearch.log
> /usr/local/mnogosearch/sbin/indexer -l --index
> --
> 
> What is the reason for this apparent anomaly?

Are you crawling some public site? Which URL does it get stuck on?

Can you please send mnogosearch.log to b...@mnogosearch.org?

Thanks.




[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-05-31 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
I just tried unsuccessfully to install and configure the mnogosearch-3.4.1 on 
FreeBSD 10.3
I lost a lot of time, because it turned out that "search.cgi" fundamentally 
does not work, and without any diagnostic information.
I checked it many times in different ways. The search database is created
successfully, but it is impossible to use it.
There is no difference whether the program is built from the ports or from the
archive on your website.


The test script, recommended by you, gives an empty output when run in the 
console. 
--
#!/bin/sh

echo Content-Type: text/plain
echo
/usr/local/bin/search.cgi navigator 2>&1
--


The cgi script log contains only:
--
%% [Fri May 27 18:54:09 2016] GET /cgi-bin/test_search.cgi HTTP/1.1
%% 500 /data/sites/cgi-bin/test_search.cgi
%request
Host: www.***
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
DNT: 1
Cookie: user_city=1
X-Compress: 1
Proxy-Authorization: 
d05a8777e173c2b13a81d919589dd9b2b9bf9911f681b2c82b1d3c9db748cfb33b300d2021dac648
Connection: keep-alive
%response


When I try to use the search, the log contains this:
--
%% [Fri May 27 18:54:07 2016] GET 
/cgi-bin/search.cgi?ul=http://www.***/=%EA%EE%EC%EF%E0%ED%E8%FF=10=2221=all=0=1=1=wrd
 HTTP/1.1
%% 500 /data/sites/cgi-bin/search.cgi
%request
Host: www.***
Accept: */*
%response


Apache24 error log contains only:
End of script output before headers: search.cgi


I was forced to install and use the old version of the program from your 
website.
Can you report the problem to the package maintainer of this FreeBSD port,
or must I do it?



Additional question.
I noticed that the program hangs for a very long time without consuming system 
resources.
When you start indexing, the system load is slightly increased, but it 
decreases rapidly to zero, although the indexing process lasts a long time.
For example, I set an indexing limit of 10 min, but the program runs for
about 12 min, moreover without consuming system resources.
--
#!/bin/sh

/usr/local/mnogosearch/sbin/indexer -l -Cw 
/usr/local/mnogosearch/etc/indexer.conf > /dev/null 2>&1
/usr/local/mnogosearch/sbin/indexer -ob -v5 -N 1 -c 600 
/usr/local/mnogosearch/etc/indexer.conf 2> /var/log/mnogosearch.log
/usr/local/mnogosearch/sbin/indexer -l --index
--

What is the reason for this apparent anomaly?





[General] Webboard: Index full html code in DDB

2016-05-30 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hello,

> Hello,
> 
> I would like to crawl the whole HTML code for each URL.

Perhaps the cached copy is what you're looking for.
In 3.4.x cached copies are stored in a separate table "cachedcopy".
Cached copies are compressed by default, but compression can
be switched off:

http://www.mnogosearch.org/doc34/msearch-cmdref-cachedcopyencoding.html


> 
> Is there any way to do this ?
> 
> I've tried this in the indexer.conf but it doesn't work :
> 
> Section headhtml   25 2058 "]*)>(*.)" $2
> Section bodyhtml   26 2058 "]*)>(*.)" $2
> Section htmlcode25 2058 "]*)>(*.)" $2
> 
> Section body1   2018afterheadershtml
> gets the body but with all HTML tags stripped out :(
> 
> 
> Thank you for your help
> 



[General] Webboard: Index full html code in DDB

2016-05-30 Thread bar
Author: rafikCyc
Email: rafikothm...@gmail.com
Message:
Hello,

I would like to crawl the whole html code for each url.

Is there anyway to do this ?

I've tried this in the indexer.conf but it doesn't work :

Section headhtml   25 2058 "<head([^>]*)>(.*)" $2
Section bodyhtml   26 2058 "<body([^>]*)>(.*)" $2
Section htmlcode   25 2058 "<html([^>]*)>(.*)" $2

Section body   1   2018 afterheadershtml
gets the body but with all html tags stripped out :(


Thank you for your help


Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Indexer: unknown option '-E'

2016-05-26 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
Thanks for the quick response!
It works!

# indexer --index
[22029]{--} Indexing
[22029]{--} Loading URL list
[22029]{--} URL list loaded: 250 documents, 0.01 sec
[22029]{--} Indexing document contents
[22029]{--} Freeing cache
[22029]{--} Freeing cache done: 0.00
[22029]{--} Indexing document contents done: 0.17
[22029]{--} Indexing statistics:
[22029]{--} - Loading cached copies:  0.00 (1048386 bytes)
[22029]{--} - Unpacking cached copies:0.01
[22029]{--} - Parsing documents:  0.02
[22029]{--} - Breaking sections to words: 0.02
[22029]{--} - Sorting word list:  0.01
[22029]{--} - Groupping words:0.02
[22029]{--} - Sorting words: 0.01
[22029]{--} - Packing words: 0.00
[22029]{--} - Sending words: 0.04
[22029]{--} Indexing URL text
[22029]{--} Loading redirects
[22029]{--} Loading redirects done: 6 links, 0.00 sec
[22029]{--} Loading links
[22029]{--} Loading links done: 0.05 sec
[22029]{--} Calculating popularity: 250 documents, 2381 links
[22029]{--} Enabling SQL indexes
[22029]{--} Enabling SQL indexes done, 0.01 sec
[22029]{--} Writing url data
[22029]{--} Rotating table
[22029]{--} Indexing done   0.25


Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Indexer: unknown option '-E'

2016-05-26 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi,

> Hi!
> 
> I did a fresh install of mnogosearch-3.4.1 on FreeBSD 10.3
> Indexer run OK, but I have error:
> # indexer -l -Eblob
> /usr/local/bin/indexer: unknown option '-E'

Please use "indexer --index" instead of "indexer -Eblob".

> 
> As a result, the search base does not work.
> 
> The database was created by:
> # indexer --create
> 
> indexer.conf
> DBAddr 
> mysql://mnogosearch:***@localhost/mnogosearch/?dbmode=blob&socket=/tmp/mysql.sock&...=yes
> 
> Status  Expired   Total
> -------------------------
>      0    49115   49115  Not indexed yet
>    200        0     129  OK
>    301        0       6  Moved Permanently
> -------------------------
>  Total    49115   49250
> 

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Indexer: unknown option '-E'

2016-05-26 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
Hi!

I did a fresh install of mnogosearch-3.4.1 on FreeBSD 10.3
Indexer run OK, but I have error:
# indexer -l -Eblob
/usr/local/bin/indexer: unknown option '-E'

As a result, the search base does not work.

The database was created by:
# indexer --create

indexer.conf
DBAddr 
mysql://mnogosearch:***@localhost/mnogosearch/?dbmode=blob&socket=/tmp/mysql.sock&...=yes

Status  Expired   Total
-------------------------
     0    49115   49115  Not indexed yet
   200        0     129  OK
   301        0       6  Moved Permanently
-------------------------
 Total    49115   49250


Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-14 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> I added those Disallow lines so that both apps crawl the same 
> number of URLs (approximately 140): 
> 
> Disallow */basket-villeurbanne/author/*
> Disallow *?p=*
> Disallow */feed
> 
> This is because it seems that Mnogosearch can handle robots.txt, but not the 
> meta robots noindex,follow.
> 


mnoGoSearch supports meta robots.
Can you please give me the URL of a document whose robots directives are ignored? 
I'll check what's happening.


> Here are the results :
> 
> -
> 
> indexer -C;
> indexer;
> 
> [18898]{01} Done (53 seconds, 168 documents, 3503752 bytes, 64.56 
> Kbytes/sec.)
> 
> --
> 
> indexer -C;
> indexer -N5;
> 
> [19261]{02} Done (15 seconds, 46 documents, 982938 bytes, 63.99 
> Kbytes/sec.)
> [19261]{03} Done (15 seconds, 48 documents, 930200 bytes, 60.56 
> Kbytes/sec.)
> [19261]{01} Done (5 seconds, 14 documents, 323667 bytes, 63.22 
> Kbytes/sec.)
> [19261]{05} Done (15 seconds, 46 documents, 974427 bytes, 63.44 
> Kbytes/sec.)
> [19261]{04} Done (5 seconds, 14 documents, 292520 bytes, 57.13 
> Kbytes/sec.)
> [19261]{--} Done (26 seconds, 168 documents, 3503752 bytes, 131.60 
> Kbytes/sec.)
> 
> 
> indexer -C;
> indexer -N50;
> [20289]{11} Done (11 seconds, 28 documents, 585571 bytes, 51.99 
> Kbytes/sec.)
> [20289]{28} Done (11 seconds, 29 documents, 705247 bytes, 62.61 
> Kbytes/sec.)
> [20289]{16} Done (11 seconds, 30 documents, 635782 bytes, 56.44 
> Kbytes/sec.)
> [20289]{30} Done (11 seconds, 30 documents, 635178 bytes, 56.39 
> Kbytes/sec.)
> [20289]{--} Done (21 seconds, 168 documents, 3504392 bytes, 162.96 
> Kbytes/sec.)
> 
> 
> mysql -uroot -p -N --database=db_test_mnogo --execute="SELECT url 
> FROM url" > ~/ALL.txt;
> 
> (cat ~/ALL.txt | parallel -j8 --gnu "wget {}");
> 
> real  0m10.638s
> user  0m1.256s
> sys   0m1.519s
> 
> 
> ---
> 
> Screaming Frog : 12s
> 
> 
> It just confirms that Mnogosearch is relatively slower than 
> Screaming Frog; even compared to a parallel wget bash script, 
> mnogosearch is slower.
> 
> It gets a little better with indexer -N50 though.


Well, this effect can happen with a *small* site, with an empty database.

When indexer starts multiple threads (say 10) and the database is empty, 9 
threads immediately go to sleep for 10 seconds.
So only the first thread is actually working.

After 10 seconds the database is not empty, because the first thread has 
collected some links.

So it actually starts working in multi-threaded mode only after 10 seconds.

With a bigger site you will not see any difference between mnoGoSearch
vs wget/frog.


If you really need to crawl a small site quickly,
please apply this patch:


=== modified file 'src/indexer.c'
--- src/indexer.c   2016-03-30 12:13:49 +
+++ src/indexer.c   2016-05-14 08:28:25 +
@@ -2872,7 +2872,7 @@ int maxthreads=   1;
 UDM_CRAWLER *ThreadCrawlers= NULL;
 int thd_errors= 0;
 
-#define UDM_NOTARGETS_SLEEP 10
+#define UDM_NOTARGETS_SLEEP 0
 
 #ifdef  WIN32
 unsigned int __stdcall UdmCrawlerMain(void *arg)



Here are the results:

./indexer -Cw ; ./indexer -N10
[5853]{--} Done (12 seconds, 168 documents, 3504192 bytes, 285.17 Kbytes/sec.)


It's now as fast as wget and frog, and it crawls more documents (168 vs 140).


Please note:
Aggressive crawling is not polite and can even be considered an 
attack. It is better not to crawl sites that way, unless it is your
own site, or unless the site owners allow you to do it.
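
For example, a gentler setting for sites you don't own could be something
like this (a sketch: -N sets the number of crawling threads, -p the delay in
seconds between requests):

indexer -N2 -p1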



Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-09 Thread bar
Author: rafikCyc
Email: 
Message:
I added those Disallow lines so that both apps crawl the same 
number of URLs (approximately 140): 

Disallow */basket-villeurbanne/author/*
Disallow *?p=*
Disallow */feed

This is because it seems that Mnogosearch can handle robots.txt, but not the 
meta robots noindex,follow.

Here are the results :

-

indexer -C;
indexer;

[18898]{01} Done (53 seconds, 168 documents, 3503752 bytes, 64.56 
Kbytes/sec.)

--

indexer -C;
indexer -N5;

[19261]{02} Done (15 seconds, 46 documents, 982938 bytes, 63.99 
Kbytes/sec.)
[19261]{03} Done (15 seconds, 48 documents, 930200 bytes, 60.56 
Kbytes/sec.)
[19261]{01} Done (5 seconds, 14 documents, 323667 bytes, 63.22 
Kbytes/sec.)
[19261]{05} Done (15 seconds, 46 documents, 974427 bytes, 63.44 
Kbytes/sec.)
[19261]{04} Done (5 seconds, 14 documents, 292520 bytes, 57.13 
Kbytes/sec.)
[19261]{--} Done (26 seconds, 168 documents, 3503752 bytes, 131.60 
Kbytes/sec.)


indexer -C;
indexer -N50;
[20289]{11} Done (11 seconds, 28 documents, 585571 bytes, 51.99 
Kbytes/sec.)
[20289]{28} Done (11 seconds, 29 documents, 705247 bytes, 62.61 
Kbytes/sec.)
[20289]{16} Done (11 seconds, 30 documents, 635782 bytes, 56.44 
Kbytes/sec.)
[20289]{30} Done (11 seconds, 30 documents, 635178 bytes, 56.39 
Kbytes/sec.)
[20289]{--} Done (21 seconds, 168 documents, 3504392 bytes, 162.96 
Kbytes/sec.)


mysql -uroot -p -N --database=db_test_mnogo --execute="SELECT url 
FROM url" > ~/ALL.txt;

(cat ~/ALL.txt | parallel -j8 --gnu "wget {}");

real0m10.638s
user0m1.256s
sys 0m1.519s


---

Screaming Frog : 12s


It just confirms that Mnogosearch is relatively slower than 
Screaming Frog; even compared to a parallel wget bash script, 
mnogosearch is slower.

It gets a little better with indexer -N50 though.

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Here is the site : http://www.asbuers.com/

After crawling this site with mnoGoSearch, I did the following:

# Extracted the list of all documents found (478 documents)
mysql -uroot -N --database=tmp --execute="SELECT url FROM url" >ALL.txt

# Run "wget" with 8 threads 
time (cat ALL.txt | parallel -j8 --gnu "wget {}")


With 8 parallel processes, wget downloaded this site in 38 seconds,
which is around the same time that mnoGoSearch spends on the same site.

I guess when you run Screaming Frog, it's not really downloading the entire 
site.



Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: rafikCyc
Email: 
Message:
Thank you for the reply.

Well, you're right...
With -p0 it does not have the 1s limit.

But it remains very slow, though.

--

I just did a quick speed test on a small site (500 documents):
Mnogosearch vs. Screaming Frog.

The results:

Mnogosearch: 3.2 URLs / second
Screaming Frog: 40 URLs / second

Same connection, same remote site, but 10 times faster :(

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Hello,
> 
> I've tried this :
> 
> ./indexer -p 0
> 
> but it doesn't work :(
> The indexer sleeps for at least one second after each URL.

With -p0 it does not add any delay between URLs.
I guess the bottleneck is in the connection, or in the remote site.


To speed up crawling performance, you can run multiple crawling threads in 
parallel, for example:

indexer -N5

Make sure not to take the remote site down, though.


> 
> It seems impossible to index faster than 1s per URL.
> 
> To index 300,000 documents on my website, for example, the crawl takes 2 full 
> days!
> 
> Is there a solution ?

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Indexer : How to get all H2 tags on the page

2016-05-04 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Hello,
> 
> Section h2 23 256 "<h2([^>]*)>([^$3]+)(</h2>)" $2
> 
> this works well, but the indexer only stores the first <h2> it finds in 
> the database and ignores all the other H2s.
> 
> Is there a way to get all matches, like the function preg_match_all does in PHP?

Unfortunately, there is no such feature yet.



Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: How to speed up the crawl delay after each URL ?

2016-05-04 Thread bar
Author: rafikCyc
Email: 
Message:
Hello,

I've tried this :

./indexer -p 0

but it doesn't work :(
The indexer sleeps for at least one second after each URL.

It seems impossible to index faster than 1s per URL.

To index 300,000 documents on my website, for example, the crawl takes 2 full 
days!

Is there a solution ?

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Indexer : How to get all H2 tags on the page

2016-05-04 Thread bar
Author: rafikCyc
Email: 
Message:
Hello,

Section h2 23 256 "<h2([^>]*)>([^$3]+)(</h2>)" $2

this works well, but the indexer only stores the first <h2> it finds in the database 
and ignores all the other H2s.

Is there a way to get all matches, like the function preg_match_all does in PHP?

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: running indexer in link validation mode

2016-04-18 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hello,

> Hello,
> 
> How would I go about running the indexer and not saving any URLs or 
> content or anything into the database, but only store/list/log 
> somewhere, somehow the links that are no longer working? HTTP status 
> code > 400.
> 
> Currently I am running the indexer with the default config but it stores 
> everything.
> 
> I have set it to hold bad hrefs for 30 days 

In versions 3.3.x just remove the command "Section CachedCopy..."
to disable saving document content into the database.
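
That is, comment out or delete the line, which typically looks like this:

#Section CachedCopy  25 64000

The broken links can then be pulled out of the database with a query along
these lines (a sketch -- verify the column names against your schema):

SELECT url, status FROM url WHERE status >= 400;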





Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: running indexer in link validation mode

2016-04-13 Thread bar
Author: nick
Email: nick.everl...@yahoo.com
Message:
Hello,

How would I go about running the indexer and not saving any URLs or 
content or anything into the database, but only store/list/log 
somewhere, somehow the links that are no longer working? HTTP status 
code > 400.

Currently I am running the indexer with the default config but it stores 
everything.

I have set it to hold bad hrefs for 30 days 

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Section regex format

2016-03-19 Thread bar
Author: Julien D.
Email: jul...@1-clic.info
Message:
Hello Alexander,

Thanks for your answer, I got it working by adding the next tag:
(the stuff I want).*


However, how can I retrieve the HTML code from that Section, and not just the 
text? I tried using "format" and "when" as explained here: 
http://www.mnogosearch.org/doc34/msearch-cmdref-section.html 
But I couldn't make it work.

Thanks in advance.

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Some section not indexed in DB

2016-02-12 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi Guillaume,

title, body and meta.description do not really need to be in urlinfo for 
search purposes in 3.4.x. Search and search result presentation should work 
fine without them.

But you might of course need them for some other external purposes, e.g. site 
analysis. The intention in the latest changes in 3.4.x
was not to store sections in urlinfo by default, but they should be
stored if the "length" parameter is set to non-zero.
It seems something went wrong. I'll check it after the weekend
(currently out of my development box).
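
(For reference, the intended behaviour is that an explicit non-zero length
requests storage, e.g.:

Section title            2  256
Section meta.description 4  256

so with the config quoted below these values should have appeared in urlinfo.)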


> Hi Again,
> 
> I'm having a problem with some Section lines in the indexer.conf with 
> mnogosearch 3.4.1.
> 
> Here is an extract of my indexer.conf :
> 
> Section ResponseTime    0   32
> # Standard sections: body, title
> Section body    1   1024
> Section title   2   256
> 
> # HTML meta tags, e.g. <meta name="keywords" content="...">
> Section meta.keywords   3
> Section meta.description    4   256
> 
> # Incoming link text
> Section ilinktext   5   128
> 
> # Document's URL part
> Section url.file    6   0
> Section url.path    7   0
> Section url.host    8   0
> Section url.proto   9   0
> 
> # Useful meta information
> Section Charset 10  32
> Section Content-Type    11  64
> Section Content-Language    12  16
> 
> # Message/rfc822 headers
> #Section msg.from   15
> #Section msg.to 16
> #Section msg.subject    17
> 
> # A user defined section example.
> # Extract text between <h1> and </h1> tags:
> #Section h1 20 128 "<h1>(.*)</h1>" $1
> Section h1  26  256 "<h1[^>]*>(.*)</h1>" $1
> Section h2  26  256 "<h2[^>]*>(.*)</h2>" $1
> Section h3  26  256 "<h3[^>]*>(.*)</h3>" $1
> Section canonical   33  1024 '<link rel="canonical" +href="([^"]*)"' $1
> Section ogdescription   33  300  '<meta property="og:description" +content="([^"]*")' $1
> Section ogtitle 34  128  '<meta property="og:title" +content="([^"]*")' $1
> 
> # Uncomment the following lines if you want index MP3 tags.
> #Section MP3.Song   25
> #Section MP3.Album  26
> #Section MP3.Artist 27
> #Section MP3.Year   28
> 
> # HTTP headers, e.g. "Server" HTTP header
> #Section header.server  30
> Section header  30  128
> Section header.server   30  128
> Section header.Date 30  128
> Section header.Last-Modified    30  128
> Section header.Etag 30  128
> Section header.X-Robots-Tag 30  128
> # HTML tag attributes
> Section attribute.alt   35  128
> Section attribute.label 36  128
> Section attribute.summary   37  128
> Section attribute.title 38  128
> 
> 
> 
> And after crawling, the only info saved in the urlinfo table is: 
> Canonical
> Charset
> Content-language
> Content-type
> h1
> h2
> h3
> ogdescription
> ogtitle
> ResponseTime
> 
> As you can see, various sections are missing, including some important ones such as 
> title and meta.description, which I've checked exist on my server.
> The results are the same for various documents and various servers.
> 
> I've also tried not setting a length for title, body and meta.description, as in 
> the 3.4 documentation example, but it doesn't work any better.
> 
> Did I miss something ?
> 
> Thanks for the help, mnogosearch is a great tool !
> 


Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Some section not indexed in DB

2016-02-10 Thread bar
Author: Guillaume
Email: inscript...@atlza.com
Message:
Hi Again,

I'm having a problem with some Section lines in the indexer.conf with 
mnogosearch 3.4.1.

Here is an extract of my indexer.conf :

Section ResponseTime    0   32
# Standard sections: body, title
Section body    1   1024
Section title   2   256

# HTML meta tags, e.g. <meta name="keywords" content="...">
Section meta.keywords   3
Section meta.description    4   256

# Incoming link text
Section ilinktext   5   128

# Document's URL part
Section url.file    6   0
Section url.path    7   0
Section url.host    8   0
Section url.proto   9   0

# Useful meta information
Section Charset 10  32
Section Content-Type    11  64
Section Content-Language    12  16

# Message/rfc822 headers
#Section msg.from   15
#Section msg.to 16
#Section msg.subject    17

# A user defined section example.
# Extract text between <h1> and </h1> tags:
#Section h1 20 128 "<h1>(.*)</h1>" $1
Section h1  26  256 "<h1[^>]*>(.*)</h1>" $1
Section h2  26  256 "<h2[^>]*>(.*)</h2>" $1
Section h3  26  256 "<h3[^>]*>(.*)</h3>" $1
Section canonical   33  1024 '<link rel="canonical" +href="([^"]*)"' $1
Section ogdescription   33  300  '<meta property="og:description" +content="([^"]*")' $1
Section ogtitle 34  128  '<meta property="og:title" +content="([^"]*")' $1

# Uncomment the following lines if you want index MP3 tags.
#Section MP3.Song   25
#Section MP3.Album  26
#Section MP3.Artist 27
#Section MP3.Year   28

# HTTP headers, e.g. "Server" HTTP header
#Section header.server  30
Section header  30  128
Section header.server   30  128
Section header.Date 30  128
Section header.Last-Modified    30  128
Section header.Etag 30  128
Section header.X-Robots-Tag 30  128
# HTML tag attributes
Section attribute.alt   35  128
Section attribute.label 36  128
Section attribute.summary   37  128
Section attribute.title 38  128



And after crawling, the only info saved in the urlinfo table is: 
Canonical
Charset
Content-language
Content-type
h1
h2
h3
ogdescription
ogtitle
ResponseTime

As you can see, various sections are missing, including some important ones such as 
title and meta.description, which I've checked exist on my server.
The results are the same for various documents and various servers.

I've also tried not setting a length for title, body and meta.description, as in 
the 3.4 documentation example, but it doesn't work any better.

Did I miss something?

Thanks for the help, mnogosearch is a great tool!


Reply: http://www.mnogosearch.org/board/message.php?id=21746

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Segmentation fault at running ./indexer with ServerTable

2016-02-06 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi,

can you please send the entire indexer.conf file to b...@mnogosearch.org?

Thanks.

> Hi,
> 
> I'm using Mnogosearch 3.3.15 and everything works fine while I'm indexing a 
> website with a Server command in the indexer.conf file.
> 
> When trying to switch to a ServerTable command I get a segmentation fault.
> Here is the line in my indexer.conf 
> 
> ServerTable 
> mysql://login:password@localhost/mnogosearch/my_server?srvinfo=my_srvinfo
> 
> If I comment it, everything works great again.
> 
> Here is what appears in the syslog :
> Jan 20 16:03:38 myServer01 kernel: [3027639.055910] indexer[12555]: segfault 
> at 860 ip 00437e4c sp 7fffe94f34f0 error 4 in 
> indexer[40+8b000]
> 
> Thanks a lot for your help.
> 
> Regards.

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: my_server and server tables rec_id

2016-02-04 Thread bar
Author: Guillaume
Email: inscript...@atlza.com
Message:
OK, not sure it is the purpose of the tag field, but I managed to do what I need 
with this field.

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Segmentation fault at running ./indexer with ServerTable

2016-01-20 Thread bar
Author: Guillaume
Email: inscript...@atlza.com
Message:
Hi,

I'm using Mnogosearch 3.3.15 and everything works fine while I'm indexing a 
website with a Server command in the indexer.conf file.

When trying to switch to a ServerTable command I get a segmentation fault.
Here is the line in my indexer.conf 

ServerTable 
mysql://login:password@localhost/mnogosearch/my_server?srvinfo=my_srvinfo

If I comment it, everything works great again.

Here is what appears in the syslog :
Jan 20 16:03:38 myServer01 kernel: [3027639.055910] indexer[12555]: segfault at 
860 ip 00437e4c sp 7fffe94f34f0 error 4 in indexer[40+8b000]

Thanks a lot for your help.

Regards.

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Installing PHP 7.0 module fails with mnoGoSearch (3.4.1 and older )

2016-01-16 Thread bar
Author: Flözen
Email: illenber...@visionstudio.de
Message:
Near future would be great! :) 

Webservers with Plesk now support PHP 7 and it seems to be really fast.

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Installing PHP 7.0 module fails with mnoGoSearch (3.4.1 and older )

2016-01-13 Thread bar
Author: Flözen
Email: 
Message:
Hi,

Installing the PHP module for PHP 7.0.1 fails.

When I run "make" I get the following errors:

/bin/sh /home/floezen/mnogosearch/mnogosearch-3.4.1/php/libtool --mode=compile 
cc  -I. -
I/home/floezen/mnogosearch/mnogosearch-3.4.1/php -DPHP_ATOM_INC -
I/home/floezen/mnogosearch/mnogosearch-3.4.1/php/include -
I/home/floezen/mnogosearch/mnogosearch-3.4.1/php/main -
I/home/floezen/mnogosearch/mnogosearch-3.4.1/php 
-I/opt/plesk/php/7.0/include/php -
I/opt/plesk/php/7.0/include/php/main -I/opt/plesk/php/7.0/include/php/TSRM -
I/opt/plesk/php/7.0/include/php/Zend -I/opt/plesk/php/7.0/include/php/ext -
I/opt/plesk/php/7.0/include/php/ext/date/lib -I/usr/local/mnogosearch/include  -
DHAVE_CONFIG_H  -g -O2   -c /home/floezen/mnogosearch/mnogosearch-
3.4.1/php/php_mnogo.c -o php_mnogo.lo 
libtool: compile:  cc -I. -I/home/floezen/mnogosearch/mnogosearch-3.4.1/php -
DPHP_ATOM_INC -I/home/floezen/mnogosearch/mnogosearch-3.4.1/php/include -
I/home/floezen/mnogosearch/mnogosearch-3.4.1/php/main -
I/home/floezen/mnogosearch/mnogosearch-3.4.1/php 
-I/opt/plesk/php/7.0/include/php -
I/opt/plesk/php/7.0/include/php/main -I/opt/plesk/php/7.0/include/php/TSRM -
I/opt/plesk/php/7.0/include/php/Zend -I/opt/plesk/php/7.0/include/php/ext -
I/opt/plesk/php/7.0/include/php/ext/date/lib -I/usr/local/mnogosearch/include -
DHAVE_CONFIG_H -g -O2 -c /home/floezen/mnogosearch/mnogosearch-
3.4.1/php/php_mnogo.c  -fPIC -DPIC -o .libs/php_mnogo.o
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:197:29: error: 
unknown type 
name 'zend_rsrc_list_entry'
 static void _free_udm_agent(zend_rsrc_list_entry *rsrc TSRMLS_DC)
 ^
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:204:27: error: 
unknown type 
name 'zend_rsrc_list_entry'
 static void _free_udm_res(zend_rsrc_list_entry *rsrc TSRMLS_DC)
   ^
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c: In function 
'zm_startup_mnogosearch':
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:215:46: error: 
'_free_udm_agent' undeclared (first use in this function)
   le_link= zend_register_list_destructors_ex(_free_udm_agent,NULL,"mnogosearch 
agent",module_number);
  ^
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:215:46: note: each 
undeclared identifier is reported only once for each function it appears in
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:216:45: error: 
'_free_udm_res' undeclared (first use in this function)
   le_res= zend_register_list_destructors_ex(_free_udm_res,NULL,"mnogosearch 
result",module_number);
 ^
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c: In function 
'zif_udm_alloc_agent':
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:433:9: warning: 
'zend_get_parameters_ex' is deprecated (declared at 
/opt/plesk/php/7.0/include/php/Zend/zend_API.h:249) [-Wdeprecated-declarations]
 if(zend_get_parameters_ex(1,) == FAILURE)
 ^
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:436:9: warning: 
passing 
argument 1 of 'zval_get_type' from incompatible pointer type [enabled by 
default]
 convert_to_string_ex(yydbaddr);
 ^
In file included from /opt/plesk/php/7.0/include/php/Zend/zend.h:31:0,
 from /opt/plesk/php/7.0/include/php/main/php.h:35,
 from 
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:29:
/opt/plesk/php/7.0/include/php/Zend/zend_types.h:326:38: note: expected 'const 
struct zval *' 
but argument is of type 'struct zval **'
 static zend_always_inline zend_uchar zval_get_type(const zval* pz) {
  ^
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:436:9: warning: 
passing 
argument 1 of 'zval_get_type' from incompatible pointer type [enabled by 
default]
 convert_to_string_ex(yydbaddr);
 ^
In file included from /opt/plesk/php/7.0/include/php/Zend/zend.h:31:0,
 from /opt/plesk/php/7.0/include/php/main/php.h:35,
 from 
/home/floezen/mnogosearch/mnogosearch-3.4.1/php/php_mnogo.c:29:
/opt/plesk/php/7.0/include/php/Zend/zend_types.h:326:38: note: expected 'const 
struct zval *' 
but argument is of type 'struct zval **'
 static zend_always_inline zend_uchar zval_get_type(const zval* pz) {
   

... and so on.

What do I need to do to fix this?

Thanx
Flözen

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Installing PHP 7.0 module fails with mnoGoSearch (3.4.1 and older )

2016-01-13 Thread bar
Author: Alexander Barkov
Email: 
Message:
Hi Flözen,

> Hi,
> 
> Installing the PHP module for PHP 7.0.1 fails.
> 
> When I run "make" I get the following errors:
> 

It compiles fine with PHP-5.6, which is currently the default version in 
Linux distributions.

Unfortunately I haven't ported the PHP module to PHP-7 yet.
Hope to do this in the near future.


Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Alt-Text in Map-Area-Tags

2015-11-09 Thread bar
Author: Mitja Orzeszko
Email: mi...@orzeszko.de
Message:
Hello,

will Alt-Text in Map-Area-Tags be indexed?





Thanks

Mitja


Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Alt-Text in Map-Area-Tags

2015-11-09 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hello,
> Hello,
> 
> will Alt-Text in Map-Area-Tags be indexed?
> 
> 
>   
> 

You need to use this command to enable indexing of these attributes:

Section attribute.alt ...


See here for examples:
http://www.mnogosearch.org/doc33/msearch-cmdref-section.html
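
For example (the section number and length here are only illustrative --
pick values that fit your config):

Section attribute.alt 35 128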

> 
> Thanks
> 
> Mitja
> 

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Set categories/tags via HTML tag attribute?

2015-10-21 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> I would like to use different tags (or categories) for different 
> parts of a website and I would like to have a "Default" tag which 
> includes all pages that are not part of another category. I tried to 
> configure different tags for different URLs:
> 
> > Tag A
> > http://www.example.com/part-a/
> > 
> > Tag B
> > http://www.example.com/part-b/
> > 
> > Tag C
> > http://www.example.com/
> 
> The problem is that the tag C includes all pages of 
> http://www.example.com/, including those of /part-a/ and /part-b/.
> 

The above configuration looks fine.
Note: after adding the Tag commands, a full re-crawl is needed.
The easiest way is just to clean the database and crawl again:

indexer -Cw
indexer


> Is there a possibility to set the tag which should be used for a 
> page, e. g. with a document section? Like '<meta name="mnoGoSearchTag" content="B" />'? Or does anybody have a clue 
> how to solve this issue?

There is no way to set the Tag value exactly this way.
But you can use a meta.* section for the same purpose, say:

Section meta.mnoGoSearchTag  14 0

After full re-crawling, the collected values can be used to limit the search
with the help of the sl.mnoGoSearchTag=B search parameter.
See here for how to pass search parameters:

http://www.mnogosearch.org/doc33/msearch-doingsearch.html#AEN4693
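
A query URL would then look something like this (a sketch, assuming a
standard search.cgi setup and the section defined above):

http://hostname/cgi-bin/search.cgi?q=test&sl.mnoGoSearchTag=B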

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Set categories/tags via HTML tag attribute?

2015-10-12 Thread bar
Author: Felix Heller
Email: felix.hel...@aimcom.de
Message:
I would like to use different tags (or categories) for different 
parts of a website and I would like to have a "Default" tag which 
includes all pages that are not part of another category. I tried to 
configure different tags for different URLs:

> Tag A
> http://www.example.com/part-a/
> 
> Tag B
> http://www.example.com/part-b/
> 
> Tag C
> http://www.example.com/

The problem is that the tag C includes all pages of 
http://www.example.com/, including those of /part-a/ and /part-b/.

Is there a possibility to set the tag which should be used for a 
page, e. g. with a document section? Like '<meta name="mnoGoSearchTag" content="B" />'? Or does anybody have a clue 
how to solve this issue?

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: indexing time

2015-02-18 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
 We've used mnogosearch in the past and were happy with it.  But when 
 we installed it on that old site, it took several days to index the 
 site and reduced site performance while that was going on.  
 
 That was several versions ago and on a less powerful server.  Now we 
 are considering using it on a new site but are concerned about how 
 long it will take to index the site and performance while that 
 happens.  
 
 Hoping newer versions are faster or that a more powerful server will 
 help that issue. 
 
 But I'm wondering if you can give me any idea of how long it might 
 take to index around 30,000 pages on our site ICv2.com
 

I have a collection of ~52,000 documents on my local disk,
with the total collection size being 704Mb.

With DBMode=blob it takes 8 minutes to crawl the collection,
and 50 seconds to actually index the collection after crawling.
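
For reference, blob mode is selected in indexer.conf like this (user, password
and dbname are placeholders), with a crawl followed by a separate index pass:

DBAddr mysql://user:password@localhost/dbname/?dbmode=blob

indexer
indexer -Eblob

(in 3.4.x the second step is "indexer --index" instead of "indexer -Eblob")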

Which version and DBMode are you using?



Reply: http://www.mnogosearch.org/board/message.php?id=21689

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: indexing time

2015-02-14 Thread bar
Author: Milton Griepp
Email: mgri...@icv2.com
Message:
We've used mnogosearch in the past and were happy with it.  But when 
we installed it on that old site, it took several days to index the 
site and reduced site performance while that was going on.  

That was several versions ago and on a less powerful server.  Now we 
are considering using it on a new site but are concerned about how 
long it will take to index the site and performance while that 
happens.  

Hoping newer versions are faster or that a more powerful server will 
help that issue. 

But I'm wondering if you can give me any idea of how long it might 
take to index around 30,000 pages on our site ICv2.com


Reply: http://www.mnogosearch.org/board/message.php?id=21688

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: mnoGo 3.4.0, no results...

2015-01-27 Thread bar
Author: B3r3n
Email: 
Message:
mnogosearch 3.4.0 :-)
MySQL is 5.6.16


Reply: http://www.mnogosearch.org/board/message.php?id=21686

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: how to rss or xml output ?

2015-01-17 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
 To save support from spending time on this ... I found it!
 ... just in case others need it: look in the cluster section -- it's not made 
 for that, but it does the job.

Right, just use node.xml as an example of a template with XML output.
You can easily further adjust it according to your needs.
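
A minimal way to start (the paths are assumptions -- adjust to your install):

cp /usr/local/mnogosearch/etc/node.xml /usr/local/mnogosearch/etc/myfeed.htm
# edit the copied template, then point a copy of search.cgi at it

If I remember correctly, search.cgi picks the template whose name matches the
script name, so a copy of search.cgi named myfeed.cgi would use myfeed.htm;
see the templates chapter in the manual for the details.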

 
 (mnogosearch has terrific possibilities; thanks guys!)



Reply: http://www.mnogosearch.org/board/message.php?id=21682

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: max 3 results from each site

2015-01-17 Thread bar
Author: Fabrice
Email: 
Message:
Maybe I missed something in the explanations ...

In the search results, how can I limit the results from any single site to (for 
example) max 3 from each?

The problem I have is that I have a mix of huge sites + very small sites indexed, 
and it is a bit annoying to have the first 20 results from the same one (this 
example is not my case, but imagine if you have Wikipedia in your indexed 
sites).
But at the opposite extreme, only one from each is also not nice,

so 3-4 from each, for example.
Possible?

Thanks,
F 

Reply: http://www.mnogosearch.org/board/message.php?id=21683

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: max 3 results from each site

2015-01-17 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
 Maybe I missed something in the explanations ...
 
 In the search results, how can I limit the results from any single site to 
 (for example) max 3 from each?
 
 The problem I have is that I have a mix of huge sites + very small sites 
 indexed, and it is a bit annoying to have the first 20 results from the same 
 one (this example is not my case, but imagine if you have Wikipedia in your 
 indexed sites).
 But at the opposite extreme, only one from each is also not nice,
 
 so 3-4 from each, for example.
 Possible?

There is no way to limit to 3-4 results from each site.

But it's possible to boost the best result from each site,
so different sites are displayed on the top result pages.

Use GroupBySite rank explained here:

http://www.mnogosearch.org/doc33/msearch-cmdref-groupbysite.html
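
For example, in your search template (a sketch, per the doc page above):

GroupBySite rank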



 
 Thanks,
 F 

Reply: http://www.mnogosearch.org/board/message.php?id=21684

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: how to rss or xml output ?

2015-01-15 Thread bar
Author: Fabrice
Email: 
Message:
To save support from spending time on this ... I found it!
... just in case others need it: look in the cluster section -- it's not made for 
that, but it does the job.

(mnogosearch has terrific possibilities; thanks guys!)

Reply: http://www.mnogosearch.org/board/message.php?id=21681

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: how to rss or xml output ?

2015-01-14 Thread bar
Author: Fabrice
Email: 
Message:
Hi,
I searched the board, but found no discussion on that topic...
So, how is it possible to query the cgi engine (like usual) and have the result 
sent back as an RSS feed?
(if not possible, then as XML)
This would give many, many possibilities to create feeds as a piece 
of cake without any other engine.
Thanks,
Fabrice
Note: the question is about the search results sent back, not about indexing RSS or XML, which 
is covered in several topics already.

Reply: http://www.mnogosearch.org/board/message.php?id=21680

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: mnoGo 3.4.0, no results...

2015-01-11 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
 Hi Alex,
 
 Some more inputs.
 
 I tested with the original search.cgi, just changing :
 DBAddr to mysql://USER:PASS@localhost/DATABASE/?dbmode=blob&socket=/tmp/mysql.sock&Qcache=yes&ps=yes&Deflate=yes&compression=on 
 
 Query produced no answer. I searched for 'test', 99.99% sure this 
 word exists somewhere in the database.
 
 Any clue how to test directly via MySQL, to determine whether this comes 
 from MySQL or from the database itself?
 
 Thanks

Which version is it?

Thanks.




Reply: http://www.mnogosearch.org/board/message.php?id=21674

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: mnoGo 3.4.0, no results...

2015-01-11 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi Laurent,

Sorry for delay. I was on New year holidays.

 Hi Alex,
 
 I moved to FreeBSD 10.1 x64, and this moved me to a mnoGo port, 
 v3.4.0.
 First, I am surprised: you only talk about 3.3.15. What's this 
 3.4.0?

It's a new branch. It has not been released yet.
I hope to release it by the end of this month.
For some reason, it has already been added to the FreeBSD ports collection. 
I'd recommend waiting for the official 3.4 release
before starting to use it.

If you're already using 3.4.0, please install 3.3.15 instead.


 
 Also, it was working perfectly in version 3.3.1, but no longer.
 
 1- Where I was indexing 600 Kurls with 3.3.10, the same URL list 
 is now not even 25% of that size!
 
 2- My search page (PHP) was working perfectly. That is no longer 
 the case; it finds nothing. Unfortunately, I fail to produce any 
 logs (syslog is silent, even logging everything *.*) and so I can't 
 investigate by myself.
 
 Can you please help ?
 
 Thx
 
 Brgrds

Reply: http://www.mnogosearch.org/board/message.php?id=21673

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: mnoGo 3.4.0, no results...

2014-12-30 Thread bar
Author: B3r3n
Email: bozo4312
Message:
Hi Alex,

Some more inputs.

I tested with the original search.cgi, just changing :
DBAddr to mysql://USER:PASS@localhost/DATABASE/?dbmode=blob&socket=/tmp/mysql.sock&Qcache=yes&ps=yes&Deflate=yes&compression=on 

Query produced no answer. I searched for 'test', 99.99% sure this 
word exists somewhere in the database.

Any clue how to test directly via MySQL, to determine whether this comes 
from MySQL or from the database itself?

Thanks

Reply: http://www.mnogosearch.org/board/message.php?id=21672

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: mnoGo 3.4.0, no results...

2014-12-28 Thread bar
Author: Laurent
Email: 
Message:
Hi Alex,

I moved to FreeBSD 10.1 x64, and this moved me to a mnoGo port, 
v3.4.0.
First, I am surprised: you only talk about 3.3.15. What's this 
3.4.0?

Also, it was working perfectly in version 3.3.1, but no longer.

1- Where I was indexing 600 Kurls with 3.3.10, the same URL list 
is now not even 25% of that size!

2- My search page (PHP) was working perfectly. That is no longer 
the case; it finds nothing. Unfortunately, I fail to produce any 
logs (syslog is silent, even logging everything *.*) and so I can't 
investigate by myself.

Can you please help ?

Thx

Brgrds

Reply: http://www.mnogosearch.org/board/message.php?id=21671

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: tagging or categorizing without crawling again

2014-12-12 Thread bar
Author: bruno
Email: bruno.v...@gmail.com
Message:
Thank you Alexander, that was exactly what I was looking for!
Kind regards,
Bruno

Reply: http://www.mnogosearch.org/board/message.php?id=21670

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: tagging or categorizing without crawling again

2014-12-11 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
 Actually, the way of using tags or categories is perfect, but I don't want 
 to re-crawl the whole site because I didn't write my tagging rule 
 correctly the first time.

This task consists of two parts:

a. update what you have in the tables server and srvinfo.
This is done automatically when you start crawling.
indexer -n0 will do this. Note, this is enough when you just need
to rename some tag to a new value.

But usually this is not enough,
as you might want to redistribute documents between tags
(i.e. split a single tag into multiple ones, or join multiple tags
into a single one, or do some more complex redistribution).
In these cases part b is also needed.


b. update the table url to refer to the table server properly.
There is no special command for this. Normally, documents are 
updated properly only when they're crawled the next time.
But there is a trick: use the Skip option temporarily,
to avoid real downloading.


Suppose you want to split the section of your site
into two subsections and assign different tags for them.

What you do is:

1. Change indexer.conf:

# Remove the old command
Tag doc
Server http://host/doc/


# And add two new commands instead
Tag doca
Server skip http://host/doc/a/

Tag docb
Server skip http://host/doc/b/


Notice the skip option in the new commands.


2. Run indexer -am -u 'http://host/doc/%'

It will crawl all documents, in a sense, but without real downloading.
It will actually do nothing else but execute a query like this
for every document:

UPDATE url SET status=200,next_index_time=1418965297, 
site_id=-1519382294,server_id=-1738492707 WHERE rec_id=259;


3. Don't forget to remove the skip options
from the new Server commands in indexer.conf.

4. Check that everything went well:
SELECT server.tag,url.url FROM url,server WHERE url.server_id=server.rec_id;




Reply: http://www.mnogosearch.org/board/message.php?id=21669

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: tagging or categorizing without crawling again

2014-12-08 Thread bar
Author: bruno
Email: bruno.v...@gmail.com
Message:
Thanks for your reply,

it would be by using documents properties.
Actually, the way of using tags or categories is perfect, but I don't want 
to re-crawl the whole site because I didn't write my tagging rule 
correctly the first time.

Many thanks!
Bruno

Reply: http://www.mnogosearch.org/board/message.php?id=21668

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: tagging or categorizing without crawling again

2014-12-06 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi Bruno,

 Hi Alexander and big congrats for the amazing tool you've built.
 I intend to use it as an SEO tool but I came across an issue: I would like to 
 tag or categorize the URLs after having already fetched the content, but I 
 can't figure out how to do it.
 We sometimes miss the correct structure and it's really a pain to have to 
 re-crawl the whole site to rebuild the categorization, as the URLs are 
 already in the database.
 
 Many thanks for your help!
 kind regards,
 Bruno

How would you like to tag? Manually? Or in some automated way,
using document properties (e.g. document words, URL, etc)?


Reply: http://www.mnogosearch.org/board/message.php?id=21667

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: tagging or categorizing without crawling again

2014-12-05 Thread bar
Author: bruno
Email: bruno.v...@gmail.com
Message:
Hi Alexander and big congrats for the amazing tool you've built.
I intend to use it as an SEO tool but I came across an issue: I would like to 
tag or categorize the URLs after having already fetched the content, but I 
can't figure out how to do it.
We sometimes miss the correct structure and it's really a pain to have to 
re-crawl the whole site to rebuild the categorization, as the URLs are 
already in the database.

Many thanks for your help!
kind regards,
Bruno

Reply: http://www.mnogosearch.org/board/message.php?id=21666

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Sqlite 'UNIQUE constraint failed' problem

2014-08-17 Thread bar
Author: ivan
Email: sss_...@inbox.ru
Message:
I've installed the product with SQLite3 on SUSE from a PPA, and on Debian 
built it manually from scratch. Same problem:
indexer  
indexer[18503]: indexer from mnogosearch-3.3.15-sqlite3 started with 
'/etc/mnogosearch/indexer.conf'
indexer[18503]: [18503]{01} URL: http://www.opennet.ru/
indexer[18503]: [18503]{01} ROBOTS: http://www.opennet.ru/robots.txt
indexer[18503]: [18503]{01} URL: http://www.opennet.ru/cgi-
bin/opennet/hints.cgi?button_network
indexer[18503]: [18503]{01} URL: 
http://www.opennet.ru/prog/info/228.shtml
indexer[18503]: [18503]{01} URL: 
http://www.opennet.ru/openforum/vsluhforumID1/95732.html
indexer[18503]: [18503]{01} URL: 
http://www.opennet.ru/openforum/vsluhforumID6/1442.html
indexer[18503]: [18503]{01} URL: 
http://www.opennet.ru/tips/2829_openvz_template.shtml
{sql.c:2152} Query: INSERT INTO url 
(url,referrer,hops,crc32,next_index_time,status,seed,bad_since_time,site_id,server_id,docsize,last_mod_time,shows,pop_rank) 
VALUES ('http://www.opennet.ru/',87,2,0,1408286592,0,166,1408286592,0,-76554,0,0,0,0.0)

indexer[18503]: [18503]{01} sqlite3 driver: (19) UNIQUE constraint 
failed: url.url
indexer[18503]: [18503]{01} Error: 'DB err: sqlite3 driver: (19) 
UNIQUE constraint failed: url.url - '

my config 

DBAddr sqlite3:///home/qq/tmpfs/db1.sqlite/?dbmode=single
SyslogFacility local7
LocalCharset UTF-8
CrossWords yes
MaxDocSize 104857600
URLSelectCacheSize 10240
WordCacheSize 83886080
UseCookie yes
Disallow *.doc
Disallow *.xls
Disallow *.ppt
Disallow *.pdf
Disallow *.b    *.sh   *.md5  *.rpm
Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z *.bz2 
Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm *.pcx
Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.wmv
Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
Disallow *.vrml *.wrl  *.png  *.ico  *.psd  *.dat
Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
Disallow *.tex  *.texi *.texinfo
Disallow *.cdf  *.ps
Disallow *.ai   *.eps  *.hqx
Disallow *.cpt  *.bms  *.oda  *.tcl
Disallow *.o    *.a    *.la   *.so 
Disallow *.pat  *.pm   *.m4   *.am   *.css
Disallow *.map  *.aif  *.sit  *.sea
Disallow *.m3u  *.qt
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
Disallow Regex ://.+/([^/]+)/\1/
Disallow Regex ://.+/([^/]+)/.*/\1/
Disallow Regex [?]([^=]*=).*\1
Disallow Regex %26([^=]*=).*%26\1
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif   *.gif
AddType text/plain  *.txt  *.pl *.js *.h *.c *.pm *.e
AddType text/html   *.html *.htm
AddType text/xml*.xml
AddType message/rfc822  *.eml *.mht *.mhtml
AddType text/rtf*.rtf
AddType application/pdf *.pdf
AddType application/msword  *.doc
AddType application/vnd.ms-excel*.xls
AddType application/vnd.ms-powerpoint   *.ppt
AddType text/x-postscript   *.ps
AddType application/unknown *.*
ParserTimeOut 300
DetectClones yes
Section body    1   256
Section title   2   128
Section meta.keywords   3   128
Section meta.description    4   128
Section url.file    6   0
Section url.path    7   0
Section url.host    8   0
Section url.proto   9   0
Section crosswords  10  0
Section Charset 11  32
Section Content-Type    12  64
Section Content-Language    13  16
Section msg.from    18  0
Section msg.to  19  0
Section msg.subject 20  0
Section CachedCopy  25 64000
Server  http://www.opennet.ru/

pls help
  

Reply: http://www.mnogosearch.org/board/message.php?id=21654

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Creating custom sections from server headers

2014-08-06 Thread bar
Author: Oliver
Email: 
Message:
Hello,

first, thank you for the great search engine you have created!

For indexing a custom site, I want to create a custom section which should 
contain the filename extracted from the Content-Disposition HTTP header. This 
header is sent for downloadable files and 
its value might look like this:

inline; filename=comments.doc

For this, I've added new sections in the indexer.conf file:

Section header.content-disposition  30  128
Section content_filename  31  128  cdoff  ${header.content-disposition} ^\w+; filename=(.+)$ $1

This indeed adds a header.content-disposition variable which I can use in the 
search.htm file, and which contains the entire Content-Disposition header value.
However, the content_filename section is not created correctly; it is always 
empty.

Through experimenting I found that ${header.content-disposition} is apparently 
not recognized as a variable in the Section command. Is there a way to access 
the Content-Disposition value 
anyway when defining a new section? Also, is there an overview of variables 
available in these Section commands?

As a workaround I now use the EREG command in search.htm to extract the filename 
when the results are displayed. However, this is probably less efficient (it's 
done whenever the results are 
displayed, instead of only once during indexing). Also, it adds the entire 
Content-Disposition header to the index, so searching for inline or for 
filename finds all documents which have a Content-
Disposition header - not very desirable.

Can you give me some hints on the variables available in Section commands in 
indexer.conf?

Thanks,
Oliver 

Reply: http://www.mnogosearch.org/board/message.php?id=21653

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Also problem with mysqld_stmt_execute

2014-05-29 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
 indexer starts all right, but stops after 0-10 good hits. 
 Incorrect arguments to mysqld_stmt_execute 
 Tried a lot of stuff, but no luck...
 


Which MySQL version are you using?

Perhaps you hit this problem:
http://bugs.mysql.com/bug.php?id=61225
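
A quick way to check which server version the indexer actually connects to
(standard mysql client; USER is a placeholder):

mysql -h localhost -u USER -p -e "SELECT VERSION();"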

 -
 
 $ /mnogo/sbin/indexer -Eblob 
 indexer[17151]: Converting to blob
 indexer[17151]: Loading URL list
 indexer[17151]: Converting intag00
 indexer[17151]: mysql_stmt_execute() failed: Incorrect arguments to 
 mysqld_stmt_execute
 
 -
 
 $ /mnogo/sbin/indexer -a 
 [some good hits, then]
 indexer[17341]: [17341]{01} mysql_stmt_execute() failed: Incorrect arguments 
 to mysqld_stmt_execute
 indexer[17341]: [17341]{01} Error: 'DB err: mysql_stmt_execute() failed: 
 Incorrect arguments to mysqld_stmt_execute - '
 
 -
 
 $ /mnogo/sbin/indexer
 indexer[17769]: indexer from mnogosearch-3.3.8-mysql-pqsql started with 
 '/mnogo/etc/indexer.conf'
 indexer[17769]: [17769]{01} Done (0 seconds, 0 documents, 0 bytes,  0.00 
 Kbytes/sec.)
 
 -
 
 Ubuntu server 10.04.1 x86_64 
 Mysql 14.12 Distrib 5.0.90 
 Shared server. 

Reply: http://www.mnogosearch.org/board/message.php?id=21647

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Also problem with mysqld_stmt_execute

2014-05-29 Thread bar
Author: Tom Paamand
Email: 
Message:
Thanx for the answer. I think it is a very good guess, and I have forwarded it to my 
host. My MySQL version is 5.0.90, and it is covered in the bug report. I will 
return with more info if this was the solution!

 Ubuntu server 10.04.1 x86_64
 Mysql 14.12 Distrib 5.0.90
 Shared server. 



  indexer starts all right, but stops after 0-10 good hits. 
  Incorrect arguments to mysqld_stmt_execute 
  Tried a lot of stuff, but no luck...
  
 
 
 Which MySQL version are you using?
 
 Perhaps you hit this problem:
 http://bugs.mysql.com/bug.php?id=61225
 
  -
  
  $ /mnogo/sbin/indexer -Eblob 
  indexer[17151]: Converting to blob
  indexer[17151]: Loading URL list
  indexer[17151]: Converting intag00
  indexer[17151]: mysql_stmt_execute() failed: Incorrect arguments to 
  mysqld_stmt_execute
  
  -
  
  $ /mnogo/sbin/indexer -a 
  [some good hits, then]
  indexer[17341]: [17341]{01} mysql_stmt_execute() failed: Incorrect 
  arguments to mysqld_stmt_execute
  indexer[17341]: [17341]{01} Error: 'DB err: mysql_stmt_execute() failed: 
  Incorrect arguments to mysqld_stmt_execute - '
  
  -
  
  $ /mnogo/sbin/indexer
  indexer[17769]: indexer from mnogosearch-3.3.8-mysql-pqsql started with 
  '/mnogo/etc/indexer.conf'
  indexer[17769]: [17769]{01} Done (0 seconds, 0 documents, 0 bytes,  0.00 
  Kbytes/sec.)
  
  -
  
  Ubuntu server 10.04.1 x86_64 
  Mysql 14.12 Distrib 5.0.90 
  Shared server. 

Reply: http://www.mnogosearch.org/board/message.php?id=21649

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Still problem with mysqld_stmt_execute

2014-05-29 Thread bar
Author: Tom Paamand
Email: 
Message:
The MySQL server actually being used is 5.1.61, though the system MySQL is 5.0.90, so that 
bug is unfortunately not my answer.  

Problem is still, that indexer starts OK, but stops after 0-10 good hits - with 
this error: 

 indexer[17341]: [17341]{01} mysql_stmt_execute() failed: Incorrect arguments 
to mysqld_stmt_execute
 indexer[17341]: [17341]{01} Error: 'DB err: mysql_stmt_execute() failed: 
Incorrect arguments to mysqld_stmt_execute - '


Mysql Distrib 5.1.61 [updated info] 
Ubuntu Server 10.04.1 x86_64 
on Shared Server. 


  Which MySQL version are you using?
  
  Perhaps you hit this problem:
  http://bugs.mysql.com/bug.php?id=61225


Reply: http://www.mnogosearch.org/board/message.php?id=21650

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Still problem with mysqld_stmt_execute

2014-05-29 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
 The MySQL server actually being used is 5.1.61, though the system MySQL is 5.0.90, so 
 that bug is unfortunately not my answer.  
 
 Problem is still, that indexer starts OK, but stops after 0-10 good hits - 
 with this error: 
 
  indexer[17341]: [17341]{01} mysql_stmt_execute() failed: Incorrect arguments 
 to mysqld_stmt_execute
  indexer[17341]: [17341]{01} Error: 'DB err: mysql_stmt_execute() failed: 
 Incorrect arguments to mysqld_stmt_execute - '
 
 
 Mysql Distrib 5.1.61 [updated info] 
 Ubuntu Server 10.04.1 x86_64 
 on Shared Server. 

Can you please run these two commands and post their results:

indexer --sqlmon --exec=select version();
ldd indexer

Thanks.


Btw, if nothing helps, you can switch off using prepared statements by adding 
ps=none as a parameter to DBAddr, like this:

DBAddr mysql://root@localhost/test/?ps=none



 
 
   Which MySQL version are you using?
   
   Perhaps you hit this problem:
   http://bugs.mysql.com/bug.php?id=61225
 

Reply: http://www.mnogosearch.org/board/message.php?id=21651

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: No problem with mysqld_stmt_execute

2014-05-29 Thread bar
Author: Tom Paamand
Email: 
Message:
ps=none did the trick! 
I have put it like 
/?DBMode=blob&ps=none 
- and it works now.

I have only changed Server and DBAddr to my own stuff in the conf-dist, so if 
ps=none does not have any bad side effects, I will use this. 

Your first command 
indexer --sqlmon --exec=select version(); 
just outputs the indexer helpfile. I tried different variations, no luck. But 
indexer runs fine from 
mnogo/sbin 
and calls my 
mnogo/etc/indexer.conf

More luck with 
ldd indexer: 
 linux-vdso.so.1 =>  (0x7fffe0dff000)
 libpthread.so.0 => /lib/libpthread.so.0 (0x7f7d8984b000)
 librt.so.1 => /lib/librt.so.1 (0x7f7d89643000)
 libmysqlclient.so.15 => /usr/local/lib/mysql/libmysqlclient.so.15 (0x7f7d892c7000)
 libnsl.so.1 => /lib/libnsl.so.1 (0x7f7d890ad000)
 libm.so.6 => /lib/libm.so.6 (0x7f7d88e2a000)
 libssl.so.0.9.8 => /lib/libssl.so.0.9.8 (0x7f7d88bd5000)
 libcrypto.so.0.9.8 => /lib/libcrypto.so.0.9.8 (0x7f7d88844000)
 libz.so.1 => /lib/libz.so.1 (0x7f7d8862d000)
 libpq.so.5 => /usr/lib/libpq.so.5 (0x7f7d88403000)
 libcrypt.so.1 => /lib/libcrypt.so.1 (0x7f7d881ca000)
 libc.so.6 => /lib/libc.so.6 (0x7f7d87e43000)
 /lib64/ld-linux-x86-64.so.2 (0x7f7d89a89000)
 libdl.so.2 => /lib/libdl.so.2 (0x7f7d87c3e000)
 libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x7f7d8797a000)
 libcom_err.so.2 => /lib/libcom_err.so.2 (0x7f7d87776000)
 libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x7f7d87541000)
 libldap_r-2.4.so.2 => /usr/lib/libldap_r-2.4.so.2 (0x7f7d872f5000)
 libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x7f7d870cf000)
 libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x7f7d86ec6000)
 libkeyutils.so.1 => /lib/libkeyutils.so.1 (0x7f7d86cc3000)
 libresolv.so.2 => /lib/libresolv.so.2 (0x7f7d86aaa000)
 liblber-2.4.so.2 => /usr/lib/liblber-2.4.so.2 (0x7f7d8689b000)
 libsasl2.so.2 => /usr/lib/libsasl2.so.2 (0x7f7d86681000)
 libgnutls.so.26 => /usr/lib/libgnutls.so.26 (0x7f7d863df000)
 libtasn1.so.3 => /usr/lib/libtasn1.so.3 (0x7f7d861cd000)
 libgcrypt.so.11 => /lib/libgcrypt.so.11 (0x7f7d85f55000)
 libgpg-error.so.0 => /lib/libgpg-error.so.0 (0x7f7d85d51000)

- but the problem looks like it is solved, thanks!

Reply: http://www.mnogosearch.org/board/message.php?id=21652

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Also problem with mysqld_stmt_execute

2014-05-28 Thread bar
Author: Tom Paamand
Email: 
Message:
indexer starts all right, but stops after 0-10 good hits: 
Incorrect arguments to mysqld_stmt_execute 
Tried a lot of stuff, but no luck...

-

$ /mnogo/sbin/indexer -Eblob 
indexer[17151]: Converting to blob
indexer[17151]: Loading URL list
indexer[17151]: Converting intag00
indexer[17151]: mysql_stmt_execute() failed: Incorrect arguments to 
mysqld_stmt_execute

-

$ /mnogo/sbin/indexer -a 
[some good hits, then]
indexer[17341]: [17341]{01} mysql_stmt_execute() failed: Incorrect arguments to 
mysqld_stmt_execute
indexer[17341]: [17341]{01} Error: 'DB err: mysql_stmt_execute() failed: 
Incorrect arguments to mysqld_stmt_execute - '

-

$ /mnogo/sbin/indexer
indexer[17769]: indexer from mnogosearch-3.3.8-mysql-pqsql started with 
'/mnogo/etc/indexer.conf'
indexer[17769]: [17769]{01} Done (0 seconds, 0 documents, 0 bytes,  0.00 
Kbytes/sec.)

-

Ubuntu server 10.04.1 x86_64 
Mysql 14.12 Distrib 5.0.90 
Shared server. 

Reply: http://www.mnogosearch.org/board/message.php?id=21646

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: mnoGoSearch 3.3 - 3.4 MySQL issue

2014-05-13 Thread bar
Author: Laurent
Email: 
Message:
Hi Guys,

I moved my platform, and recompiling mnoGoSearch also moved me from 
3.3.15 to 3.4.0.

When I launch it, I get a MySQL error because of new/different fields 
(urlinfob, for example).

Is there a migration process to keep all the indexed data?
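
If there is no supported migration path, I suppose I could recreate the 
tables for 3.4.0 and recrawl, though that would lose the existing index:

indexer --drop
indexer --create
indexer -a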

Thanks

Brgrds

Reply: http://www.mnogosearch.org/board/message.php?id=21644

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Error: 'DB err: mysql_stmt_execute() failed:

2014-04-30 Thread bar
Author: mamadoo06
Email: fohoi...@gmail.com
Message:
Hi,

When I start indexer, it stops immediately, displaying :
indexer[2767] Error: [2767]{01} mysql_stmt_execute() failed: Incorrect string 
value: 
'\xE9gorie...' for column 'sval' at row 1
indexer[2767] Error: [2767]{01} Error: 'DB err: mysql_stmt_execute() failed: 
Incorrect string 
value: '\xE9gorie...' for column 'sval' at row 1 - '

Do you know what's happening?

Mac OS X
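
One way to see which character set the column expects (assuming the default 
schema, where sval lives in the urlinfo table) is:

SHOW FULL COLUMNS FROM urlinfo;

The Collation column of the output shows the expected character set.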

Thanks !

Reply: http://www.mnogosearch.org/board/message.php?id=21639

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Error: 'DB err: mysql_stmt_execute() failed:

2014-04-30 Thread bar
Author: mamadoo06
Email: fohoi...@gmail.com
Message:
OK, that was an encoding error.
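
In case someone else hits this: '\xE9' is an accented 'e' in Latin-1, so the 
documents were most likely iso-8859-1 while the database expected utf-8. 
Something like the following in indexer.conf should make indexer recode 
documents before storing them (an assumption - adjust to your setup):

LocalCharset UTF-8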

Reply: http://www.mnogosearch.org/board/message.php?id=21640

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general

