Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-24 Thread Pavel Stehule
2010/3/24 Craig Ringer cr...@postnewspapers.com.au:
 Pavel Stehule wrote:

 Personally, I dislike the idea of a dictionary precompiler: it is one more
 application to maintain, and maybe not necessary.

 That's the sort of thing that can be done when first required by any
 backend and the results saved in a file for other backends to mmap().
 It'd probably want to be opened r/w access-exclusive initially, then
 re-opened read-only access-shared when ready for use.

 My only concern would be that the cache would want to be forcibly
 cleared at postmaster start, so that restarting the postmaster fixes any
 messed-up-cache issues that might arise (not that they should) without
 people having to go rm'ing in the datadir. Even if Pg never has any bugs
 that result in bad cache files, the file system / bad memory / cosmic
 rays / etc can still mangle a cache file.

 BTW, mmap() isn't an issue on Windows:
  http://msdn.microsoft.com/en-us/library/aa366556%28VS.85%29.aspx
 It's spelled CreateFileMapping, but otherwise is fairly similar, and is
 perfect for this sort of use.

 A shared read-only mapping of processed-and-cached tsearch2 dictionaries
 would save a HUGE amount of memory if many backends were using tsearch2
 at the same time. It'd make a big difference here.


If you know this area well, please improve my first patch. I am not able
to argue against Tom, who has a clear opinion on this patch :(

Pavel


 --
 Craig Ringer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-24 Thread Bruce Momjian
Pavel Stehule wrote:
 If you know this area well, please improve my first patch. I am not able
 to argue against Tom, who has a clear opinion on this patch :(

Should we add a TODO?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-24 Thread Pavel Stehule
2010/3/24 Bruce Momjian br...@momjian.us:
 Pavel Stehule wrote:
 If you know this area well, please improve my first patch. I am not able
 to argue against Tom, who has a clear opinion on this patch :(

 Should we add a TODO?

Why not?

Pavel



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-24 Thread Bruce Momjian
Pavel Stehule wrote:
 2010/3/24 Bruce Momjian br...@momjian.us:
  Pavel Stehule wrote:
  If you know this area well, please improve my first patch. I am not able
  to argue against Tom, who has a clear opinion on this patch :(
 
  Should we add a TODO?
 
 Why not?

OK, what would the TODO text be?



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-24 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 OK, what would the TODO text be?

I think there are really two tasks here:

* preprocess the textual dictionary definition files into something
that can be slurped directly into memory;

* use mmap() instead of read() to read preprocessed files into memory,
on machines where such a syscall is available.

There would be considerable gain from task #1 even without mmap.
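A minimal C sketch of the second task, assuming a POSIX system: `load_dict_image()` (a hypothetical name, as is the `DictImage` type) maps a preprocessed file read-only with mmap() when `<unistd.h>` advertises `_POSIX_MAPPED_FILES`, and otherwise falls back to malloc() plus read(). It does no parsing; the preprocessing of task #1 is assumed to have already produced a file that can be used as-is.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Use mmap() only where the platform advertises memory-mapped files. */
#if defined(_POSIX_MAPPED_FILES) && _POSIX_MAPPED_FILES > 0
#define USE_MMAP 1
#include <sys/mman.h>
#endif

/* A preprocessed dictionary image held in memory (hypothetical type). */
typedef struct
{
    const char *data;
    size_t      len;
} DictImage;

/* Bring the whole preprocessed file into memory; 0 on success, -1 on error. */
static int
load_dict_image(const char *path, DictImage *img)
{
    struct stat st;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
    {
        close(fd);
        return -1;
    }
    img->len = (size_t) st.st_size;
#ifdef USE_MMAP
    {
        /* Read-only shared mapping: backends share the page-cache copy. */
        void   *p = mmap(NULL, img->len, PROT_READ, MAP_SHARED, fd, 0);

        close(fd);              /* the mapping remains valid after close() */
        if (p == MAP_FAILED)
            return -1;
        img->data = p;
    }
#else
    {
        /* Fallback: one private copy per backend, filled with read(). */
        char   *buf = malloc(img->len);
        size_t  done = 0;

        while (buf != NULL && done < img->len)
        {
            ssize_t n = read(fd, buf + done, img->len - done);

            if (n <= 0)
            {
                free(buf);
                buf = NULL;
            }
            else
                done += (size_t) n;
        }
        close(fd);
        if (buf == NULL)
            return -1;
        img->data = buf;
    }
#endif
    return 0;
}

static void
unload_dict_image(DictImage *img)
{
#ifdef USE_MMAP
    munmap((void *) img->data, img->len);
#else
    free((void *) img->data);
#endif
}
```

With the mmap() path, backends mapping the same file can share one copy of the pages; the read() path still benefits from task #1, since it only copies bytes instead of re-parsing the textual definition files.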

regards, tom lane



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-23 Thread Heikki Linnakangas
Takahiro Itagaki wrote:
 Pavel Stehule pavel.steh...@gmail.com wrote:
 
 I wrote a small patch that allows preloading of selected ispell
 dictionaries. It solves the problem of slow tsearch initialisation with
 some language configurations.

 I am afraid this module doesn't help on MS Windows.
 
 I think it should work on all platforms if we include it into the core.

It will work, as in it will compile and run. It just won't be any
faster. I think that's enough, otherwise you could argue that we
shouldn't have the shared_preload_libraries option at all because it won't
help on Windows.

 The fundamental issue seems to be in the slow initialization of
 dictionaries. If so, how about adding a pre-compile tool to convert
 a dictionary into a binary file, and have each backend simply mmap it?

Yeah, that would be better.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-23 Thread Pavel Stehule
2010/3/23 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp:

 Pavel Stehule pavel.steh...@gmail.com wrote:

 I wrote a small patch that allows preloading of selected ispell
 dictionaries. It solves the problem of slow tsearch initialisation with
 some language configurations.

 I am afraid this module doesn't help on MS Windows.

 I think it should work on all platforms if we include it into the core.
 We should continue to research shared memory or mmap approaches.

 The fundamental issue seems to be in the slow initialization of
 dictionaries. If so, how about adding a pre-compile tool to convert
 a dictionary into a binary file, and have each backend simply mmap it?

It means loading about 25MB from disk on the first tsearch query of every
backend. Sorry, I don't believe that can be good.


 BTW, SimpleAllocContextCreate() is not used at all in the patch.
 Do you still need it?


Yes, I need it. Without the Simple Allocator, the cz (Czech) configuration
takes 48MB. Only a few parts have to be allocated by the Simple Allocator;
the others have no significant impact, so I didn't want to make more code
ugly. In my first patch I verified that the dictionary data are read-only,
so I was motivated to use the Simple Allocator everywhere. That is not
necessary for the preload method.
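The Simple Allocator itself is not shown in this thread. As a rough illustration of why such an allocator can save memory, here is a minimal bump-allocator sketch in C: every allocation is carved from one contiguous block, with no per-chunk header and no individual free. `BumpArena`, `bump_init()`, `bump_alloc()` and `bump_destroy()` are hypothetical names, not the patch's SimpleAllocContextCreate() API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Alignment for every chunk: the strictest alignment of any scalar type. */
#define BUMP_ALIGN ((size_t) _Alignof(max_align_t))

/* Hypothetical arena: one block and a fill pointer, no per-chunk headers. */
typedef struct
{
    char   *base;       /* start of the block, from malloc() */
    size_t  size;       /* capacity in bytes */
    size_t  used;       /* bytes handed out so far, including padding */
} BumpArena;

static int
bump_init(BumpArena *a, size_t size)
{
    a->base = malloc(size);
    a->size = (a->base != NULL) ? size : 0;
    a->used = 0;
    return (a->base != NULL) ? 0 : -1;
}

/* Return size bytes aligned to BUMP_ALIGN, or NULL when the block is full. */
static void *
bump_alloc(BumpArena *a, size_t size)
{
    size_t  start = (a->used + BUMP_ALIGN - 1) & ~(BUMP_ALIGN - 1);

    if (start > a->size || size > a->size - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Individual chunks are never freed; the whole arena is released at once. */
static void
bump_destroy(BumpArena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

General-purpose allocators usually pay a per-chunk header, and may round small requests up, for every allocation; for read-only data that is built once and dropped all at once, a bump allocator avoids that overhead and keeps the data in a single block.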

Pavel



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-23 Thread Nicolas Barbier
2010/3/23 Pavel Stehule pavel.steh...@gmail.com:

 2010/3/23 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp:

 The fundamental issue seems to be in the slow initialization of
 dictionaries. If so, how about adding a pre-compile tool to convert
 a dictionary into a binary file, and have each backend simply mmap it?

 It means loading about 25MB from disk on the first tsearch query of every
 backend. Sorry, I don't believe that can be good.

The operating system's VM subsystem should make that a non-problem.
Loading is also not the word I would use to indicate what mmap does.

Nicolas



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-23 Thread Pavel Stehule
2010/3/23 Nicolas Barbier nicolas.barb...@gmail.com:
 2010/3/23 Pavel Stehule pavel.steh...@gmail.com:

 2010/3/23 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp:

 The fundamental issue seems to be in the slow initialization of
 dictionaries. If so, how about adding a pre-compile tool to convert
 a dictionary into a binary file, and have each backend simply mmap it?

 It means loading about 25MB from disk on the first tsearch query of every
 backend. Sorry, I don't believe that can be good.

 The operating system's VM subsystem should make that a non-problem.
 Loading is also not the word I would use to indicate what mmap does.

Maybe we can do some manipulation in memory; I don't have any knowledge
about mmap. With the Simple Allocator we can have the dictionary data as
one block. The problem is pointers, but I believe they can be replaced by
offsets.
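A minimal C sketch of replacing pointers by offsets, assuming a simple linked list of words stands in for the real dictionary structures (the `Entry` layout and the function names are hypothetical): every link is stored as a byte offset from the start of the block, so the block can be copied, written to a file or mapped at a different address and still be walked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout: links are offsets, never pointers. */
typedef struct
{
    uint32_t next;      /* offset of the next entry; 0 marks the end */
    uint32_t word;      /* offset of this entry's NUL-terminated word */
} Entry;

/* memcpy()-based accessors: no alignment or aliasing assumptions on base. */
static uint32_t
get_u32(const char *base, size_t off)
{
    uint32_t v;

    memcpy(&v, base + off, sizeof v);
    return v;
}

static void
put_u32(char *base, size_t off, uint32_t v)
{
    memcpy(base + off, &v, sizeof v);
}

/*
 * Serialize n words into buf as a linked list.  The first 4 bytes hold the
 * offset of the first entry.  Returns bytes used, or 0 if cap is too small.
 */
static size_t
build_block(char *buf, size_t cap, const char *const *words, int n)
{
    size_t  used = sizeof(uint32_t);
    size_t  link = 0;           /* offset of the link to patch next */

    if (cap < used)
        return 0;
    put_u32(buf, 0, 0);         /* start with an empty list */
    for (int i = 0; i < n; i++)
    {
        size_t  len = strlen(words[i]) + 1;
        size_t  eoff = used;
        size_t  woff = eoff + sizeof(Entry);

        if (woff + len > cap)
            return 0;
        put_u32(buf, eoff + offsetof(Entry, next), 0);
        put_u32(buf, eoff + offsetof(Entry, word), (uint32_t) woff);
        memcpy(buf + woff, words[i], len);
        put_u32(buf, link, (uint32_t) eoff);    /* previous link -> here */
        link = eoff + offsetof(Entry, next);
        used = woff + len;
    }
    return used;
}

/* Return the i-th word, resolving offsets against the block's current base. */
static const char *
nth_word(const char *base, int i)
{
    uint32_t off = get_u32(base, 0);

    while (off != 0 && i-- > 0)
        off = get_u32(base, off + offsetof(Entry, next));
    return off != 0 ? base + get_u32(base, off + offsetof(Entry, word)) : NULL;
}
```

Copying the finished block to another buffer simulates mapping it at a different address: nth_word() still works because nothing inside the block refers to an absolute address.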

Personally, I dislike the idea of a dictionary precompiler: it is one more
application to maintain, and maybe not necessary. And you would still
need another application for loading.

P.S. I am able to serialise the Czech dictionary, because it uses only simple regexps.

Pavel





Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-23 Thread Craig Ringer
Pavel Stehule wrote:

 Personally, I dislike the idea of a dictionary precompiler: it is one more
 application to maintain, and maybe not necessary.

That's the sort of thing that can be done when first required by any
backend and the results saved in a file for other backends to mmap().
It'd probably want to be opened r/w access-exclusive initially, then
re-opened read-only access-shared when ready for use.
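A minimal sketch of that build-once-then-share protocol, using flock() (a BSD/Linux call, not POSIX; Windows would need LockFileEx instead). The first caller writes the file under an exclusive lock, then every caller downgrades to a shared lock on the same descriptor (rather than re-opening the file) and maps it read-only. `open_shared_cache()` is hypothetical, and its `image` argument stands in for whatever processed dictionary bytes a backend would produce.

```c
#define _DEFAULT_SOURCE 1
#include <fcntl.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open (or create) the cache file at path.  If it is still empty, write
 * image while holding LOCK_EX; then downgrade to LOCK_SH and map the file
 * read-only.  On success returns the mapping, stores its size in *len and
 * the descriptor (which keeps the shared lock) in *fdp; NULL on failure.
 */
static const void *
open_shared_cache(const char *path, const void *image, size_t image_len,
                  size_t *len, int *fdp)
{
    struct stat st;
    void       *p;
    int         fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return NULL;
    /* Only one process builds; the others block here until it is done. */
    if (flock(fd, LOCK_EX) < 0 || fstat(fd, &st) < 0)
        goto fail;
    if (st.st_size == 0)
    {
        /* First user: write the processed dictionary under LOCK_EX. */
        if (write(fd, image, image_len) != (ssize_t) image_len ||
            fsync(fd) < 0 || fstat(fd, &st) < 0)
            goto fail;
    }
    /* Downgrade: from now on any number of readers may share the file. */
    if (flock(fd, LOCK_SH) < 0 || st.st_size == 0)
        goto fail;
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;
    *len = (size_t) st.st_size;
    *fdp = fd;
    return p;

fail:
    close(fd);                  /* also releases any lock we held */
    return NULL;
}
```

flock(2) notes that converting a lock is not guaranteed to be atomic; that is harmless here because a late caller sees a non-empty file and skips the build. A crash in the middle of write() would still leave a truncated file behind, which is the kind of damage the cache-clearing concern below is about.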

My only concern would be that the cache would want to be forcibly
cleared at postmaster start, so that restarting the postmaster fixes any
messed-up-cache issues that might arise (not that they should) without
people having to go rm'ing in the datadir. Even if Pg never has any bugs
that result in bad cache files, the file system / bad memory / cosmic
rays / etc can still mangle a cache file.
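A minimal sketch of clearing the cache at startup, assuming the processed dictionaries live in one dedicated directory (that layout and `clear_dict_cache()` are hypothetical). The postmaster would call it once, before any backend can create new cache files.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Remove every non-directory entry in dir so that a restart always discards
 * possibly corrupt dictionary caches.  Returns the number of files removed,
 * or -1 if dir cannot be opened.
 */
static int
clear_dict_cache(const char *dir)
{
    DIR            *d = opendir(dir);
    struct dirent  *de;
    char            path[4096];
    int             removed = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL)
    {
        int     n;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t) n >= sizeof path)
            continue;           /* name too long: skip rather than truncate */
        if (unlink(path) == 0)  /* fails (and is not counted) for directories */
            removed++;
    }
    closedir(d);
    return removed;
}
```

Unlinking only removes directory entries; a process that still has an old file mapped keeps a valid view of it, but at postmaster start no such backend exists.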

BTW, mmap() isn't an issue on Windows:
  http://msdn.microsoft.com/en-us/library/aa366556%28VS.85%29.aspx
It's spelled CreateFileMapping, but otherwise is fairly similar, and is
perfect for this sort of use.
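A minimal C sketch of a portable read-only mapping helper, assuming a Win32 build uses the CreateFileMapping family and other platforms use mmap(); `map_file_readonly()` and `unmap_file()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Map a whole file read-only; NULL on failure, *len receives its size. */
static const void *
map_file_readonly(const char *path, size_t *len)
{
#ifdef _WIN32
    HANDLE          file, mapping;
    LARGE_INTEGER   size;
    const void     *view = NULL;

    file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return NULL;
    /* CreateFileMapping() rejects empty files, so check the size first. */
    if (GetFileSizeEx(file, &size) && size.QuadPart > 0)
    {
        mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (mapping != NULL)
        {
            view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
            CloseHandle(mapping);   /* the view keeps the mapping alive */
            if (view != NULL)
                *len = (size_t) size.QuadPart;
        }
    }
    CloseHandle(file);
    return view;
#else
    struct stat st;
    void       *p;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
    {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping survives close() */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t) st.st_size;
    return p;
#endif
}

static void
unmap_file(const void *p, size_t len)
{
#ifdef _WIN32
    (void) len;
    UnmapViewOfFile(p);
#else
    munmap((void *) p, len);
#endif
}
```

On both platforms the handles or descriptor can be closed once the view exists, so a backend only has to remember the address and length.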

A shared read-only mapping of processed-and-cached tsearch2 dictionaries
would save a HUGE amount of memory if many backends were using tsearch2
at the same time. It'd make a big difference here.
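A back-of-the-envelope version of that claim, assuming for illustration only N = 100 backends and M = 25 MB of processed dictionary per backend (roughly the 25MB figure Pavel mentions earlier in the thread):

```latex
\begin{aligned}
\text{private copies:} \quad & N \cdot M = 100 \times 25\,\text{MB} = 2500\,\text{MB} \\
\text{one shared mapping:} \quad & \approx M = 25\,\text{MB} \\
\text{saving:} \quad & (N - 1) \cdot M = 99 \times 25\,\text{MB} = 2475\,\text{MB}
\end{aligned}
```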

--
Craig Ringer



Re: [HACKERS] WIP: preloading of ispell dictionary

2010-03-22 Thread Takahiro Itagaki

Pavel Stehule pavel.steh...@gmail.com wrote:

 I wrote a small patch that allows preloading of selected ispell
 dictionaries. It solves the problem of slow tsearch initialisation with
 some language configurations.
 
 I am afraid this module doesn't help on MS Windows.

I think it should work on all platforms if we include it into the core.
We should continue to research shared memory or mmap approaches.

The fundamental issue seems to be in the slow initialization of
dictionaries. If so, how about adding a pre-compile tool to convert
a dictionary into a binary file, and have each backend simply mmap it?
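A minimal C sketch of what such a pre-compile step could emit, assuming a flat list of words is enough to illustrate the idea (real ispell dictionaries also carry affix rules, which this ignores): a fixed header with a magic number and version, an offset table, and the NUL-terminated strings. The format, `DICT_MAGIC`, `dict_compile()` and `dict_word()` are all hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DICT_MAGIC   0x44494354u    /* hypothetical file signature */
#define DICT_VERSION 1u

/* Hypothetical header of a precompiled dictionary image. */
typedef struct
{
    uint32_t magic;
    uint32_t version;
    uint32_t nwords;
} DictHeader;

/*
 * Write n words as: header, n uint32_t string offsets (relative to the
 * start of the image), then the NUL-terminated strings.
 * Returns 0 on success, -1 on a write error.
 */
static int
dict_compile(FILE *out, const char *const *words, uint32_t n)
{
    DictHeader  hdr = { DICT_MAGIC, DICT_VERSION, n };
    uint32_t    off = (uint32_t) (sizeof hdr + n * sizeof(uint32_t));

    if (fwrite(&hdr, sizeof hdr, 1, out) != 1)
        return -1;
    for (uint32_t i = 0; i < n; i++)
    {
        if (fwrite(&off, sizeof off, 1, out) != 1)
            return -1;
        off += (uint32_t) strlen(words[i]) + 1;
    }
    for (uint32_t i = 0; i < n; i++)
        if (fwrite(words[i], strlen(words[i]) + 1, 1, out) != 1)
            return -1;
    return 0;
}

/* Look up word i in a loaded (e.g. mmap'ed) image; NULL if invalid. */
static const char *
dict_word(const char *image, size_t len, uint32_t i)
{
    DictHeader  hdr;
    uint32_t    off;

    if (len < sizeof hdr)
        return NULL;
    memcpy(&hdr, image, sizeof hdr);
    if (hdr.magic != DICT_MAGIC || hdr.version != DICT_VERSION ||
        i >= hdr.nwords)
        return NULL;
    if (sizeof hdr + (size_t) (i + 1) * sizeof off > len)
        return NULL;
    memcpy(&off, image + sizeof hdr + (size_t) i * sizeof off, sizeof off);
    return off < len ? image + off : NULL;
}
```

Because offsets rather than pointers are stored, a backend could mmap() the file and call dict_word() on the mapping directly. The integers are written in native byte order, so such a file would only be valid on the architecture that produced it; the version field lets a backend reject a file written by an older tool.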

BTW, SimpleAllocContextCreate() is not used at all in the patch.
Do you still need it?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


