Re: [GENERAL] index question

2016-05-01 Thread Melvin Davidson
On Sun, May 1, 2016 at 10:27 PM, drum.lu...@gmail.com 
wrote:

> Sorry @Melvin, sent the previous email just to you..
>
>
> That's a great one, too! Cheers!
>
>
> Well.. the index creation did not help...
>
> if possible, please have a look at the explain analyze results:
>
> http://explain.depesz.com/s/rHOU
>
> What else can I do?
>
> *The indexes I created are:*
> - CREATE INDEX CONCURRENTLY ix_inode_segments_notes_clientids2 ON
> gorfs.inode_segments USING btree ("full_path");
>
> - CREATE INDEX CONCURRENTLY ix_inodes_checksum_st_size ON gorfs.inodes
> USING btree ("checksum_md5","st_size");
>

Two things to consider.
1. Did you remember to run ANALYZE on the table after the new indexes were
created?
2. Try doing a
SET enable_seqscan = off;
before executing the query and compare execution times.
It might just be that a seqscan would be faster.
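
For the archive, Melvin's two suggestions sketched as one session. The table names are from this thread; the query body is elided here just as it is above, so this is a workflow sketch rather than a runnable script:

```sql
-- Refresh planner statistics after creating the new indexes.
ANALYZE gorfs.inode_segments;
ANALYZE gorfs.inodes;

-- Time the query with the default planner settings.
EXPLAIN ANALYZE SELECT ...;   -- the query from this thread

-- Discourage sequential scans for this session only, then time it again.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;

-- Restore the default; enable_seqscan is a diagnostic aid, not a
-- setting to leave off in production.
RESET enable_seqscan;
```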

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] index question

2016-05-01 Thread drum.lu...@gmail.com
Sorry @Melvin, sent the previous email just to you..


That's a great one, too! Cheers!


Well.. the index creation did not help...

if possible, please have a look at the explain analyze results:

http://explain.depesz.com/s/rHOU

What else can I do?

*The indexes I created are:*
- CREATE INDEX CONCURRENTLY ix_inode_segments_notes_clientids2 ON
gorfs.inode_segments USING btree ("full_path");

- CREATE INDEX CONCURRENTLY ix_inodes_checksum_st_size ON gorfs.inodes
USING btree ("checksum_md5","st_size");


Re: [GENERAL] index question

2016-05-01 Thread Melvin Davidson
On Sun, May 1, 2016 at 9:18 PM, drum.lu...@gmail.com 
wrote:

> To clarify, the index is based on a function called "split_part()".
>> The WHERE clause is only referencing the full_path column, so the planner
>> cannot associate the index with the full_path column.
>>
>
> Thanks for the explanation, Melvin.
>
> It would be simple like:
>
> CREATE INDEX CONCURRENTLY ON gorfs.inode_segments USING btree
> ("full_path");
>
> ?
>
> Thanks again.
> Lucas
>

>CREATE INDEX CONCURRENTLY ON gorfs.inode_segments USING btree
("full_path");

Yes, that should work.
A word of caution: only create additional indexes that will actually be
used in queries.

You can check how often indexes are used (and status) with:

SELECT n.nspname as schema,
   i.relname as table,
   i.indexrelname as index,
   i.idx_scan,
   i.idx_tup_read,
   i.idx_tup_fetch,
   CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
   pg_get_indexdef(idx.indexrelid),
   CASE WHEN idx.indisvalid
THEN 'valid'
ELSE 'INVALID'
END AS status,
   pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname)) as size_in_bytes,
   pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname))) as size
  FROM pg_stat_all_indexes i
  JOIN pg_class c ON (c.oid = i.relid)
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
  JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
 WHERE n.nspname NOT LIKE 'pg_%'
ORDER BY 1, 2, 3;

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] index question

2016-05-01 Thread drum.lu...@gmail.com
>
> To clarify, the index is based on a function called "split_part()".
> The WHERE clause is only referencing the full_path column, so the planner
> cannot associate the index with the full_path column.
>

Thanks for the explanation, Melvin.

It would be simple like:

CREATE INDEX CONCURRENTLY ON gorfs.inode_segments USING btree ("full_path");

?

Thanks again.
Lucas


Re: [GENERAL] index question

2016-05-01 Thread Melvin Davidson
On Sun, May 1, 2016 at 6:31 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sunday, May 1, 2016, Melvin Davidson  wrote:
>
>>
>> Your index is based on the split_part function,
>> but the WHERE clause is specific to full_path, so the planner cannot find
>> a valid index.
>>
>>
>
>
> David J.
>

>This sentence is even less useful than the questions that you asked...
Your comments are antagonistic at best, but perhaps you should take a
course in English.

To clarify, the index is based on a function called "split_part()".
The WHERE clause is only referencing the full_path column, so the planner
cannot associate the index with the full_path column.


-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] index question

2016-05-01 Thread Melvin Davidson
On Sun, May 1, 2016 at 5:58 PM, drum.lu...@gmail.com 
wrote:

>
>>
>> Well, a little more information would be useful like:
>>
>
> Oops.. yes sure.. sorry about that.
>
>
>> 1. What is the PostgreSQL version?
>>
>
> PostgreSQL 9.2
>
>
>> 2. What is the O/S?
>>
>
> Linux CentOS 6.7, 64-bit
>
>
>> 3. What is the structure of gorfs.inode_segments?
>>
>
> Table inode_segments: (I'll leave the comments to help)
>
>> CREATE TABLE gorfs.inode_segments
>> (
>>   st_ino "gorfs"."ino_t" NOT NULL, -- Inode number the segment belongs
>> to. alongside segment_index, it forms the table's primary key to ensure
>> uniqueness per relevant scope
>>   segment_index "gorfs"."pathname_component" NOT NULL, -- See st_no's
>> column description for further details. The meaning of this column varies
>> based on the host inode type:...
>>   st_ino_target "gorfs"."ino_t", -- Target inode number. Meaningful for
>> directory inode segments (objects in the directory)
>>   full_path "gorfs"."absolute_pathname", -- Exploded absolute canonical
>> path for quick lookups. Meaningful only for directory inode segments
>> (objects in the directory)
>>   segment_data "bytea", -- Actual data segment. Meaningful only for
>> S_IFLNK and S_IFREG
>>   CONSTRAINT pk_inode_segments PRIMARY KEY ("st_ino", "segment_index"),
>>   CONSTRAINT fk_host_inode_must_exist FOREIGN KEY (st_ino)
>>   REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
>>   ON UPDATE NO ACTION ON DELETE NO ACTION,
>>   CONSTRAINT fk_target_inode_must_exist FOREIGN KEY (st_ino_target)
>>   REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
>>   ON UPDATE NO ACTION ON DELETE NO ACTION,
>>   CONSTRAINT uc_no_duplicate_full_paths UNIQUE ("full_path"),
>>   CONSTRAINT cc_only_root_can_be_its_own_parent CHECK ("st_ino_target" IS
>> NULL OR "st_ino"::bigint <> "st_ino_target"::bigint OR "st_ino"::bigint = 2)
>> )
>
>
> Table gorfs.inodes:
>
>> CREATE TABLE gorfs.inodes
>> (
>>   st_dev "gorfs"."dev_t" DEFAULT NULL::bigint, -- ID of device containing
>> file. Meaningless in this implementation
>>   st_ino "gorfs"."ino_t" NOT NULL DEFAULT
>> "nextval"('"gorfs"."inodes_st_ino_idseq"'::"regclass"), -- Inode number
>>   st_mode "gorfs"."mode_t" NOT NULL, -- File type/mode bits
>>   st_nlink "gorfs"."nlink_t" NOT NULL, -- Number of hard links (directory
>> segments) pointing to this inode. See stat(2) manual page for details (man
>> 2 stat)
>>   st_uid "gorfs"."uid_t" NOT NULL, -- User ID that owns the file. See
>> stat(2) manual page for details (man 2 stat)
>>   st_gid "gorfs"."gid_t" NOT NULL, -- Group ID that owns the file.See
>> stat(2) manual page for details (man 2 stat)
>>   st_rdev "gorfs"."dev_t", -- Device number (currently we don't support
>> device files).  See stat(2) manual page for details (man 2 stat)
>>   st_size "gorfs"."off_t", -- File size, if applicable. See stat(2)
>> manual page for details (man 2 stat)
>>   st_blksize "gorfs"."blksize_t", -- Block size for I/O. Meaningless
>> here, hard coded to 512. See stat(2) manual page for details (man 2 stat)
>>   st_blocks "gorfs"."blkcnt_t", -- Number of allocated blocks.
>> Meaningless here, but calculated from block size. See stat(2) manual page
>> for details (man 2 stat)
>>   st_atime "gorfs"."time_t" NOT NULL, -- Timestamp of last access. Stored
>> as a timestamp as opposed to unix TS. See stat(2) manual page for details
>> (man 2 stat)
>>   st_mtime "gorfs"."time_t" NOT NULL, -- Timestamp of last modification.
>> Stored as a timestamp as opposed to unix TS. See stat(2) manual page for
>> details (man 2 stat)
>>   st_ctime "gorfs"."time_t" NOT NULL, -- Timestamp of last change. Stored
>> as a timestamp as opposed to unix TS. See stat(2) manual page for details
>> (man 2 stat)
>>   checksum_md5 "md5_hash", -- MD5 checksum of the file. Supplied by the
>> application as the DB might not even see the payload
>>   media_subtype_id integer, -- Reference to MIME type (see FK
>> constraint). We can't support all media types but unknown types can be
>> stored as application/octet-stream
>>   external_size "gorfs"."off_t", -- For symlinks only. Meaningful for fat
>> links only: total size of the fat link target. Null for normal symlinks
>>   CONSTRAINT pk_inodes PRIMARY KEY ("st_ino"),
>>   CONSTRAINT fk_media_subtype_must_exist FOREIGN KEY (media_subtype_id)
>>   REFERENCES public.media_subtypes (media_subtype_id) MATCH SIMPLE
>>   ON UPDATE NO ACTION ON DELETE NO ACTION,
>>   CONSTRAINT cc_mount_devices_not_supported CHECK ("st_dev" IS NULL)
>> )
>
>
>
>> 4. Did you do an ANALYZE table gorfs.inode_segments after you created
>> the index?
>>
>
> Yes.. actually the index was already created.
>

Well, it looks like David's explanation is correct.
Your index is based on the split_part function,
but the WHERE clause is specific to full_path, so the planner cannot find a
valid index.

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] index question

2016-05-01 Thread drum.lu...@gmail.com
>
>
>
> Well, a little more information would be useful like:
>

Oops.. yes sure.. sorry about that.


> 1. What is the PostgreSQL version?
>

PostgreSQL 9.2


> 2. What is the O/S?
>

Linux CentOS 6.7, 64-bit


> 3. What is the structure of gorfs.inode_segments?
>

Table inode_segments: (I'll leave the comments to help)

> CREATE TABLE gorfs.inode_segments
> (
>   st_ino "gorfs"."ino_t" NOT NULL, -- Inode number the segment belongs to.
> alongside segment_index, it forms the table's primary key to ensure
> uniqueness per relevant scope
>   segment_index "gorfs"."pathname_component" NOT NULL, -- See st_no's
> column description for further details. The meaning of this column varies
> based on the host inode type:...
>   st_ino_target "gorfs"."ino_t", -- Target inode number. Meaningful for
> directory inode segments (objects in the directory)
>   full_path "gorfs"."absolute_pathname", -- Exploded absolute canonical
> path for quick lookups. Meaningful only for directory inode segments
> (objects in the directory)
>   segment_data "bytea", -- Actual data segment. Meaningful only for
> S_IFLNK and S_IFREG
>   CONSTRAINT pk_inode_segments PRIMARY KEY ("st_ino", "segment_index"),
>   CONSTRAINT fk_host_inode_must_exist FOREIGN KEY (st_ino)
>   REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT fk_target_inode_must_exist FOREIGN KEY (st_ino_target)
>   REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT uc_no_duplicate_full_paths UNIQUE ("full_path"),
>   CONSTRAINT cc_only_root_can_be_its_own_parent CHECK ("st_ino_target" IS
> NULL OR "st_ino"::bigint <> "st_ino_target"::bigint OR "st_ino"::bigint = 2)
> )


Table gorfs.inodes:

> CREATE TABLE gorfs.inodes
> (
>   st_dev "gorfs"."dev_t" DEFAULT NULL::bigint, -- ID of device containing
> file. Meaningless in this implementation
>   st_ino "gorfs"."ino_t" NOT NULL DEFAULT
> "nextval"('"gorfs"."inodes_st_ino_idseq"'::"regclass"), -- Inode number
>   st_mode "gorfs"."mode_t" NOT NULL, -- File type/mode bits
>   st_nlink "gorfs"."nlink_t" NOT NULL, -- Number of hard links (directory
> segments) pointing to this inode. See stat(2) manual page for details (man
> 2 stat)
>   st_uid "gorfs"."uid_t" NOT NULL, -- User ID that owns the file. See
> stat(2) manual page for details (man 2 stat)
>   st_gid "gorfs"."gid_t" NOT NULL, -- Group ID that owns the file.See
> stat(2) manual page for details (man 2 stat)
>   st_rdev "gorfs"."dev_t", -- Device number (currently we don't support
> device files).  See stat(2) manual page for details (man 2 stat)
>   st_size "gorfs"."off_t", -- File size, if applicable. See stat(2) manual
> page for details (man 2 stat)
>   st_blksize "gorfs"."blksize_t", -- Block size for I/O. Meaningless here,
> hard coded to 512. See stat(2) manual page for details (man 2 stat)
>   st_blocks "gorfs"."blkcnt_t", -- Number of allocated blocks. Meaningless
> here, but calculated from block size. See stat(2) manual page for details
> (man 2 stat)
>   st_atime "gorfs"."time_t" NOT NULL, -- Timestamp of last access. Stored
> as a timestamp as opposed to unix TS. See stat(2) manual page for details
> (man 2 stat)
>   st_mtime "gorfs"."time_t" NOT NULL, -- Timestamp of last modification.
> Stored as a timestamp as opposed to unix TS. See stat(2) manual page for
> details (man 2 stat)
>   st_ctime "gorfs"."time_t" NOT NULL, -- Timestamp of last change. Stored
> as a timestamp as opposed to unix TS. See stat(2) manual page for details
> (man 2 stat)
>   checksum_md5 "md5_hash", -- MD5 checksum of the file. Supplied by the
> application as the DB might not even see the payload
>   media_subtype_id integer, -- Reference to MIME type (see FK constraint).
> We can't support all media types but unknown types can be stored as
> application/octet-stream
>   external_size "gorfs"."off_t", -- For symlinks only. Meaningful for fat
> links only: total size of the fat link target. Null for normal symlinks
>   CONSTRAINT pk_inodes PRIMARY KEY ("st_ino"),
>   CONSTRAINT fk_media_subtype_must_exist FOREIGN KEY (media_subtype_id)
>   REFERENCES public.media_subtypes (media_subtype_id) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT cc_mount_devices_not_supported CHECK ("st_dev" IS NULL)
> )



> 4. Did you do an ANALYZE table gorfs.inode_segments after you created the
> index?
>

Yes.. actually the index was already created.


Re: [GENERAL] index question

2016-05-01 Thread David G. Johnston
On Sunday, May 1, 2016, drum.lu...@gmail.com  wrote:

> Hi all,
>
> I've got the following index on the gorfs.inode_segments table:
>
>>
>> CREATE INDEX ix_clientids
>>   ON gorfs.inode_segments
>>   USING btree
>>   (("split_part"("full_path"::"text", '/'::"text", 4)::integer))
>>   WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text");
>
>
> And I'm running the following Query:
>
>> SELECT
>> * FROM ( SELECT split_part(full_path, '/', 4)::INT AS account_id,
>>split_part(full_path, '/', 6)::INT AS note_id,
>>split_part(full_path, '/', 9)::TEXT AS variation,
>>st_size,
>>segment_index,
>>reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>>i.st_ino,
>>full_path
>> FROM gorfs.inodes i
>> JOIN gorfs.inode_segments s
>>   ON i.st_ino = s.st_ino_target
>> WHERE i.checksum_md5 IS NOT NULL
>>   AND s.full_path ~ '/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
>>   AND i.st_size > 0) as test WHERE account_id = 12225
>
>
> *- But the query does not use the index... Why?*
>

The most obvious reason is that the index is partial but the query doesn't
contain an appropriate where clause.

I'm also not sure how well the planner can move around the functional
expression in the select-list so that it matches up in the where clause to
then match the index.
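
For the archive, a sketch of what an "appropriate where clause" could look like here: to be a candidate for the partial expression index from the original post, the query has to repeat both the indexed expression and the index's WHERE predicate verbatim. This is untested against the real schema and the column list is abbreviated:

```sql
SELECT s.full_path,
       s.st_ino_target
  FROM gorfs.inode_segments s
 WHERE gorfs.is_kaminski_note_path(s.full_path::text)       -- matches the index's partial predicate
   AND split_part(s.full_path::text, '/', 4)::int = 12225;  -- matches the indexed expression
```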


>
> Explain analyze:
>
>> "Seq Scan on "inode_segments"  (cost=0.00..3047212.44 rows=524846
>> width=63) (actual time=14212.466..51428.439 rows=31 loops=1)"
>> "  Filter: ("split_part"(("full_path")::"text", '/'::"text", 4) =
>> '12225'::"text")"
>> "  Rows Removed by Filter: 104361402"
>> "Total runtime: 51428.482 ms"
>
>
>
These stats seem wacky...and seem to be missing stuff like the inodes
table...

David J.


Re: [GENERAL] index question

2016-05-01 Thread Melvin Davidson
On Sun, May 1, 2016 at 5:40 PM, drum.lu...@gmail.com 
wrote:

> Hi all,
>
> I've got the following index on the gorfs.inode_segments table:
>
>>
>> CREATE INDEX ix_clientids
>>   ON gorfs.inode_segments
>>   USING btree
>>   (("split_part"("full_path"::"text", '/'::"text", 4)::integer))
>>   WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text");
>
>
> And I'm running the following Query:
>
>> SELECT
>> * FROM ( SELECT split_part(full_path, '/', 4)::INT AS account_id,
>>split_part(full_path, '/', 6)::INT AS note_id,
>>split_part(full_path, '/', 9)::TEXT AS variation,
>>st_size,
>>segment_index,
>>reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>>i.st_ino,
>>full_path
>> FROM gorfs.inodes i
>> JOIN gorfs.inode_segments s
>>   ON i.st_ino = s.st_ino_target
>> WHERE i.checksum_md5 IS NOT NULL
>>   AND s.full_path ~ '/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
>>   AND i.st_size > 0) as test WHERE account_id = 12225
>
>
> *- But the query does not use the index... Why?*
>
> Explain analyze:
>
>> "Seq Scan on "inode_segments"  (cost=0.00..3047212.44 rows=524846
>> width=63) (actual time=14212.466..51428.439 rows=31 loops=1)"
>> "  Filter: ("split_part"(("full_path")::"text", '/'::"text", 4) =
>> '12225'::"text")"
>> "  Rows Removed by Filter: 104361402"
>> "Total runtime: 51428.482 ms"
>
>
> Cheers
> Lucas
>
>
Well, a little more information would be useful, like:
1. What is the PostgreSQL version?
2. What is the O/S?
3. What is the structure of gorfs.inode_segments?
4. Did you do an ANALYZE table gorfs.inode_segments after you created the
index?

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


[GENERAL] index question

2016-05-01 Thread drum.lu...@gmail.com
Hi all,

I've got the following index on the gorfs.inode_segments table:

>
> CREATE INDEX ix_clientids
>   ON gorfs.inode_segments
>   USING btree
>   (("split_part"("full_path"::"text", '/'::"text", 4)::integer))
>   WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text");


And I'm running the following Query:

> SELECT
> * FROM ( SELECT split_part(full_path, '/', 4)::INT AS account_id,
>split_part(full_path, '/', 6)::INT AS note_id,
>split_part(full_path, '/', 9)::TEXT AS variation,
>st_size,
>segment_index,
>reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>i.st_ino,
>full_path
> FROM gorfs.inodes i
> JOIN gorfs.inode_segments s
>   ON i.st_ino = s.st_ino_target
> WHERE i.checksum_md5 IS NOT NULL
>   AND s.full_path ~ '/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
>   AND i.st_size > 0) as test WHERE account_id = 12225


*- But the query does not use the index... Why?*

Explain analyze:

> "Seq Scan on "inode_segments"  (cost=0.00..3047212.44 rows=524846
> width=63) (actual time=14212.466..51428.439 rows=31 loops=1)"
> "  Filter: ("split_part"(("full_path")::"text", '/'::"text", 4) =
> '12225'::"text")"
> "  Rows Removed by Filter: 104361402"
> "Total runtime: 51428.482 ms"


Cheers
Lucas


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-05-01 Thread Tom Smith
No, it is within the individual JSON object's storage. In a way it would be
part of the query plan, but since it applies strictly to the individual JSON
object's storage structure, it is not necessarily an "index".
One possible (but primitive) implementation could be to have multiple
"segments" in the storage, with all keys starting with "a" in the first
segment, etc.

On Sun, May 1, 2016 at 4:14 PM, Oleg Bartunov  wrote:

>
>
> On Sun, May 1, 2016 at 6:46 AM, Tom Smith 
> wrote:
>
>> Hello:
>>
>> I'd like to bring this JSONB performance issue again.
>> Below is a link of MySQL way of storing/retrieving Json key/value
>>
>> https://dev.mysql.com/doc/refman/5.7/en/json.html
>>
>> Instead of providing column indexing (like GIN for JSONB in PostgreSQL),
>> it provides only internal data-structure-level indexing within each
>> individual JSON object
>> for fast retrieval. Compression is not used.
>>
>> Perhaps without implementing complicated column-level GIN indexing,
>> implementing
>> a new variant JSON type that only handles individual JSON object indexing
>> would be
>> feasible? Combined with the current JSONB implementation, both common use
>> cases
>> (one is global doc indexing, the other is fast retrieval of individual
>> values)
>> would work out and make PostgreSQL unbeatable.
>>
>
> It's called an expression index?
>
>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian  wrote:
>>
>>> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>>> > Hi,
>>> >
>>> > Congrats on the official release of 9.5
>>> >
>>> > And I'd like to bring up the issue again about whether 9.6 would address the
>>> jsonb
>>> > performance issue
>>> > with large number of top level keys.
>>> > It is true that it does not have to use JSON format. it is about
>>> serialization
>>> > and fast retrieval
>>> > of dynamic tree structure objects. (at top level, it might be called
>>> dynamic
>>> > columns)
>>> > So if postgresql can have its own way, that would work out too as long
>>> as it
>>> > can have intuitive query
>>> > (like what are implemented for json and jsonb) and fast retrieval of a
>>> tree
>>> > like object,
>>> > it can be called no-sql data type. After all, most motivations of
>>> using no-sql
>>> > dbs like MongoDB
>>> > is about working with dynamic tree object.
>>> >
>>> > If postgresql can have high performance on this, then many no-sql dbs
>>> would
>>> > become history.
>>>
>>> I can give you some backstory on this.  TOAST was designed in 2001 as a
>>> way to store, in a data-type-agnostic way, long strings compressed and
>>> any other long data type, e.g. long arrays.
>>>
>>> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
>>> unique case because it is one of the few types that can be processed
>>> without reading the entire value, e.g. it has an index.
>>>
>>> We are going to be hesitant to do something data-type-specific for
>>> JSONB.  It would be good if we could develop a data-type-agnostic
>>> approach to how TOAST can be improved.  I know of no such work for 9.6,
>>> and it is unlikely it will be done in time for 9.6.
>>>
>>> --
>>>   Bruce Momjian  http://momjian.us
>>>   EnterpriseDB http://enterprisedb.com
>>>
>>> + As you are, so once was I. As I am, so you will be. +
>>> + Roman grave inscription +
>>>
>>
>>
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-05-01 Thread Oleg Bartunov
On Sun, May 1, 2016 at 6:46 AM, Tom Smith  wrote:

> Hello:
>
> I'd like to bring this JSONB performance issue again.
> Below is a link of MySQL way of storing/retrieving Json key/value
>
> https://dev.mysql.com/doc/refman/5.7/en/json.html
>
> Instead of providing column indexing (like GIN for JSONB in PostgreSQL),
> it provides only internal data-structure-level indexing within each
> individual JSON object
> for fast retrieval. Compression is not used.
>
> Perhaps without implementing complicated column-level GIN indexing,
> implementing
> a new variant JSON type that only handles individual JSON object indexing
> would be
> feasible? Combined with the current JSONB implementation, both common use
> cases
> (one is global doc indexing, the other is fast retrieval of individual
> values)
> would work out and make PostgreSQL unbeatable.
>

It's called an expression index?
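
For readers finding this in the archive: an expression index in this context is a plain btree index on an expression over the jsonb column, rather than a GIN index on the whole document. A minimal sketch with hypothetical table and key names:

```sql
CREATE TABLE docs (id bigserial PRIMARY KEY, body jsonb);

-- A btree index on one extracted key: cheap to maintain and well suited
-- to fast retrieval of individual values by that key.
CREATE INDEX docs_customer_idx ON docs ((body ->> 'customer_id'));

-- This predicate repeats the indexed expression, so it can use the index:
SELECT * FROM docs WHERE body ->> 'customer_id' = '42';
```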


>
>
>
>
>
>
>
>
>
> On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian  wrote:
>
>> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>> > Hi,
>> >
>> > Congrats on the official release of 9.5
>> >
>> > And I'd like to bring up the issue again about whether 9.6 would address the
>> jsonb
>> > performance issue
>> > with large number of top level keys.
>> > It is true that it does not have to use JSON format. it is about
>> serialization
>> > and fast retrieval
>> > of dynamic tree structure objects. (at top level, it might be called
>> dynamic
>> > columns)
>> > So if postgresql can have its own way, that would work out too as long
>> as it
>> > can have intuitive query
>> > (like what are implemented for json and jsonb) and fast retrieval of a
>> tree
>> > like object,
>> > it can be called no-sql data type. After all, most motivations of using
>> no-sql
>> > dbs like MongoDB
>> > is about working with dynamic tree object.
>> >
>> > If postgresql can have high performance on this, then many no-sql dbs
>> would
>> > become history.
>>
>> I can give you some backstory on this.  TOAST was designed in 2001 as a
>> way to store, in a data-type-agnostic way, long strings compressed and
>> any other long data type, e.g. long arrays.
>>
>> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
>> unique case because it is one of the few types that can be processed
>> without reading the entire value, e.g. it has an index.
>>
>> We are going to be hesitant to do something data-type-specific for
>> JSONB.  It would be good if we could develop a data-type-agnostic
>> approach to how TOAST can be improved.  I know of no such work for 9.6,
>> and it is unlikely it will be done in time for 9.6.
>>
>> --
>>   Bruce Momjian  http://momjian.us
>>   EnterpriseDB http://enterprisedb.com
>>
>> + As you are, so once was I. As I am, so you will be. +
>> + Roman grave inscription +
>>
>
>


Re: [GENERAL] Skip trigger?

2016-05-01 Thread hamann . w

>> On Sat, Apr 30, 2016 at 1:38 AM,   wrote:
>> > I have a table with a row update trigger that is quite slow.
>> > The trigger function basically sets some bits in a "changed" column
>> > depending on which values really changed.
>> > For some bulk updates it can be determined in advance that the
>> > trigger function will not have any effect.
>> > Is there any way to run an update query and specify that it should not
>> > activate the trigger?
>> > I know that I can disable the trigger and reenable it later;
>> > however other concurrent updates might need it
>> 
>> Indeed the main issue is how you want to handle concurrency.  ALTER
>> TABLE statements to disable triggers works and is transactional, but
>> locks the table, which may be undesired.  Here are some useful
>> pointers: 
>> http://blog.endpoint.com/2015/07/selectively-firing-postgres-triggers.html
>> 
Hi Manuel,

many thanks ... this seems to be just what I was looking for. I will give it a 
try tomorrow

Regards
Wolfgang Hamann
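
For the archive, one of the techniques from the linked post, sketched with hypothetical names: the trigger function checks a session-local setting and returns early, so a bulk-update session can skip the expensive work while concurrent sessions, which never set the flag, still fire the trigger normally. Note the two-argument current_setting(name, missing_ok) form is 9.6+; on older versions you would pre-seed the setting with set_config() or catch the undefined-parameter error.

```sql
CREATE OR REPLACE FUNCTION track_changes() RETURNS trigger AS $$
BEGIN
    -- Session-local escape hatch; unset in other sessions, so they
    -- fall through to the normal logic below.
    IF current_setting('myapp.skip_changed_bits', true) = 'on' THEN
        RETURN NEW;
    END IF;
    -- ... normal logic that sets bits in the "changed" column ...
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- In the bulk-update session only:
SET myapp.skip_changed_bits = 'on';
-- UPDATE ... ;   -- runs without the expensive per-row work
RESET myapp.skip_changed_bits;
```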





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Skip trigger?

2016-05-01 Thread hamann . w
>> On 2016-04-30 02:08, wolfg...@alle-noten.de wrote:
>> > Hi,
>> > 
>> > I have a table with a row update trigger that is quite slow.
>> > The trigger function basically sets some bits in a "changed" column
>> > depending on which values really changed.
>> > For some bulk updates it can be determined in advance that the
>> > trigger function will not have any effect.
>> > Is there any way to run an update query and specify that it should not
>> > activate the trigger?
>> > I know that I can disable the trigger and reenable it later;
>> > however other concurrent updates might need it
>> 
>> 
>> I always disable the trigger, run the update, and enable the trigger
>> within a transaction. This locks the table and will prevent other
>> sessions from doing updates without the trigger (I run it during
>> off-hours if it is going to take more time than is acceptable).
>> 
Hi Stephen,
this is what I do now occasionally. However, I would like to be able to run 
some things right away.
The solutions in 
http://blog.endpoint.com/2015/07/selectively-firing-postgres-triggers.html
look very promising

Regards
Wolfgang Hamann







Re: [GENERAL] intermittent issue with windows 7 service manager not able to correctly determine or control postgresql 9.4

2016-05-01 Thread George Neuner



Disclaimer: My comments here are generic to Windows services.
I don't run PostgreSQL on Windows and I have no idea how it is
implemented.

On Sun, 1 May 2016 03:35:44 +0100, Tom Hodder 
wrote:

>I've got several machines running windows 7 which have postgresql 9.4
>installed as a service, and configured to start automatically on boot. I am
>monitoring these services with zabbix and several times a week I get a
>notification that the postgresql-x64-9.4 service has stopped.
>
>When I login to the machine, the service does appear to be stopped;
>However when I check the database, I can query it ok;

Windows services have a time limit to respond to commands or status
inquiries.  The service manager periodically queries the status of all
running services - if they don't respond quickly enough, the manager
thinks they are hosed.  That may or may not be true.  

But IME unresponsive services rarely appear "stopped" - usually they
show as "started" in the service manager, or, if you run SC from the
command line their state is shown as "running".


>If I try to start the service from the service manager, I see the following
>error in the logs;
>
>*2016-04-30 05:03:13 BST FATAL:  lock file "postmaster.pid" already exists
>2016-04-30 05:03:13 BST HINT:  Is another postmaster (PID 2556)
>running in data directory "C:/Program Files/PostgreSQL/9.4/data"?*
>
>The pg_ctl tool seems to correctly query the state of the service and
>return the correct PID;
>
>*C:\Program Files\PostgreSQL\9.4>bin\pg_ctl.exe -D "C:\Program
>Files\PostgreSQL\9.4\data" status
>pg_ctl: server is running (PID: 2556)*

Which suggests the service either is not responding to the manager's
status inquiries, or is responding too late.


>The other thing that seems to happen is the pgadmin3 tool seems to
>have lost the ability to control the service as all the options for
>start/stop are greyed out;
>[image: Inline images 2]

This is likely because the service manager believes the service is
unresponsive.  The programming API communicates with the manager.


>The only option to get the control back is to kill the processes in
>the task manager or reboot the machine.

You could try "sc stop " from the command line.  
The SC tool is separate from the shell "net" command and it sometimes
will work when "net stop " does not.

You also could try using recovery options in the service manager to
automatically restart the service.  But if the service is showing as
"stopped" when it really is running, this is unlikely to work.


>Any suggestions on what might be causing this?

Services are tricky to get right: there are a number of rules the
control interface has to obey that are at odds with doing real work.  

A single-threaded service must periodically send "busy" status to the
manager during lengthy processing.  Failure to do that in a timely
manner will cause problems.

A multi-threaded service that separates processing from control must
be able to suspend or halt the processing when directed and send
"busy" status if it can't.

There is a way to launch arbitrary programs as services so they can
run at startup and in the background, but programs that weren't
written explicitly to BE services don't obey the service manager and
their displayed status usually is bogus (provided by the launcher).

George


