Re: [GENERAL] dynamic schema modeling and performance

2017-04-12 Thread Dorian Hoxha
I've done the dynamic-table-per-project approach previously and it worked
great, even with dynamic indexes on it. In the low thousands of projects it
should work ok. Beyond that, use as many static columns as possible, keep
everything dynamic in jsonb, and validate it with per-project constraints.
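
A minimal sketch of that second layout (table and column names are just an
example; the CHECK is a stand-in for real per-project validation):

CREATE TABLE item (
    id         bigserial   PRIMARY KEY,
    project_id integer     NOT NULL,              -- static columns shared by every project
    created_at timestamptz NOT NULL DEFAULT now(),
    attrs      jsonb       NOT NULL DEFAULT '{}', -- everything project-specific
    CONSTRAINT attrs_is_object CHECK (jsonb_typeof(attrs) = 'object')
);

-- generic index for querying inside the dynamic part
CREATE INDEX item_attrs_idx ON item USING gin (attrs);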

On Wed, Apr 12, 2017 at 3:31 AM, RJ Ewing <ewing...@gmail.com> wrote:

> I thought that might be an answer around here :)
>
> I guess I was looking for what might be a better approach.
>
> Is dynamically creating a table for each entity a bad idea? I can see
> something like creating a schema for each project (group of related
> entities) and then creating a table for each entity within that schema. I
> don't expect to have more than a few thousand projects anytime soon. We
> have a relatively targeted audience.
>
> Or would it be better to use jsonb data types and create a denormalized
> index elsewhere?
>
>
>
> On Apr 11, 2017, at 5:17 PM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>
> If you are asking if you should go nosql, 99% you should not.
>
> On Tue, Apr 11, 2017 at 10:06 PM, Poul Kristensen <bcc5...@gmail.com>
> wrote:
>
>> dataverse.org uses PostgreSQL and is well documented + it is completely
>> user driven. Maybe the concept could be useful for you. I have installed
>> and configured a few to be used by researchers.
>>
>> regards
>> Poul
>>
>>
>> 2017-04-11 19:46 GMT+02:00 Rj Ewing <ewing...@gmail.com>:
>>
>>> I'm looking for thoughts on the best way to handle dynamic schemas.
>>>
>>> The application I am developing revolves around user defined entities.
>>> Each entity is a tabular dataset with user defined columns and data types.
>>> Entities can also be related to each other through Parent-Child
>>> relationships. Some entities will be 100% user driven, while others (such
>>> as an entity representing a photo) will be partially user driven (all photo
>>> entities will have common fields + custom user additions).
>>>
>>> I was hoping to get opinions on whether postgresql would be a suitable
>>> backend. A couple of options I have thought of are:
>>>
>>> 1. Each entity is represented as a table in psql. The schema would be
>>> dynamically updated (with limits) when an entity mapping is updated. I
>>> believe that this would provide the best data constraints and allow the
>>> best data normalization. *A concern I have is that there could be an
>>> enormous number of tables generated and the performance impact this might
>>> have in the future*. I could then run elasticsearch as a denormalized
>>> cache for efficient querying and full-text-search.
>>>
>>> 2. Use a nosql database. This provides the "dynamic" schema aspect. A
>>> concern here is the lack of relation support, thus leading to a more
>>> denormalized data structure and the potential for the data to become
>>> corrupted.
>>>
>>> Any opinions on the use of psql for this case, or other options would be
>>> greatly appreciated!
>>>
>>> RJ
>>>
>>
>>
>>
>> --
>> Med venlig hilsen / Best regards
>> Poul Kristensen
>> Linux-OS/Virtualizationexpert and Oracle DBA
>>
>
>


Re: [GENERAL] dynamic schema modeling and performance

2017-04-11 Thread Dorian Hoxha
If you are asking if you should go nosql, 99% you should not.

On Tue, Apr 11, 2017 at 10:06 PM, Poul Kristensen  wrote:

> dataverse.org uses PostgreSQL and is well documented + it is completely
> user driven. Maybe the concept could be useful for you. I have installed
> and configured a few to be used by researchers.
>
> regards
> Poul
>
>
> 2017-04-11 19:46 GMT+02:00 Rj Ewing :
>
>> I'm looking for thoughts on the best way to handle dynamic schemas.
>>
>> The application I am developing revolves around user defined entities.
>> Each entity is a tabular dataset with user defined columns and data types.
>> Entities can also be related to each other through Parent-Child
>> relationships. Some entities will be 100% user driven, while others (such
>> as an entity representing a photo) will be partially user driven (all photo
>> entities will have common fields + custom user additions).
>>
>> I was hoping to get opinions on whether postgresql would be a suitable
>> backend. A couple of options I have thought of are:
>>
>> 1. Each entity is represented as a table in psql. The schema would be
>> dynamically updated (with limits) when an entity mapping is updated. I
>> believe that this would provide the best data constraints and allow the
>> best data normalization. *A concern I have is that there could be an
>> enormous number of tables generated and the performance impact this might
>> have in the future*. I could then run elasticsearch as a denormalized
>> cache for efficient querying and full-text-search.
>>
>> 2. Use a nosql database. This provides the "dynamic" schema aspect. A
>> concern here is the lack of relation support, thus leading to a more
>> denormalized data structure and the potential for the data to become
>> corrupted.
>>
>> Any opinions on the use of psql for this case, or other options would be
>> greatly appreciated!
>>
>> RJ
>>
>
>
>
> --
> Med venlig hilsen / Best regards
> Poul Kristensen
> Linux-OS/Virtualizationexpert and Oracle DBA
>


Re: [GENERAL] Free seminar in Malmo: PostgreSQL at 10TB and Beyond

2017-01-21 Thread Dorian Hoxha
Just share the slides/video in this thread friend.

On Sat, Jan 21, 2017 at 10:57 AM, Seref Arikan 
wrote:

> Any chance this will be recorded? The content looks great and would be of
> interest to many.
>
> Cheers
> Seref
>
>
> On Sat, Jan 21, 2017 at 8:55 AM, Chris Travers 
> wrote:
>
>> Hi;
>>
>> I have been asked to deliver a seminar on the use of PostgreSQL in large
>> data environments.  I have decided to send this to the general list as well
>> because I think it may be of general interest here.
>>
>> We will be talking about problems and solutions for PostgreSQL when
>> using it to manage 10-12TB of data under heavy load.
>>
>> In particular, 4 specific problems we faced will be discussed in
>> technical detail:
>>
>> 1. How table partitioning solved a number of our performance problems.
>> 2. How non-1NF designs solved some problems and created some new ones.
>> 3. How PL/Perl was used to address large data sets of semi-structured
>> data.
>> 4. How use of PostgreSQL as a job queue store raised unexpected problems,
>> and their solutions.
>>
>> The discussion centers on a case study which shows how well PostgreSQL
>> scales, but it offers a number of lessons for managing databases at
>> smaller sizes as well.
>>
>> If you are interested and in the Southern Sweden or Copenhagen areas,
>> please feel free to register at https://www.eventbrite.com/
>> e/postgresql-at-10-tb-and-beyond-tickets-30841174784
>>
>> --
>> Best Wishes,
>> Chris Travers
>>
>> Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
>> lock-in.
>> http://www.efficito.com/learn_more
>>
>
>


[GENERAL] When updating row that has TOAST column, is the TOAST column also reinserted ? Or just the oid to the value?

2016-12-13 Thread Dorian Hoxha
Hello friends,

When updating a row that has a TOAST column, is the TOAST value also
reinserted? Or just the pointer (OID) to it?
Say I have a 1MB value in the TOAST column and I update the row by changing
another column. Since every update writes a new row version, will it also
reinsert the TOAST value? The column that I will update will have an index,
so I think a HOT update won't work in this case? The same question also
applies when full_page_writes is enabled.
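
A minimal sketch of the scenario (hypothetical table, just to make the
question concrete):

CREATE TABLE t (
    id      int PRIMARY KEY,
    big     text,     -- ~1MB value, stored out of line in TOAST
    counter int
);
CREATE INDEX t_counter_idx ON t (counter);

-- the update in question: only the indexed column changes
UPDATE t SET counter = counter + 1 WHERE id = 1;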

Using 9.6.

Thank you!


Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-04 Thread Dorian Hoxha
@Aleksander
~everyone wants lower storage use and some kind of compression.
Can this be made to retrain automatically during ANALYZE (does that make
sense)? And create a new dictionary only if it differs from the last one.

On Tue, Oct 4, 2016 at 5:34 PM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> Hello, Simon.
>
> Thanks for your interest in this project!
>
> > Will you be submitting this to core?
>
> I could align ZSON to PostgreSQL code style. I only need to run pgindent
> and write a few comments. Do you think the community would be interested in
> adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
> specific for that?
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: [GENERAL] Multiple inserts

2016-09-23 Thread Dorian Hoxha
If the connection is in autocommit, then each statement will also incur a
commit (write to the commit log on disk).
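
A minimal sketch of the variants (table and values are just an example):
batching single-row inserts in one explicit transaction avoids the
per-statement commit, and a multi-row VALUES list additionally avoids the
per-statement round trip.

-- autocommit: one commit per statement
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (2, 'b');

-- explicit transaction: one commit for the whole batch
BEGIN;
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (2, 'b');
COMMIT;

-- multi-row insert: one statement, one commit, one round trip
INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c');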

On Fri, Sep 23, 2016 at 2:01 PM, Rakesh Kumar 
wrote:

> Hi
>
> I am noticing that if I do this
>
> insert into table values(1,a)
> insert into table values(2,b)
>
> insert into table values(3,c)
> 
> commit after 500 rows
> it is a lot slower (almost 10x) than
> insert into table values (1,a), (2,b), (3,c) .. up to 500 rows
>
> It is obvious that the gain is due to network trip avoided after every
> insert.
>
> My question is: Are there any known side-effects or drawbacks of using
> multi-row inserts?  I am aware the error checking is much more difficult in
> the second approach. Any other drawbacks?
>
> thanks
>


Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Dorian Hoxha
Check out Voltdb (or Scylladb which is more different) for the changes in
architecture required to achieve those performance increases.

On Fri, Sep 2, 2016 at 7:32 PM, Andres Freund  wrote:

> On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
> > On Fri, Sep 2, 2016 at 4:49 AM, dandl  wrote:
> > > Re this talk given by Michael Stonebraker:
> > >
> > > http://slideshot.epfl.ch/play/suri_stonebraker
> > >
> > >
> > >
> > > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
> DB2, MS
> > > SQL Server, Postgres, given enough memory that the entire database
> lives in
> > > cache, the server will spend 96% of its memory cycles on unproductive
> > > overhead. This includes buffer management, locking, latching
> (thread/CPU
> > > conflicts) and recovery (including log file reads and writes).
>
> I think those numbers are overblown, and more PR than reality.
>
> But there certainly are some things that can be made more efficient if
> you don't care about durability and replication.
>
>
> > > I wondered if there are any figures or measurements on Postgres
> performance
> > > in this ‘enough memory’ environment to support or contest this point of
> > > view?
>
> I don't think that's really answerable without individual use-cases in
> mind.  Answering that question for analytics, operational, ... workloads
> is going to look different, and the overheads are elsewhere.
>
> I personally think that each implementation's restrictions are more
> likely to be an issue than anything "fundamental".
>
>
> > What limits postgresql when everything fits in memory? The fact that
> > it's designed to survive a power outage and not lose all your data.
> >
> > Stonebraker's new stuff is cool, but it is NOT designed to survive
> > total power failure.
> >
> > Two totally different design concepts. It's apples and oranges to
> compare them.
>
> I don't think they're that fundamentally different.
>
>
> Greetings,
>
> Andres Freund
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-26 Thread Dorian Hoxha
Many comments: https://news.ycombinator.com/item?id=12166585
https://www.reddit.com/r/programming/comments/4uph84/why_uber_engineering_switched_from_postgres_to/


On Tue, Jul 26, 2016 at 7:39 PM, Guyren Howe  wrote:

> Honestly, I've never heard of anyone doing that. But it sounds like they
> had good reasons.
>
> https://eng.uber.com/mysql-migration/
>
> Thoughts?
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Multimaster

2016-04-10 Thread Dorian Hoxha
@Konstantin
1. It's ok in my cases.
2. Not required in my cases.
3. Just require users to use different servers for now, I think.
Sometimes (always?) users can be greedy with feature requests.
4. I want magical consistency + failover (I can instruct the client to
retry all masters).

A good cluster story is the only thing stopping postgresql from killing
many nosqls, in my opinion. Citusdb is single-master (it can't even update
many rows on the same shard), Postgres-XL has no high availability (and
needs 3 servers per node + a global DTM), GreenPlum is OLAP (and an old
version of postgres), and app-side sharding is painful.

With sharding, the holy grail is automatic global range partitioning
(hypertable, hbase, cockroachdb, bigtable, etc.) on the primary key or a
special column. The con of this is that a row's indexes may reside in
another region, which may mean another server, so you need a cross-server
transaction to update a row.
In this case, I think you can keep the indexes on the same server as the
row and be fine.

To get good speed you can implement single-shard-only queries (no cross-node
transactions) and async disk changes (so the changes are made only in the
memory of all replicated servers and not in the commit log, as couchbase
does).

Multi-shard transactions aren't needed as much in my opinion.

5. The fewer places/files there are to configure any system the better it
is IMHO.

Cheers and good luck!

On Fri, Apr 1, 2016 at 8:15 PM, Moreno Andreo 
wrote:

> Il 01/04/2016 17:22, Joshua D. Drake ha scritto:
>
>> On 04/01/2016 04:50 AM, Konstantin Knizhnik wrote:
>>
>> There are also some minor technical issues which lead us to making few
>>> changes in pglogical code but we tried to do our best to keep original
>>> versions unchanged, so we can switch to public version in future.
>>>
>>
>> It seems the better solution for all of this would be for BDR and this
>> project to come together as a community project to get multimaster for
>> PostgreSQL. Otherwise we are going to end up with the same situation we had
>> before Pg had master-slave replication.
>>
>> JD
>>
> I'd like to post my use case, hoping it could be of some interest.
> We provide a cloud service, based on our professional application,
> hosting databases on our server and replicating them to every site the
> users register with.
> Every user has (for now) a unique database to keep in sync.
> Users write to their local database and data is replicated with the server
> via a background process running while the user is connected.
> Currently we have to improve what our replicator is doing: it's only
> replicating the single user's database. The improvement is that we could
> put it on the "server" (in some cases there are groups of users sharing a
> dedicated server) and, given a configuration of what and how to replicate,
> it should replicate more than one DB at a time.
> Currently it's a stand-alone program, but what we'd like would be
> something more integrated into PostgreSQL, so that wherever PostgreSQL can
> run (everywhere!), so can it.
> We were beginning to "translate" (and then improve) this program in C#,
> when I bumped into articles pointing to BDR, and I started taking a look.
> But it seems that it is good for replicating whole servers, and still
> doesn't have the granularity we need.
>
> My 2 cent...
>
> Cheers,
> Moreno.
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-20 Thread Dorian Hoxha
Is there any database that actually supports what the original poster
wanted ?

The only thing I know of that's similar is the bigtable/hbase/hypertable
wide column store.
The way it works is: break the lexicographically sorted rows into compressed
blocks of XX KB, and then keep an index on the start_key+end_key of each
block.

This way we could store the index (which links to several toast values) on
the row, and depending on which key you need it would fetch and decompress
the required block.
You can represent nested values by using a separator in the key, like
"first_level:2nd_level:3rd_level:value".
If the index is too big, you can store the index itself in a toast value.

Note: I have no idea how to (or whether it's even possible to) actually code
this.

On Wed, Jan 20, 2016 at 9:32 AM, Oleg Bartunov  wrote:

>
>
> On Wed, Jan 20, 2016 at 4:51 AM, Bruce Momjian  wrote:
>
>> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>> > Hi,
>> >
>> > Congrats on the official release of 9.5
>> >
>> > And I'd like to bring up the issue again of whether 9.6 will address the
>> > jsonb performance issue with a large number of top-level keys.
>> > It is true that it does not have to use the JSON format; it is about
>> > serialization and fast retrieval of dynamic tree-structured objects.
>> > (At the top level, these might be called dynamic columns.)
>> > So if postgresql has its own way of doing this, that would work out too,
>> > as long as it has intuitive querying (like what is implemented for json
>> > and jsonb) and fast retrieval of a tree-like object; it could be called a
>> > no-sql data type. After all, the main motivation for using no-sql dbs
>> > like MongoDB is working with dynamic tree objects.
>> >
>> > If postgresql can have high performance on this, then many no-sql dbs
>> would
>> > become history.
>>
>> I can give you some backstory on this.  TOAST was designed in 2001 as a
>> way to store, in a data-type-agnostic way, long strings compressed and
>> any other long data type, e.g. long arrays.
>>
>> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
>> unique case because it is one of the few types that can be processed
>> without reading the entire value, e.g. it has an index.
>>
>> We are going to be hesitant to do something data-type-specific for
>> JSONB.  It would be good if we could develop a data-type-agnostic
>> approach to how TOAST can be improved.  I know of no such work for 9.6,
>> and it is unlikely it will be done in time for 9.6.
>>
>
> I'm looking on this time to time.
>
>
>>
>> --
>>   Bruce Momjian  http://momjian.us
>>   EnterpriseDB http://enterprisedb.com
>>
>> + As you are, so once was I. As I am, so you will be. +
>> + Roman grave inscription +
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: [GENERAL] Happy New Year

2016-01-01 Thread Dorian Hoxha
Happy Holidays!

Let's have automatic sharding and distributed transactions!

On Fri, Jan 1, 2016 at 3:51 PM, Melvin Davidson 
wrote:

> Happy New Year to all!
>
> On Fri, Jan 1, 2016 at 2:40 AM, Michael Paquier  > wrote:
>
>> On Fri, Jan 1, 2016 at 11:36 AM, Joshua D. Drake 
>> wrote:
>> > Welcome to 2016.
>> >
>> > Let's have another bang up year!
>>
>> +1.
>> --
>> Michael
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


Re: [GENERAL] Is BDR support distributed table on slave nodes with ACID and join support.

2015-07-17 Thread Dorian Hoxha
1,2,3: You can't shard with BDR. It's only for multimaster (at least for
now). Please read the docs.

On Fri, Jul 17, 2015 at 9:02 AM, Amit Bondwal bondwal.a...@gmail.com
wrote:

 Hello everyone,

 We are working on an application in which we are using postgresql as the
 database. We are sure that in the future it will grow to the level that we
 have to scale it horizontally. We have some tables which are going to be
 more than 90% of the database size.

 I looked at pg_shard; as per my understanding it is for nosql data, and in
 our case we are going to use postgresql RDBMS features without nosql, so
 pg_shard has a lot of limitations for our case.

 I looked at postgresql-xl; it looks good as per our requirements, but we
 have to do a lot of things to make it highly available and it doesn't have
 Debian packages available.

 Now I see BDR; it looks good, it has multiple masters, and we can start
 with one master and later add more master servers for failover.
 Below are my queries:

 1. If I start with a postgresql 9.4 database, can I later upgrade it to BDR
 with these already-large tables, and can I distribute or shard these
 tables? Do I need to make some extra changes in the database, or will it be
 complicated later?

 2. How can I distribute some of the tables onto my slave nodes? If there
 are any limitations please let me know.

 3. If I start with a 4 or 5 node cluster with one master and the rest slave
 nodes, how can I take a backup of this distributed database to one of my
 other servers or a backup location?

 Please suggest what would be the best approach for this.

 --
 Thanks & Regards,

 Amit Bondwal



Re: [GENERAL] Database designpattern - product feature

2015-06-02 Thread Dorian Hoxha
Please do reply-all so you also reply to the list.

It's not ~good to develop with sqlite and deploy on postgresql. You should
have your 'dev' as close to 'prod' as possible.

Product_feature is another table in this case ?

On Tue, Jun 2, 2015 at 11:44 AM, Adrian Stern adrian.st...@unchained.ch
wrote:

 Database changeability is not a requirement. It just comes with django and
 makes development so much easier, since I can develop on sqlite and deploy
 wherever I want. Django ORM is not great, I agree, but it certainly does
 not suck; there are alternatives like sqlalchemy which are far more
 powerful.

 But yeah, I get what you're trying to tell me. And I agree, this postgresql
 jsonb feature looks really nice. Much easier to use than the whole xquery
 stuff. The complete EAV pattern described on wikipedia is actually pretty
 complex and seems not easy at all to implement using django. Therefore,
 maybe I should run a few tests with the json plugins.

 So let my sketch another approach.

 PRODUCT - P
 - name
 - type
 - features (jsonb)

 PRODUCT_FEATURE - PF
 - name
 - description
 - datatype
 - validation

 P now has the features field of type jsonb, which holds keys specified in
 PF together with a value of the declared datatype, or simply a valid one.
 PF holds the key name, its datatype for generating the GUI, and some
 validation pattern for input sanitizing. There is no relation between the
 tables.

 Getting the description is not an issue. I could even create a view
 mapping the jsonb keys to rows.

 Yes, I like your approach. Is there anything I should be aware of? Some
 do's and don'ts or known pitfalls?




Re: [GENERAL] Database designpattern - product feature

2015-06-02 Thread Dorian Hoxha
So product_feature is only 1 row for each product_type, right ? Looks good.

On Tue, Jun 2, 2015 at 1:15 PM, Adrian Stern adrian.st...@unchained.ch
wrote:

 Sorry. Will do in the future.

 Product_feature is a table describing the valid keys for product
 features. With this it is possible to limit keys to specific groups of
 products.

 Freundliche Grüsse

 Adrian Stern
 unchained - web solutions

 adrian.st...@unchained.ch
 +41 79 292 83 47

 On Tue, Jun 2, 2015 at 12:58 PM, Dorian Hoxha dorian.ho...@gmail.com
 wrote:

 Please do reply-all so you also reply to the list.

 It's not ~good to develop with sqlite and deploy on postgresql. You should
 have your 'dev' as close to 'prod' as possible.

 Product_feature is another table in this case ?

 On Tue, Jun 2, 2015 at 11:44 AM, Adrian Stern adrian.st...@unchained.ch
 wrote:

 Database changeability is not a requirement. It just comes with django
 and makes development so much easier, since I can develop on sqlite and
 deploy wherever I want. Django ORM is not great, I agree, but it
 certainly does not suck; there are alternatives like sqlalchemy which are
 far more powerful.

 But yeah, I get what you're trying to tell me. And I agree, this
 postgresql jsonb feature looks really nice. Much easier to use than
 the whole xquery stuff. The complete EAV pattern described on wikipedia is
 actually pretty complex and seems not easy at all to implement using
 django. Therefore, maybe I should run a few tests with the json plugins.

 So let my sketch another approach.

 PRODUCT - P
 - name
 - type
 - features (jsonb)

 PRODUCT_FEATURE - PF
 - name
 - description
 - datatype
 - validation

 P now has the features field of type jsonb, which holds keys specified
 in PF together with a value of the declared datatype, or simply a valid
 one. PF holds the key name, its datatype for generating the GUI, and some
 validation pattern for input sanitizing. There is no relation between the
 tables.

 Getting the description is not an issue. I could even create a view
 mapping the jsonb keys to rows.

 Yes, I like your approach. Is there anything I should be aware of? Some
 do's and don'ts or known pitfalls?






Re: [GENERAL] Database designpattern - product feature

2015-06-01 Thread Dorian Hoxha
What about keeping all the dynamic columns of each product in a json(b)
column ?
Maybe you can make constraints that check the product_type and
json-field-type ?
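
A minimal sketch of what I mean (names and the checks are just an example):

CREATE TABLE product (
    id       serial PRIMARY KEY,
    name     text  NOT NULL,
    type     text  NOT NULL,
    features jsonb NOT NULL DEFAULT '{}',
    CONSTRAINT features_is_object CHECK (jsonb_typeof(features) = 'object'),
    -- example per-type rule: lamps must carry a numeric wattage feature
    CONSTRAINT lamp_has_numeric_wattage CHECK (
        type <> 'lamp'
        OR (features ? 'wattage' AND jsonb_typeof(features->'wattage') = 'number')
    )
);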

On Mon, Jun 1, 2015 at 4:35 PM, Adrian Stern adrian.st...@unchained.ch
wrote:

 Hi, I'm new

 I've been working as the sole administrator of various postgresql projects
 for a while now, all of which were django projects.
 Since a new project is starting and we've found the need for a more
 generic approach I would like to ask a few questions.

 I would like to implement a pattern similar to the product feature pattern
 explained in the silverstone book - the data model resource book vol 1. It
 is simply explained. There is a Table PRODUCT holding the fields all the
 products share, then there is the table PRODUCT_FEATURE, both of them in a
 “many to many“ relationship.

 PRODUCT --- m  n --- PRODUCT_FEATURE (a table in between of
 course)

 PRODUCT_FEATURE -- PF
 PRODUCT -- P
 TABLE IN BETWEEN -- TIB

 PF defines the feature Type while P stands for the product the feature is
 applied to. Some of these PF can have values of different types (text,
 numbers, floating, blob, ...) which would be applied to TIB.

 I don't like the idea of having numerous empty fields prepared in TIB,
 just to store occasional values of different types, therefore I need to
 specialize those TIB Values.

 Now how would I do That?

 I could create some tables solely for the means of holding [NUM], [TEXT],
 [BLOB], [ETC] and reference them with the TIB PK. When using them I could
 create a view TIBV containing all of [NUM, TEXT, BLOB, ETC] in the same
 column called Value, and join it with TIB to get the value of a PF.

 But is this a good idea?
 Is there a better way?

 Also, I would have to create a pivot table in order to list all the
 products with all the features. As this is implemented in C (afaik) I
 suppose it is rather fast or at least fast enough, but I do not actually
 know. What I know is, there are about 30 product types and around 50
 possible product features. One product can have up to approximately 25 PFs,
 but most have around 5 to 10.

 Do you think a pivot table is a good idea?
 What alternative do i have?

 There is room for caching since the dataset is not updated too often.

 regards, adrian



Re: [GENERAL] Stellar Phoenix File Recovery Software

2015-05-01 Thread Dorian Hoxha
That's spam. Can an admin ban this user/email ?



On Fri, May 1, 2015 at 8:22 AM, recoverdata susanhsulliv...@gmail.com
wrote:

 When a file is deleted from your computer, its contents aren't immediately
 destroyed. Windows simply marks the hard drive space as being available for
 use by changing one character in the file table. If you manage to start an
 undelete process before Windows overwrites that part of hard disk with new
 files. You can set that flag back to on and get your file using Recover
 Data for Windows Data Recovery Software.
 Have a look at:-
 http://recover-data-for-fat-ntfs-trial-version.software.informer.com
 http://recover-data-for-fat-ntfs-trial-version.software.informer.com



 --
 View this message in context:
 http://postgresql.nabble.com/Stellar-Phoenix-File-Recovery-Software-tp5845376p5847588.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Streaming-SQL Database PipelineDB (Based on PostgreSQL 9.4) - Available in Beta

2015-04-25 Thread Dorian Hoxha
Hi Jeff,

Looks good. Some questions:

   - how are data stored? btree, lsm etc?

   - indexes on the views
   e.g. for a view counting daily views for each url, I may need to query
   by day or by url (table scans?)

   - distribution / sharding?
   how will things stay consistent if the view's data and the event row are
   on different servers?

   - what is the consistency? examples:
   do you write to disk on every insert? or do you bulk-commit different
   queries to disk?
   Flipping a bit (and saving this change to disk) for each view-batch may
   be slow?
   Wouldn't it be better to pass over the data and compute all views in 1
   pass and then group-commit all the view changes?

   - acunu used to do the same thing as you and got acquired by apple
   (but they also kept the events)
   maybe offer a way to keep the events in different types of storage (like
   s3, local-disk, hdfs etc)

   - provide a downloadable db without registering and post to hn (people
   have often seen your job-ads there)

   - what's the pricing-type ? free with paid support or 

Good luck

On Wed, Apr 22, 2015 at 10:09 PM, Jeff Ferguson j...@pipelinedb.com wrote:

 Hello (PostgreSQL) World,

 We recently shipped our streaming-SQL analytics database, PipelineDB
 http://www.pipelinedb.com/, which is based on PostgreSQL 9.4, in beta
 and are looking for companies to participate as early access partners.

 You can read more about the product in our technical documentation
 http://docs.pipelinedb.com/, but essentially our database runs
 continuous SQL queries on streaming data and stores only the result of
 these queries in the database, enabling realtime analytics and
 operations/monitoring capabilities with a very efficient system.

 Feel free to email me at j...@pipelinedb.com if your company might be
 interested to participate in the beta.

 Thanks,
 Jeff

 --
 Jeff Ferguson
 President and Co-Founder
 http://www.pipelinedb.com




Re: [GENERAL] PostgreSQL-related legal question

2015-03-11 Thread Dorian Hoxha
I don't see how it could have a negative impact on the postgresql project.
It's not like your job will be to find vulnerabilities and not disclose
them?

On Wed, Mar 11, 2015 at 1:28 PM, Bill Moran wmo...@potentialtech.com
wrote:


 I've been asked to sign a legal document related to a PostgreSQL-
 related job opening. I have concerns about the document and that
 signing it could have a negative impact on the PostgreSQL project
 (in addition to personal concerns).

 I'm guessing I'm not the first person to go through this. I'm
 hoping someone on this list can refer me to a lawyer who is
 familiar with the challenges of NDAs and open source projects.

 I'm not asking for pro-bono, I'm willing to pay for services,
 but I just figured that I might get better results getting a
 referral than by contacting $random_legal_service.

 --
 Bill Moran


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] select top_countries and for each country, select top_cities in that country, in 1 query

2014-08-28 Thread Dorian Hoxha
Thanks John.


On Thu, Aug 28, 2014 at 2:35 PM, John McKown john.archie.mck...@gmail.com
wrote:

 On Mon, Aug 18, 2014 at 10:52 AM, John McKown
 john.archie.mck...@gmail.com wrote:
 
  SELECT avg(b.countcountry)::int as CountryCount, b.country, a.city,
  count(a.city) as CityCount
  FROM t AS a
  INNER JOIN
  (SELECT COUNT(country) AS countcountry, country FROM t GROUP BY country)
 AS
  b
  ON a.country = b.country
  GROUP BY b.country, a.city
  ORDER BY 1 DESC,4 DESC;
 

 I am curious that nobody pointed out that the above might work but is
 really poor code. Given that, I wonder what the people here think of
 the following code. It seems better to me, even if it is more wordy.

 WITH CountryCount AS (
     SELECT COUNT(country) AS countryCount,
            country
     FROM t
     GROUP BY country
 ),
 CityCount AS (
     SELECT COUNT(city) AS cityCount,
            city,
            country
     FROM t
     GROUP BY country, city
 )
 SELECT b.countryCount,
        b.country,
        a.city,
        a.cityCount
 FROM CityCount AS a
 INNER JOIN CountryCount AS b
         ON a.country = b.country
 ORDER BY b.countryCount DESC,
          a.city DESC;


 --
 There is nothing more pleasant than traveling and meeting new people!
 Genghis Khan

 Maranatha! 
 John McKown


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



[GENERAL] select top_countries and for each country, select top_cities in that country, in 1 query

2014-08-18 Thread Dorian Hoxha
I have CREATE TABLE t (country text, city text);

I want to get with 1 query,

select count(country), country FROM t GROUP BY country ORDER BY count(country) DESC


And for each country, to get the same for cities.

Is it possible ?

Thanks


Re: [GENERAL] Psycopg2 : error message.

2014-05-16 Thread Dorian Hoxha
Try this:

don't catch the exception when you make the connection, so you can see the
actual error,

because "I am unable to connect" may mean different things: 1. wrong user
2. wrong password 3. server down, etc.


On Fri, May 16, 2014 at 12:12 PM, image lcel...@latitude-geosystems.comwrote:

 Dear all,

 I'm writing a python script for a web service. I have to connect to my
 postgres/postgis databases via Psycopg2.

 I wrote a little first script just to connect to my pg/postgis db and drop
 a test db.

 But when I execute the python file, I get several error messages.

 Please see the 2 useful files below:

 pg_test.py
 http://postgresql.1045698.n5.nabble.com/file/n5804197/pg_test.py

 SS_dos.JPG
 http://postgresql.1045698.n5.nabble.com/file/n5804197/SS_dos.JPG


 Thank you in advance for shedding some light on this.

 Regards.

 IMAGE



 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Psycopg2-error-message-tp5804197.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Psycopg2 : error message.

2014-05-16 Thread Dorian Hoxha
Also remove the first try + remove the space before conn= so you have
this:

#!/Python27/python.exe

import psycopg2

# Try to connect
conn = psycopg2.connect(dbname='busard_test', user='laurent',
                        host='localhost', password='cactus')

cur = conn.cursor()



On Fri, May 16, 2014 at 12:41 PM, image lcel...@latitude-geosystems.comwrote:

 Hello,

 Thanks. OK, I deleted the except. Please find below the 2 updated useful
 files:

 pg_test.py
 http://postgresql.1045698.n5.nabble.com/file/n5804203/pg_test.py

 SS_dos.JPG
 http://postgresql.1045698.n5.nabble.com/file/n5804203/SS_dos.JPG


 In advance, thank you.

 Regards.

 IMAGE.



 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Psycopg2-error-message-tp5804197p5804203.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Psycopg2 : error message.

2014-05-16 Thread Dorian Hoxha
Since I can't understand the language (French?), what does it say? Probably
wrong authentication (password?).


On Fri, May 16, 2014 at 1:19 PM, image lcel...@latitude-geosystems.comwrote:

 Thanks for your help.

 So I removed the first try + removed the space before conn=

 Indeed I have a new error message in my DOS console:

 pg_test.py
 http://postgresql.1045698.n5.nabble.com/file/n5804207/pg_test.py

 SS_dos.JPG
 http://postgresql.1045698.n5.nabble.com/file/n5804207/SS_dos.JPG


 In advance, thank you.

 IMAGE.




 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Psycopg2-error-message-tp5804197p5804207.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] are analyze statistics synced with replication?

2014-05-15 Thread Dorian Hoxha
If you don't run read queries on the slave, then it will not have hot
data/pages/rows/tables/indexes in RAM like the primary. (It smoked weed
and was happy doing nothing, but when responsibility came
(being promoted to master) it failed hard.)


On Thu, May 15, 2014 at 6:46 AM, Kevin Goess kgo...@bepress.com wrote:

 We have a master/slave setup with replication.  Today we failed over to
 the slave and saw disk I/O go through the roof.

 Are the pg_statistic statistics synced along with streaming replication?
 Are you expected to have to do a vacuum analyze after failing over?  That's
 what we're trying now to see if it makes a difference.  Our next step will
 be to fall back to the first host and see where this one went wrong
 (society?  lax discipline at home? the wrong sort of friends?)





Re: [GENERAL] Full-Text Search question

2014-05-14 Thread Dorian Hoxha
Search for fulltext tutorial  + json functions
http://www.postgresql.org/docs/9.3/static/functions-json.html
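
A minimal sketch (assuming a hypothetical table docs with a json column
body that contains a text field "title"):

CREATE TABLE docs (id serial PRIMARY KEY, body json);

-- expression index over a text field extracted from the json
CREATE INDEX docs_title_fts_idx ON docs
    USING gin (to_tsvector('english', body->>'title'));

-- full text search on that field
SELECT id
FROM docs
WHERE to_tsvector('english', body->>'title')
      @@ to_tsquery('english', 'postgres');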


On Wed, May 14, 2014 at 1:00 AM, Jesus Rafael Sanchez Medrano 
jesusraf...@gmail.com wrote:

 thanks... could you please be so kind to post some snippet/code for this?

 Att.
 ==
 Jesus Rafael Sanchez Medrano
 Life is a dream, of which all must wake up


 On Tue, May 13, 2014 at 5:33 PM, Oleg Bartunov obartu...@gmail.comwrote:

 Easy, you need to extract text fields from json and construct tsvector
 from them (use concatenation, for example).

 On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano
 jesusraf...@gmail.com wrote:
  can postgres do FTS (full text search) on a json column? if possible,
 please
  be so kindd to give some snippet/example.
 
 
  Att.
  ==
  Jesus Rafael Sanchez Medrano
  Life is a dream, of which all must wake up





Re: [GENERAL] Log Data Analytics : Confused about the choice of Database

2014-05-14 Thread Dorian Hoxha
On Wed, May 14, 2014 at 6:48 AM, Peeyush Agarwal 
peeyushagarwal1...@gmail.com wrote:

 Hi,

 Thanks for the reply :)

 Yes, storing the timestamp as timestamp [ (p) ] would be better. I
 simplified the session in question; it may contain alphabetic characters
 as well, so I will probably need to store it as a string.

 The problem with types of events is that the set is not fixed and will keep
 growing over time (as more event types are added depending on the need).
 Would you still recommend saving a map in another table? Will it have a
 significant storage benefit, and are there any other benefits? Storing it in
 the app will require some addition to the code each time a new event type is
 added, which is not difficult but time consuming and tedious.

Depends on how many types of events there are (store it as a smallint).
Integer indexes can be searched faster + require less space than text ones.
If some parameters inside the json field show up all the time, maybe put
them in their own fields, so you don't store the key each time + indexes
build faster.


 I am not very familiar with json data type. Can I query on it as
 effectively as hstore? Also, is it possible to index it partially so as to
 increase query speeds for certain types of queries? For eg. I would require
 parameter username in a lot of queries so I would like to partially index
 over it.

See http://www.postgresql.org/docs/9.3/static/functions-json.html for the
json functions; you can use them in expression indexes (and also in partial
indexes, for when fields don't exist).
(More functions will come with jsonb in 9.4.)


 Peeyush Agarwal


 On Tue, May 13, 2014 at 3:13 PM, Dorian Hoxha dorian.ho...@gmail.comwrote:

 Why not store session as integer?

 And timestamp as timestamp(z?) ?

 If you know the types of events, also store them as integer , and save a
 map of them in the app or on another table ?

 And save the parameters as a json column, so you have more data-types?
 Hstore only has strings.

 Be careful with the mongodb hipster on the stackoverflow post.
 Elasticsearch is often used for log collection.

 So, what really is the problem ?



 On Tue, May 13, 2014 at 4:11 AM, Peeyush Agarwal 
 peeyushagarwal1...@gmail.com wrote:

 Hi,

 I have log data of the following format:

 Session  Timestamp  Event              Parameters
 1        1          Started Session
 1        2          Logged In          Username: user1
 2        3          Started Session
 1        3          Started Challenge  title: Challenge 1, level: 2
 2        4          Logged In          Username: user2

 Now, a person wants to carry out analytics on this log data (And would
 like to receive it as a JSON blob after appropriate transformations). For
 example, he may want to receive a JSON blob where the Log Data is grouped
 by Session and TimeFromSessionStart and CountOfEvents are added before
 the data is sent so that he can carry out meaningful analysis. Here I
 should return:

 [
   {
     "session": "1",
     "CountOfEvents": 3,
     "Actions": [
       {"TimeFromSessionStart": 0, "Event": "Session Started"},
       {"TimeFromSessionStart": 1, "Event": "Logged In", "Username": "user1"},
       {"TimeFromSessionStart": 2, "Event": "Started Challenge",
        "title": "Challenge 1", "level": "2"}
     ]
   },
   {
     "session": "2",
     "CountOfEvents": 2,
     "Actions": [
       {"TimeFromSessionStart": 0, "Event": "Session Started"},
       {"TimeFromSessionStart": 2, "Event": "Logged In", "Username": "user2"}
     ]
   }
 ]


 Here, TimeFromSessionStart, CountOfEvents etc. [Let's call it synthetic
 additional data] will not be hard coded and I will make a web interface to
 allow the person to decide what kind of synthetic data he requires in the
 JSON blob. I would like to provide a good amount of flexibility to the
 person to decide what kind of synthetic data he wants in the JSON blob.

 If I use PostgreSQL, I can store the data in the following manner:
 Session and Event can be string, Timestamp can be date and Parameters can
 be hstore(key value pairs available in PostgreSQL). After that, I can
 use SQL queries to compute the synthetic (or additional) data, store it
 temporarily in variables in a Rails Application (which will interact with
 PostgreSQL database and act as interface for the person who wants the JSON
 blob) and create JSON blob from it.

 However I am not sure if PostgreSQL is the best choice for this use
 case. I have put the detailed question on SO at
 http://stackoverflow.com/questions/23544604/log-data-analytics

  Looking for some help from the community.

 Peeyush Agarwal





 --
 Peeyush Agarwal
 IIT Kanpur
 +91 8953453689



Re: [GENERAL] Log Data Analytics : Confused about the choice of Database

2014-05-13 Thread Dorian Hoxha
Why not store session as integer?

And timestamp as timestamp(z?) ?

If you know the types of events, also store them as integer , and save a
map of them in the app or on another table ?

And save the parameters as a json column, so you have more data-types?
Hstore only has strings.
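
A minimal sketch of that layout (table and column names are just an
example):

CREATE TABLE event_type (
    id   smallint PRIMARY KEY,
    name text NOT NULL UNIQUE
);

CREATE TABLE log_event (
    session    integer     NOT NULL,
    ts         timestamptz NOT NULL,
    event_type smallint    NOT NULL REFERENCES event_type (id),
    parameters json
);

-- expression index on a parameter that is queried a lot
CREATE INDEX log_event_username_idx
    ON log_event ((parameters->>'Username'));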

Be careful with the mongodb hipster on the stackoverflow post.
Elasticsearch is often used for log collection.

So, what really is the problem ?



On Tue, May 13, 2014 at 4:11 AM, Peeyush Agarwal 
peeyushagarwal1...@gmail.com wrote:

 Hi,

 I have log data of the following format:


 Session  Timestamp  Event              Parameters
 1        1          Started Session
 1        2          Logged In          Username: user1
 2        3          Started Session
 1        3          Started Challenge  title: Challenge 1, level: 2
 2        4          Logged In          Username: user2

 Now, a person wants to carry out analytics on this log data (And would
 like to receive it as a JSON blob after appropriate transformations). For
 example, he may want to receive a JSON blob where the Log Data is grouped
 by Session and TimeFromSessionStart and CountOfEvents are added before
 the data is sent so that he can carry out meaningful analysis. Here I
 should return:


 [
   {
     "session": "1",
     "CountOfEvents": 3,
     "Actions": [
       {"TimeFromSessionStart": 0, "Event": "Session Started"},
       {"TimeFromSessionStart": 1, "Event": "Logged In", "Username": "user1"},
       {"TimeFromSessionStart": 2, "Event": "Started Challenge",
        "title": "Challenge 1", "level": "2"}
     ]
   },
   {
     "session": "2",
     "CountOfEvents": 2,
     "Actions": [
       {"TimeFromSessionStart": 0, "Event": "Session Started"},
       {"TimeFromSessionStart": 2, "Event": "Logged In", "Username": "user2"}
     ]
   }
 ]


 Here, TimeFromSessionStart, CountOfEvents etc. [Let's call it synthetic
 additional data] will not be hard coded and I will make a web interface to
 allow the person to decide what kind of synthetic data he requires in the
 JSON blob. I would like to provide a good amount of flexibility to the
 person to decide what kind of synthetic data he wants in the JSON blob.

 If I use PostgreSQL, I can store the data in the following manner: Session
  and Event can be string, Timestamp can be date and Parameters can be
 hstore(key value pairs available in PostgreSQL). After that, I can use
 SQL queries to compute the synthetic (or additional) data, store it
 temporarily in variables in a Rails Application (which will interact with
 PostgreSQL database and act as interface for the person who wants the JSON
 blob) and create JSON blob from it.

 However I am not sure if PostgreSQL is the best choice for this use case.
 I have put the detailed question on SO at
 http://stackoverflow.com/questions/23544604/log-data-analytics

  Looking for some help from the community.

 Peeyush Agarwal



Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions

2014-04-29 Thread Dorian Hoxha
So:

   1. drop the function
   2. alter type: add the column
   3. create the function again with the new default argument, all in one
   transaction?



On Tue, Apr 29, 2014 at 4:22 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Sun, Apr 27, 2014 at 4:57 PM, Dorian Hoxha dorian.ho...@gmail.com
 wrote:
  Since my alternative is using json, that is heavier (need to store keys
 in
  every row) than composite-types.
  Updating an element on a specific composite_type inside an array of them
 is
  done by UPDATE table SET composite[2].x = 24;
 
  So last standing question, is it possible to insert an array of
  composite_types by not specifying all of the columns for each
 composite_type
  ?
  So if i later add other columns to the composite_type, the insert query
  doesn't break ?

 One way to do it is via 'type constructor function'.

 postgres=# create type foo_t as (a int, b int);
 postgres=# create function foo_t(a int, b int) returns foo_t as $$
 select row(a,b)::foo_t; $$ language sql stable;
 postgres=# create table bar(f foo_t);
 postgres=# insert into bar VALUES ((1,2)::foo_t); -- not tolerant to
 changes
 postgres=# insert into bar VALUES (foo_t(1,2)); -- works fine

 now, to extend the field, we can overload the function making sure to
 default the 3rd argument.
 alter type foo_t add attribute c text;
 drop function foo_t(int, int); -- must do this to make function unambiguous
 create function foo_t(a int, b int, c text = null) returns foo_t as $$
 select row(a,b,c)::foo_t; $$ language sql stable;

 postgres=# INSERT INTO bar VALUES ((1,2)::foo_t);
 postgres=# insert into bar VALUES (foo_t(1,2,'test')); -- works fine

 This technique is somewhat dubious, but if for whatever reason you
 absolutely must preserve client sql in the face of server changes it
 might work.

 merlin



[GENERAL] Postgresql the right tool (queue using advisory_locks + long transactions)

2014-04-27 Thread Dorian Hoxha
Hi list,

I am trying to use postgresql as a queue for long jobs (max ~4 hours) using
advisory locks. I can't separate a long job into sub-jobs.


   1. In the ultimate best-case scenario there will be ~100 workers, so no
   web-scale performance is required.
   Is there a problem with 100 open sessions (behind a connection pooler?)
   for hours, like in my case?
   The session will commit the progress of the job, but will stay open
   and hold the advisory lock till the job is done or it expires (see 2).
   2. Is it possible to set a time limit to auto-close an open session
   that hasn't made a query in a while?
   So if a worker fails and the session times out, postgresql/pgbouncer
   will close the session and release the lock?


Re: [GENERAL] Re: Postgresql the right tool (queue using advisory_locks + long transactions)

2014-04-27 Thread Dorian Hoxha
I'll probably ask the pgbouncer mailing list if I can use it with
per-session advisory locks. If not, even raw sessions will be enough.
Some comments inline.
Thanks

On Sun, Apr 27, 2014 at 10:07 PM, David G Johnston 
david.g.johns...@gmail.com wrote:

 Dorian Hoxha wrote
  Hi list,
 
  I am trying to use postgresql as a queue for long-jobs (max ~4 hours)
  using
  advisory_locks. I can't separate the long-job into sub-jobs.
 
 
 1. At ultimate-best-case scenario there will be ~100 workers, so no
 web-scale performance required.
 Is there a problem with 100 open sessions (behind a connection
 pooler?)
 for hours like in my case?
 The session will commit the progress of the job, but will stay opened
 and hold the advisory_lock till the job is done or it expires (look
 2).
 2. Is it possible to set a time limit to auto-close an opened session
 that hasn't made a query in a while ?
 So if a worker fails,and the session time-outs, postgresql/pgbouncer
 will close the session and release the lock ?

 Do you understand the difference between a session and a transaction?  Your
 subject and body indicate either you don't or have a thinko/typo somewhere?
 Long-running transactions (idle or otherwise) are a major problem while
 long-running idle sessions are simply marginally wasteful (in moderation).

 I'm talking about session advisory locks. That's why I chose advisory locks
instead of SELECT FOR SHARE (a shared row lock), because from reading the
docs that looks like it must be inside a transaction.
I'll keep the session open for the duration of the whole job, and
create/commit transactions to update the progress.
At the end I close the connection and the lock is released.
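
A minimal sketch of that flow (assuming a job table with progress and
canceled columns, and job id 42):

-- take a session-level lock on the job id; returns false if another
-- worker already holds it
SELECT pg_try_advisory_lock(42);

-- while the session stays open, report progress in ordinary transactions
UPDATE job SET progress = 10 WHERE id = 42 RETURNING canceled;
UPDATE job SET progress = 50 WHERE id = 42 RETURNING canceled;

-- when the job is done (or the worker dies and its connection drops),
-- closing the session releases the lock; it can also be released explicitly
SELECT pg_advisory_unlock(42);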

 Why not just update the job as dispatched when it is being worked on and
 then completed when finished?  You still would need to handle partial
 completion somehow but this way you don't waste live resources during the
 long waiting period simply monitoring a lock.  Though probably at your
 volume this is not that big an issue.

 PostgreSQL proper does not have session timeouts that I am aware of.  If a
 worker fails it should release its connection and all advisory locks would
 be released...


I thought that if a worker failed the session was left open. So maybe i
don't need the session-timeout thing.

 You reference to a connection pooler in the above doesn't make sense to me;
 you'd need persistent connections for this to work (they can be from a pool
 but the pool size would have to be 100+).

 The main decision is whether job process state is part of your data model
 or
 simply an artifact.  I'd suggest that it should be part of the model so
 state should be managed directly thorough row-level locking and status
 fields instead of indirectly through temporary system advisory locks.


I maintain job status and job progress on the same row.
But I also want users to be able to cancel the job, by updating a 'canceled'
column. So every time I report the progress, I also use RETURNING canceled
to see if the job has been canceled by the user, and abort it if so.

 If you want to keep track of active workers you should setup some kind of
 heartbeat update query; and maybe depending on how important this is attach
 advisory lock info to that heartbeat record so a monitoring process can
 check both the pulse table and the system advisory lock for a two data
 point
 confirmation of activity.

 Will probably do it like this.

 David J.





 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Postgresql-the-right-tool-queue-using-advisory-locks-long-transactions-tp5801667p5801670.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions

2014-04-27 Thread Dorian Hoxha
Since my alternative is using json, which is heavier (it needs to store the
keys in every row) than composite types.
Updating a field of a specific composite_type inside an array of them is
done with UPDATE table SET composite[2].x = 24;

So the last remaining question: is it possible to insert an array of
composite_types without specifying all of the columns for each
composite_type?
So that if I later add other columns to the composite_type, the insert
query doesn't break?

Thanks


On Mon, Apr 21, 2014 at 1:46 PM, Dorian Hoxha dorian.ho...@gmail.comwrote:

 Maybe the char array link is wrong? I don't think an array of arrays is
 good for my case. I'll probably go for json or a separate table since it
 looks like it's not possible to use composite types.


 On Mon, Apr 21, 2014 at 4:02 AM, Rob Sargentg robjsarg...@gmail.comwrote:

  Sorry, I should not have top-posted (Dang iPhone).  Continued below:

 On 04/20/2014 05:54 PM, Dorian Hoxha wrote:

 Because i always query the whole row, and in the other way(many tables) i
 will always join + have other indexes.


 On Sun, Apr 20, 2014 at 8:56 PM, Rob Sargent robjsarg...@gmail.comwrote:

  Why do you think you need an array of theType v. a dependent table of
 theType. This tack is of course immune to to most future type changess.

 Sent from my iPhone

 Interesting.  Of course any decent mapper will return the whole
 row. And would it be less disk intensive as an array of struct ( where
 struct is implemented as an array).  From other threads [1] [2] I've come
 to understand the datatype overhead per native type will be applied per
 type instance per array element.

 [1] 30K 
 floatshttp://postgresql.1045698.n5.nabble.com/Is-it-reasonable-to-store-double-arrays-of-30K-elements-td5790562.html
 [2] char 
 arrayhttp://postgresql.1045698.n5.nabble.com/COPY-v-java-performance-comparison-tc5798389.html





Re: [GENERAL] hstore binary representation of keys

2014-04-22 Thread Dorian Hoxha
Currently hstore is like mongodb: it writes the keys every time (and the
values as strings!); it's mostly for dynamic keys or very sparse keys, in
my opinion.
You can shorten the keys, or put them in dedicated columns.
I haven't read that there is a plan to compress the strings.
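
A minimal sketch of the dedicated-column option (table and key names are
just an example):

-- move a frequently used key out of the hstore into its own column
ALTER TABLE metrics ADD COLUMN really_long_key text;

UPDATE metrics
   SET really_long_key = attrs -> 'A_REALLY_REALLY_REALLY_LONG_KEY',
       attrs           = delete(attrs, 'A_REALLY_REALLY_REALLY_LONG_KEY');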


On Tue, Apr 22, 2014 at 2:01 PM, Tim Kane tim.k...@gmail.com wrote:

 Hi all,

 I’ve been using hstore to record various key/val pairs, but I’ve noticed
 it consumes a lot more disk than I would have expected.
 I don’t have any hard figures to illustrate, but empirical testing has
 shown that if I record these pairs as traditional column based fields, I
 can save a significant amount of disk.

 What I think I’m seeing here, is that the hstore representation needs to
 store the entirety of the *key* alongside each *value*.

 Let’s say I have a table of 1000 records, and 900 of them have a key named
 ‘A_REALLY_REALLY_REALLY_LONG_KEY’, then this key will be written to disk
 900 times, along with the appropriate values.


 I guess there are two options open to me here.


1. I could transpose these values into a dedicated field
2. I could use shorter key names


 Does hstore2 go any way to improving this situation? Some kind of
 enumerated key based system?



 Cheers,

 TIm




Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions

2014-04-21 Thread Dorian Hoxha
Maybe the char array link is wrong? I don't think an array of arrays is
good for my case. I'll probably go for json or a separate table since it
looks like it's not possible to use composite types.


On Mon, Apr 21, 2014 at 4:02 AM, Rob Sargentg robjsarg...@gmail.com wrote:

  Sorry, I should not have top-posted (Dang iPhone).  Continued below:

 On 04/20/2014 05:54 PM, Dorian Hoxha wrote:

 Because i always query the whole row, and in the other way(many tables) i
 will always join + have other indexes.


 On Sun, Apr 20, 2014 at 8:56 PM, Rob Sargent robjsarg...@gmail.comwrote:

  Why do you think you need an array of theType v. a dependent table of
 theType. This tack is of course immune to to most future type changess.

 Sent from my iPhone

 Interesting.  Of course any decent mapper will return the whole
 row. But would it be less disk intensive as an array of structs (where the
 struct is implemented as an array)? From other threads [1] [2] I've come
 to understand the datatype overhead per native type will be applied per
 type instance, per array element.

 [1] 30K floats: http://postgresql.1045698.n5.nabble.com/Is-it-reasonable-to-store-double-arrays-of-30K-elements-td5790562.html
 [2] char array: http://postgresql.1045698.n5.nabble.com/COPY-v-java-performance-comparison-tc5798389.html



[GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions

2014-04-20 Thread Dorian Hoxha
Hi list,

I have a
create type thetype as (width integer, height integer);
create table mytable(thetype thetype[]);

How can i make an insert statement so if i later add fields to the
composite type, the code/query doesn't break ?
Maybe by specifying the fields of the composite type in the query ?

This can be done for normal inserts(non arrays):
CREATE TABLE mytable (t thetype);
INSERT INTO mytable(t.width, t.height) VALUES (11,22);


Also, how do I update a whole element of an array of composites?
Also, how do I update an attribute of a specific element in an array of
composites?

(so when i add columns later to the composite, my old code doesn't break)

How much overhead do composite types have besides the values and nulls?

Thanks
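
For reference, a hedged sketch of the syntax involved (column names here are
illustrative, and the ROW(...)::thetype casts do break once the type gains
fields, which is part of why json or a separate table ends up being suggested):

CREATE TYPE thetype AS (width integer, height integer);
CREATE TABLE mytable (id serial PRIMARY KEY, vals thetype[]);

-- Insert an array of composites:
INSERT INTO mytable (vals)
VALUES (ARRAY[ROW(11, 22)::thetype, ROW(33, 44)::thetype]);

-- Replace a whole element (arrays are 1-based):
UPDATE mytable SET vals[1] = ROW(50, 60)::thetype WHERE id = 1;

-- Read a single attribute of one element (note the parentheses):
SELECT (vals[1]).width FROM mytable WHERE id = 1;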


Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions

2014-04-20 Thread Dorian Hoxha
Was just curious about the overhead.

I know the columns, but i may need to add other columns in the future.
Yeah, json is the alternative if this doesn't work.
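
A minimal sketch of that json fallback, with illustrative names (each element
carries its own keys, so adding an attribute later needs no DDL change):

CREATE TABLE mytable_json (id serial PRIMARY KEY, boxes json);

INSERT INTO mytable_json (boxes)
VALUES ('[{"width": 11, "height": 22}, {"width": 33, "height": 44}]');

-- New attributes can simply appear in later rows:
INSERT INTO mytable_json (boxes)
VALUES ('[{"width": 11, "height": 22, "depth": 5}]');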



On Sun, Apr 20, 2014 at 7:54 PM, Fede Martinez
federicoemarti...@gmail.com wrote:

 If you don't know the columns your type will have, you could consider
 using json or hstore if the data is unstructured.
 On 20/04/2014 14:04, Dorian Hoxha dorian.ho...@gmail.com wrote:

 Hi list,

 I have a
 create type thetype as (width integer, height integer);
 create table mytable(thetype thetype[]);

 How can i make an insert statement so if i later add fields to the
 composite type, the code/query doesn't break ?
 Maybe by specifying the fields of the composite type in the query ?

 This can be done for normal inserts(non arrays):
 CREATE TABLE mytable (t thetype);
 INSERT INTO mytable(t.width, t.height) VALUES (11,22);


 Also, how do I update a whole element of an array of composites?
 Also, how do I update an attribute of a specific element in an array of
 composites?

 (so when i add columns later to the composite, my old code doesn't break)

 How much overhead do composite types have besides the values and nulls?

 Thanks




Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions

2014-04-20 Thread Dorian Hoxha
Because I always query the whole row, and the other way (many tables) I
will always have to join + have other indexes.


On Sun, Apr 20, 2014 at 8:56 PM, Rob Sargent robjsarg...@gmail.com wrote:

 Why do you think you need an array of theType vs. a dependent table of
 theType? This tack is of course immune to most future type changes.

 Sent from my iPhone

 On Apr 20, 2014, at 11:57 AM, Dorian Hoxha dorian.ho...@gmail.com wrote:

 Was just curious about the overhead.

 I know the columns, but i may need to add other columns in the future.
 Yeah, json is the alternative if this doesn't work.



 On Sun, Apr 20, 2014 at 7:54 PM, Fede Martinez 
 federicoemarti...@gmail.com wrote:

 If you don't know the columns your type will have, you could consider
 using json or hstore if the data is unstructured.
 On 20/04/2014 14:04, Dorian Hoxha dorian.ho...@gmail.com wrote:

 Hi list,

 I have a
 create type thetype as (width integer, height integer);
 create table mytable(thetype thetype[]);

 How can i make an insert statement so if i later add fields to the
 composite type, the code/query doesn't break ?
 Maybe by specifying the fields of the composite type in the query ?

 This can be done for normal inserts(non arrays):
 CREATE TABLE mytable (t thetype);
 INSERT INTO mytable(t.width, t.height) VALUES (11,22);


 Also, how do I update a whole element of an array of composites?
 Also, how do I update an attribute of a specific element in an array of
 composites?

 (so when i add columns later to the composite, my old code doesn't
 break)

 How much overhead do composite types have besides the values and nulls?

 Thanks





Re: [GENERAL] Could use some advice on search architecture

2014-04-19 Thread Dorian Hoxha
Postgresql has 2 column stores: one in-memory (can't remember the name) and
http://www.citusdata.com/blog/76-postgresql-columnar-store-for-analytics


On Sat, Apr 19, 2014 at 2:10 PM, Robin robin...@live.co.uk wrote:

  bottom post
 On 19/04/2014 12:46, R. Pasch wrote:

 On 19-4-2014 9:38, Robin wrote:


 Well, given that there are known, limited attributes, this is the type of
 application that really, really suits a column-oriented database, such as
 Sybase IQ (now sold by SAP). It's a neat product that scales. Great
 performance with drag'n'drop analytics.

 Unless you can charm IQ out of SAP (it has been known to happen), you
 might have to look at some other techniques

 So consider some binary data representation
 Red - 1 (0000 0001)
 Orange - 2 (0000 0010)
 Yellow - 4 (0000 0100)
 Green - 8 (0000 1000)
 Blue - 16 (0001 0000)
 Indigo - 32 (0010 0000)
 Violet - 64 (0100 0000)

 This way, you can encode several colours in 1 value
 Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001


 Robin


  I stopped reading when I heard the words sold by SAP ;-) This project is
 built solely with open-source and freely available software.

 I've been thinking about using a binary data representation but didn't
 come to a solution to this specific problem quite yet. Per property of a
 product, only one bit would be 1 and the rest would be 0. What would a
 query look like to match all products that have a bit in the correct
 position?

 Say for instance these are a couple records (and yes, property values can
 be null as well)

 title,    property1, property2, property3
 -----------------------------------------
 product1, 0000 0001, 0000 0010, NULL
 product2, 0000 0100, 0100 0000, 0010 0000
 product3, 0010 0000, 0010 0000, 0100 0000

 Say that I would like to retrieve the products that have property1 as either
 0010 0000, 0000 1000 or 0000 0001. Combined that would be 0010 1001 and
 would have to match product1 and product3, as they both have their
 individual bit matching one of the bits being asked for. What would a where
 statement look like using this type of binary representation?

 If that would be fairly simple to do and fast (most important factor) then
 I could do an OR construction on all property columns and have something
 count the amount of properties that actually matched. Is that something you
 can do with a binary operator of some sort as well? Count the amount of
 overlapping bits?

 Say for instance I have a binary value of 0110 0101 and another binary
 value of 1100 0100; how could I find out how many bits matched? (in this
 case the number of matching bits would be 2)


 I understand the reluctance to pay SAP-style rates; as a longtime DB user,
 I have learned some 'charm' techniques.

 However, I poked around a bit for alternatives, as I do like the
 column-oriented approach, and found something called MonetDB
 (http://www.monetdb.com/Home/Features); it apparently has a column-store
 db kernel and is open source. I suggest you have a look; if it does what it
 says on the label, then it looks like a find.

 There is a discussion of bitmask trickiness, also dealing with colours, here:
 http://postgresql.1045698.n5.nabble.com/Bitmask-trickiness-td1865258.html

 Robin
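
For what it's worth, a hedged sketch of that bit matching using PostgreSQL's
bit(n) type (table and column names are illustrative):

CREATE TABLE products (
    title     text,
    property1 bit(8),
    property2 bit(8),
    property3 bit(8)
);

INSERT INTO products VALUES
    ('product1', B'00000001', B'00000010', NULL),
    ('product2', B'00000100', B'01000000', B'00100000'),
    ('product3', B'00100000', B'00100000', B'01000000');

-- Rows whose property1 shares at least one bit with the mask 0010 1001:
SELECT title
FROM products
WHERE property1 & B'00101001' <> B'00000000';

-- Count the overlapping bits between two 8-bit values (2 in this example):
SELECT count(*) AS matching_bits
FROM generate_series(0, 7) AS i
WHERE get_bit(B'01100101' & B'11000100', i) = 1;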




Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Dorian Hoxha
Cache the total ?
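
For example, a minimal sketch of the windowed count described below, plus a
cheap planner-statistics estimate as one alternative to paying for the exact
count (table name is illustrative):

-- Exact total, computed alongside the page (the count(*) over() approach):
SELECT *, count(*) OVER () AS total
FROM items
ORDER BY id
LIMIT 20 OFFSET 0;

-- Approximate total for a whole table, essentially free, but only as fresh
-- as the last ANALYZE:
SELECT reltuples::bigint AS estimated_total
FROM pg_class
WHERE relname = 'items';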


On Thu, Apr 3, 2014 at 3:34 PM, Leonardo M. Ramé l.r...@griensu.com wrote:

 Hi, in one of our systems we added a kind of pagination feature that
 shows N records of Total records.

 To do this, we added a count(*) over() as Total field in our queries,
 replacing two separate queries (one for fetching the records and another
 for getting the count). This improved performance, but we aren't happy
 with the results yet: without the count the query takes 200ms vs 2000ms
 with it.

 We are thinking of removing the count, but if we do that, the system
 will lack an interesting feature.

 What strategy for showing the total number of records returned do you
 recommend?

 Regards,
 --
 Leonardo M. Ramé
 Medical IT - Griensu S.A.
 Av. Colón 636 - Piso 8 Of. A
 X5000EPT -- Córdoba
 Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
 Cel.: +54 9 (011) 40871877






[GENERAL] New OpenSource columnstore from CitusData for Postgresql

2014-04-03 Thread Dorian Hoxha
Link to hackernews which also has some comments from the devs

https://news.ycombinator.com/item?id=7523950

Very interesting: they use foreign tables (the foreign data wrapper API) as an
abstraction to separate the storage layer from the rest of the database.
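
For context, a hedged sketch of that FDW-style setup (extension and server
names follow the cstore_fdw README as I recall it, so treat them as
assumptions and check the repo; storage settings go in OPTIONS (...) on the
foreign table):

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE events_columnar (
    event_time timestamptz,
    user_id    bigint,
    payload    text
) SERVER cstore_server;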


Re: [GENERAL] Complex sql, limit-for-each group by, arrays, updates

2013-12-03 Thread Dorian Hoxha
I'll probably go with using 3 queries and putting them in a transaction.

Thanks
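
For the record, a hedged sketch of one of the three statements using the
UPDATE ... FROM pattern David suggests below (column types are assumed; the
other two statements follow the same shape):

UPDATE pid_top_vids t
SET vid = s.top_vids
FROM (
    SELECT pid, array_agg(vid ORDER BY number DESC) AS top_vids
    FROM (
        SELECT pid, vid, number,
               row_number() OVER (PARTITION BY pid ORDER BY number DESC) AS rn
        FROM data
    ) ranked
    WHERE rn <= 10
    GROUP BY pid
) s
WHERE t.pid = s.pid;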


On Wed, Nov 27, 2013 at 5:38 PM, David Johnston pol...@yahoo.com wrote:

 Dorian Hoxha wrote
  Hi,
 
  So i have (table where data will be read) :
  CREATE TABLE data (vid,cid,pid,number);
 
  Tables where data will be writen/updated:
 
  CREATE TABLE pid_top_vids (pid, vid[])
  CREATE TABLE pid_top_cids (pid, cid[])
  CREATE TABLE cid_top_vids (cid, vid[])
 
  I need to do the following, possibly in 1 query (this will run once in a while):
  Get top(10) vids, sorted by 'number', grouped by pid, and update the row in
  TABLE(pid_top_vids).
 
  Get top(10) vids, sorted by 'number', grouped by cid and update the row
 in
  TABLE(cid_top_vids).
 
  Get top(10) cids, sorted by 'number', where number is the SUM() of each
  vid
  GROUP_BY(cid) and update the row in TABLE (cid_top_vids);
 
  So, get data, create sorted array, and update the rows (they exist, so
  only
  the arrays have to be updated).
 
  Possible ?
  Thanks

 I would not bother trying to do it in a single SQL statement but each query
 is possible, and fairly basic, using UPDATE with a FROM clause containing
 the desired sub-query.

 David J.





 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Complex-sql-limit-for-each-group-by-arrays-updates-tp5780554p5780599.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.





[GENERAL] Complex sql, limit-for-each group by, arrays, updates

2013-11-27 Thread Dorian Hoxha
Hi,

So i have (table where data will be read) :
CREATE TABLE data (vid,cid,pid,number);

Tables where data will be writen/updated:

CREATE TABLE pid_top_vids (pid, vid[])
CREATE TABLE pid_top_cids (pid, cid[])
CREATE TABLE cid_top_vids (cid, vid[])

I need to do the following, possibly in 1 query (this will run once in a while):
Get top(10) vids, sorted by 'number', grouped by pid, and update the row in
TABLE(pid_top_vids).

Get top(10) vids, sorted by 'number', grouped by cid and update the row in
TABLE(cid_top_vids).

Get top(10) cids, sorted by 'number', where number is the SUM() of each vid
GROUP_BY(cid) and update the row in TABLE (cid_top_vids);

So, get data, create sorted array, and update the rows (they exist, so only
the arrays have to be updated).

Possible ?
Thanks


[GENERAL] Multiple aggs,sums in 1 query

2013-11-22 Thread Dorian Hoxha
I have: create table tbl (a,b,c,d,e,f,g,h);

And I need to select, in 1 query or in the most performant way:

top 5(a)
top 5(b)
top 5(c): for each top5(c): top 5(d)
count(f) GROUP BY f


I can make these in separate queries, but that means that postgresql would
read the table multiple times?

Is it possible/doable/sane ?

Thanks
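
One hedged way to do it as a single statement, assuming "top 5(a)" means the
five most frequent values of a (note that each CTE is still a separate scan of
the table, so this helps convenience more than I/O; the nested top 5(d) per
top 5(c) would need an extra row_number() window partitioned by c):

WITH top_a AS (
    SELECT a::text AS value, count(*) AS cnt
    FROM tbl GROUP BY a ORDER BY cnt DESC LIMIT 5
),
top_b AS (
    SELECT b::text AS value, count(*) AS cnt
    FROM tbl GROUP BY b ORDER BY cnt DESC LIMIT 5
),
f_counts AS (
    SELECT f::text AS value, count(*) AS cnt
    FROM tbl GROUP BY f
)
SELECT 'top_a' AS which, value, cnt FROM top_a
UNION ALL
SELECT 'top_b', value, cnt FROM top_b
UNION ALL
SELECT 'count_f', value, cnt FROM f_counts;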


[GENERAL] Return only non-null columns

2013-11-13 Thread Dorian Hoxha
Is it possible to:

SELECT * FROM table

But to return only non-null columns ?
Since I use psycopg2 with DictCursor (a hashtable), it's better for me to
not have the column at all than to have it as NULL.

Thanks
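
Plain SQL can't vary the column list per row, but one workaround is to return
each row as JSON with the NULL keys stripped; a minimal sketch with an
illustrative table name (json_strip_nulls needs 9.4+, which postdates this
thread):

SELECT json_strip_nulls(row_to_json(t)) AS row_data
FROM mytable t;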