Re: [GENERAL] dynamic schema modeling and performance
I've done the dynamic-table-per-project approach before and it worked great, even with dynamic indexes on the tables. In the low thousands of projects it should work fine. Beyond that, use as many static columns as possible, keep everything dynamic in jsonb, and check it with per-project constraints. On Wed, Apr 12, 2017 at 3:31 AM, RJ Ewing <ewing...@gmail.com> wrote: > I thought that might be an answer around here :) > > I guess I was looking for what might be a better approach. > > Is dynamically creating a table for each entity a bad idea? I can see > something like creating a schema for each project (group of related > entities) and then creating a table for each schema. I don't expect having > more than a few thousand projects anytime soon. We have a relatively > targeted audience. > > Or would it be better to use jsonb data types and create a denormalized > index elsewhere? > > > > On Apr 11, 2017, at 5:17 PM, Dorian Hoxha <dorian.ho...@gmail.com> wrote: > > If you are asking if you should go nosql, 99% you should not. > > On Tue, Apr 11, 2017 at 10:06 PM, Poul Kristensen <bcc5...@gmail.com> > wrote: > >> dataverse.org uses Postgresql and is well documented + it is completely >> user driven. Maybe the concept could be useful for you. I have installed >> and configured a few to be used by researchers. >> >> regards >> Poul >> >> >> 2017-04-11 19:46 GMT+02:00 Rj Ewing <ewing...@gmail.com>: >> >>> I'm looking for thoughts on the best way to handle dynamic schemas. >>> >>> The application I am developing revolves around user defined entities. >>> Each entity is a tabular dataset with user defined columns and data types. >>> Entities can also be related to each other through Parent-Child >>> relationships. Some entities will be 100% user driven, while others (such >>> as an entity representing a photo) will be partially user driven (all photo >>> entities will have common fields + custom user additions).
>>> >>> I was hoping to get opinions on whether postgresql would be a suitable >>> backend. A couple of options I have thought of are: >>> >>> 1. Each entity is represented as a table in psql. The schema would be >>> dynamically updated (with limits) when an entity mapping is updated. I >>> believe that this would provide the best data constraints and allow the >>> best data normalization. *A concern I have is that there could be an >>> enormous number of tables generated and the performance impacts this might >>> have in the future*. I could then run elasticsearch as a denormalized >>> cache for efficient querying and full-text-search. >>> >>> 2. Use a nosql database. This provides the "dynamic" schema aspect. A >>> concern here is the lack of relation support, thus leading to a more >>> denormalized data structure and the potential for the data to become >>> corrupted. >>> >>> Any opinions on the use of psql for this case, or other options would be >>> greatly appreciated! >>> >>> RJ >>> >> >> >> >> -- >> Med venlig hilsen / Best regards >> Poul Kristensen >> Linux-OS/Virtualization expert and Oracle DBA >> > >
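The hybrid approach recommended at the top of this thread — static columns where possible, everything dynamic in jsonb, checked by per-project constraints — can be sketched in application code. This is a hypothetical illustration (the project names, field rules, and `validate_dynamic_fields` helper are all invented), not a prescription:

```python
# Sketch of per-project constraint checking for a dynamic jsonb payload.
# All names here are invented for illustration.

# Per-project rules: dynamic field name -> expected Python type.
PROJECT_CONSTRAINTS = {
    "herbarium": {"genus": str, "specimen_count": int},
    "photos": {"camera": str, "iso": int},
}

def validate_dynamic_fields(project, dynamic):
    """Check a dict destined for a jsonb column against its project's rules."""
    rules = PROJECT_CONSTRAINTS.get(project, {})
    errors = []
    for field, value in dynamic.items():
        expected = rules.get(field)
        if expected is None:
            errors.append(f"unknown field: {field}")
        elif not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors
```

In PostgreSQL itself the same checks could instead live in a per-project CHECK constraint or trigger over the jsonb column.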
Re: [GENERAL] dynamic schema modeling and performance
If you are asking if you should go nosql, 99% you should not. On Tue, Apr 11, 2017 at 10:06 PM, Poul Kristensen wrote: > dataverse.org uses Postgresql and is well documented + it is completely > user driven. Maybe the concept could be useful for you. I have installed > and configured a few to be used by researchers. > > regards > Poul > > > 2017-04-11 19:46 GMT+02:00 Rj Ewing : > >> I'm looking for thoughts on the best way to handle dynamic schemas. >> >> The application I am developing revolves around user defined entities. >> Each entity is a tabular dataset with user defined columns and data types. >> Entities can also be related to each other through Parent-Child >> relationships. Some entities will be 100% user driven, while others (such >> as an entity representing a photo) will be partially user driven (all photo >> entities will have common fields + custom user additions). >> >> I was hoping to get opinions on whether postgresql would be a suitable >> backend. A couple of options I have thought of are: >> >> 1. Each entity is represented as a table in psql. The schema would be >> dynamically updated (with limits) when an entity mapping is updated. I >> believe that this would provide the best data constraints and allow the >> best data normalization. *A concern I have is that there could be an >> enormous number of tables generated and the performance impacts this might >> have in the future*. I could then run elasticsearch as a denormalized >> cache for efficient querying and full-text-search. >> >> 2. Use a nosql database. This provides the "dynamic" schema aspect. A >> concern here is the lack of relation support, thus leading to a more >> denormalized data structure and the potential for the data to become >> corrupted. >> >> Any opinions on the use of psql for this case, or other options would be >> greatly appreciated! >> >> RJ >> > > > > -- > Med venlig hilsen / Best regards > Poul Kristensen > Linux-OS/Virtualization expert and Oracle DBA >
Re: [GENERAL] Free seminar in Malmo: PostgreSQL at 10TB and Beyond
Just share the slides/video in this thread, friend. On Sat, Jan 21, 2017 at 10:57 AM, Seref Arikan wrote: > Any chance this will be recorded? The content looks great and would be of > interest to many. > > Cheers > Seref > > > On Sat, Jan 21, 2017 at 8:55 AM, Chris Travers > wrote: > >> Hi; >> >> I have been asked to deliver a seminar on use of PostgreSQL in large data >> environments. I have decided to send this to the general list as well >> because I think it may be of general interest here. >> >> We will be talking about problems and solutions for PostgreSQL when >> using it to manage 10-12TB of data under heavy load. >> >> In particular, 4 specific problems we faced will be discussed in >> technical detail: >> >> 1. How table partitioning solved a number of our performance problems. >> 2. How non-1NF designs solved some problems and created some new ones. >> 3. How PL/Perl was used to address large data sets of semi-structured >> data. >> 4. How use of PostgreSQL as a job queue store raised unexpected problems, >> and their solutions. >> >> The discussion surrounds a case study which shows how well PostgreSQL >> scales, but offers a number of lessons for database management in smaller >> sizes as well. >> >> If you are interested and in the Southern Sweden or Copenhagen areas, >> please feel free to register at https://www.eventbrite.com/e/postgresql-at-10-tb-and-beyond-tickets-30841174784 >> >> -- >> Best Wishes, >> Chris Travers >> >> Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor >> lock-in. >> http://www.efficito.com/learn_more >> > >
[GENERAL] When updating a row that has a TOAST column, is the TOAST column also reinserted? Or just the oid of the value?
Hello friends, When updating a row that has a TOAST column, is the TOAST value also reinserted? Or just the oid pointing to it? Say I have a 1MB value in the TOAST column, and I update the row by changing another column; since every update is an insert, will it also reinsert the TOAST value? The column that I will update has an index, so I think a HOT update won't work in this case? The same question also applies when full_page_writes is enabled. Using 9.6. Thank you!
Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB
@Aleksander Nearly everyone wants lower storage use and some kind of compression. Can this be made to retrain automatically when analyzing (does that make sense?)? And create a new dictionary only if it has changed compared to the last one. On Tue, Oct 4, 2016 at 5:34 PM, Aleksander Alekseev < a.aleks...@postgrespro.ru> wrote: > Hello, Simon. > > Thanks for your interest in this project! > > > Will you be submitting this to core? > > I could align ZSON to PostgreSQL code style. I only need to run pgindent > and write a few comments. Do you think the community would be interested in > adding it to /contrib/ ? I mean, doesn't ZSON solve a bit too specific a > problem for this? > > -- > Best regards, > Aleksander Alekseev >
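The shared-dictionary idea behind ZSON, and the retraining question above, can be sketched in a few lines: learn the most frequent keys from a sample of documents, substitute small integer codes for them, and adopt a new dictionary version only when retraining actually produces a different one. This is a simplified illustration of the concept, not ZSON's actual on-disk format:

```python
from collections import Counter

def train_dictionary(docs, size=128):
    """Learn codes for the most frequent keys across sample documents."""
    counts = Counter(k for d in docs for k in d)
    frequent = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return {key: code for code, (key, _) in enumerate(frequent)}

def encode(doc, dictionary):
    """Replace known keys with their small integer codes."""
    return {dictionary.get(k, k): v for k, v in doc.items()}

def decode(doc, dictionary):
    """Invert encode() using the same dictionary."""
    reverse = {code: key for key, code in dictionary.items()}
    return {reverse.get(k, k): v for k, v in doc.items()}

def retrain(docs, last_dictionary, size=128):
    """Create a new dictionary only if it changed compared to the last one."""
    new = train_dictionary(docs, size)
    return new if new != last_dictionary else last_dictionary
```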
Re: [GENERAL] Multiple inserts
If the connection is in autocommit, then each statement will also incur a commit (a write to the commit log on disk). On Fri, Sep 23, 2016 at 2:01 PM, Rakesh Kumar wrote: > Hi > > I am noticing that if I do this > > insert into table values(1,a) > insert into table values(2,b) > > insert into table values(3,c) > > commit after 500 rows > it is a lot slower (almost 10x) than > insert into table values((1,a),(2,b),(3,c)) .. up to 500 > > It is obvious that the gain is due to the network trip avoided after every > insert. > > My question is: Are there any known side-effects or drawbacks of using > multiple inserts? I am aware the error checking is much more difficult in > the second approach. Any other drawbacks? > > thanks >
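The gain under discussion comes from sending one multi-row statement (one round trip and, in autocommit, one commit) instead of N. A toy sketch of constructing such a statement follows; the literal quoting here is for illustration only, and real code should rely on the driver's parameter binding (e.g. psycopg2's `execute_values`):

```python
def multi_row_insert(table, columns, rows):
    """Build one multi-row INSERT instead of N single-row statements.
    Toy literal quoting; use your driver's parameter binding in real code."""
    def quote(v):
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"   # naive SQL string literal
        return str(v)
    values = ", ".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"
```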
Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?
Check out Voltdb (or Scylladb, which is more different) for the changes in architecture required to achieve those performance increases. On Fri, Sep 2, 2016 at 7:32 PM, Andres Freund wrote: > On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote: > > On Fri, Sep 2, 2016 at 4:49 AM, dandl wrote: > > > Re this talk given by Michael Stonebraker: > > > > > > http://slideshot.epfl.ch/play/suri_stonebraker > > > > > > > > > > > > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, > DB2, MS > > SQL Server, Postgres, given enough memory that the entire database > lives in > > cache, the server will spend 96% of its memory cycles on unproductive > > overhead. This includes buffer management, locking, latching > (thread/CPU > > conflicts) and recovery (including log file reads and writes). > > I think those numbers are overblown, and more PR than reality. > > But there certainly are some things that can be made more efficient if > you don't care about durability and replication. > > > > > I wondered if there are any figures or measurements on Postgres > performance > > > in this ‘enough memory’ environment to support or contest this point of > > > view? > > I don't think that's really answerable without individual use-cases in > mind. Answering that question for analytics, operational, ... workloads > is going to look different, and the overheads are elsewhere. > > I personally think that each implementations restrictions are more > likely to be an issue than anything "fundamental". > > > > What limits postgresql when everything fits in memory? The fact that > > it's designed to survive a power outage and not lose all your data. > > > > Stonebraker's new stuff is cool, but it is NOT designed to survive > > total power failure. > > > > Two totally different design concepts. It's apples and oranges to > compare them. > > I don't think they're that fundamentally different.
> > > Greetings, > > Andres Freund > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] Uber migrated from Postgres to MySQL
Many comments: https://news.ycombinator.com/item?id=12166585 https://www.reddit.com/r/programming/comments/4uph84/why_uber_engineering_switched_from_postgres_to/ On Tue, Jul 26, 2016 at 7:39 PM, Guyren Howe wrote: > Honestly, I've never heard of anyone doing that. But it sounds like they > had good reasons. > > https://eng.uber.com/mysql-migration/ > > Thoughts? >
Re: [GENERAL] Multimaster
@Konstantin 1. It's ok in my cases. 2. Not required in my cases. 3. Just require users to use different servers for now, I think. Sometimes (always?) users can be greedy with feature requests. 4. I want magical consistency + failover (I can instruct the client to retry all masters). A good cluster story is the only thing stopping postgresql from killing many nosqls, in my opinion. Citusdb is single-master (it can't even update many rows on the same shard), Postgres-XL has no high availability (and needs 3 servers per node + a global dtm), GreenPlum is olap (and an old version of postgres), and app-side sharding sucks. With sharding, the holy grail is to have automatic global range partitioning (hypertable, hbase, cockroachdb, bigtable, etc.) on the primary key or a special column. The con of this is that the indexes of a row may reside in another region, which may mean another server, so you need a cross-server transaction to update a row. In this case, I think you can keep the indexes on the same server as the row and be fine. To get good speed you can implement: 1-shard-only queries (no cross-node transactions), and async disk changes (so the changes are done only in the memory of all replicated servers and not in the commit log, as couchbase does). Multi-shard transactions aren't needed as much, in my opinion. 5. The fewer places/files there are to configure any system, the better it is IMHO. Cheers and good luck! On Fri, Apr 1, 2016 at 8:15 PM, Moreno Andreo wrote: > On 01/04/2016 17:22, Joshua D. Drake wrote: > >> On 04/01/2016 04:50 AM, Konstantin Knizhnik wrote: >> >> There are also some minor technical issues which led us to make a few >>> changes in the pglogical code, but we tried to do our best to keep the original >>> versions unchanged, so we can switch to the public version in future. >>> >> >> It seems the better solution for all of this would be for BDR and this >> project to come together as a community project to get multimaster for >> PostgreSQL.
Otherwise we are going to end up with the same situation we had >> before Pg had master-slave replication. >> >> JD >> >> I'd like to post my use case, hoping it could be of interest. > We are offering a cloud service, based on our professional application, > hosting databases on our server and replicating them to every site the > users register with. > Every user has (for now) a unique database to keep in sync. > Users write to their local database and data is replicated with the server > via a background process running while the user is connected. > Actually we have to improve what our replicator is doing: it's only > replicating the single user's database. The improvement should be that we can > put it on the "server" (in some cases there are groups of users sharing a > dedicated server) and, given a configuration of what and how to replicate, > it should replicate more than one DB at a time. > Actually, it's a stand-alone program, but what we'd like would be > something more integrated in PostgreSQL, so wherever PostgreSQL can run > (everywhere!), so can it. > We were beginning to "translate" (and then improve) this program in c#, > when I bumped into articles pointing to BDR, and I started taking a look. > But it seems that it is good for replicating whole servers, and still doesn't have the > granularity we need. > > My 2 cents... > > Cheers, > Moreno.
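The "automatic global range partitioning" mentioned earlier in this thread (the HBase/Bigtable/CockroachDB approach) comes down to routing each primary key to a shard by binary search over sorted split points; keeping a row's index entries on the same shard then avoids the cross-server transaction described. A minimal, hypothetical sketch of the routing step:

```python
from bisect import bisect_right

def shard_for_key(key, split_points):
    """Route a key to a shard: shard i holds keys below split_points[i];
    the last shard holds everything at or above the final split point."""
    return bisect_right(split_points, key)
```

With split points `["g", "p"]` there are three shards: "apple" routes to shard 0, "kiwi" to shard 1, "zebra" to shard 2.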
Re: [GENERAL] JSONB performance enhancement for 9.6
Is there any database that actually supports what the original poster wanted? The only thing I know that's similar is the bigtable/hbase/hypertable wide-column store. The way it works: break the lexicographically sorted rows into compressed blocks of XX KB each, and then keep an index on the start_key+end_key of each block. This way we can store the index (which links to several toast values) on the row, and depending on which key you need it will get+decompress the required block. You can represent nested values by using a separator on the key, like "first_level:2ndlevel:3rd_level:value". If the index is too big, you can store the index itself in a toast value. Note: I have no idea how to (or if it can be) actually code this. On Wed, Jan 20, 2016 at 9:32 AM, Oleg Bartunov wrote: > > > On Wed, Jan 20, 2016 at 4:51 AM, Bruce Momjian wrote: > >> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote: >> > Hi, >> > >> > Congrats on the official release of 9.5 >> > >> > And I'd like to bring up the issue again about if 9.6 would address the >> jsonb >> > performance issue >> > with large number of top level keys. >> > It is true that it does not have to use JSON format. it is about >> serialization >> > and fast retrieval >> > of dynamic tree structure objects. (at top level, it might be called >> dynamic >> > columns) >> > So if postgresql can have its own way, that would work out too as long >> as it >> > can have intuitive query >> > (like what are implemented for json and jsonb) and fast retrieval of a >> tree >> > like object, >> > it can be called no-sql data type. After all, most motivations of using >> no-sql >> > dbs like MongoDB >> > is about working with dynamic tree object. >> > >> > If postgresql can have high performance on this, then many no-sql dbs >> would >> > become history. >> >> I can give you some backstory on this. TOAST was designed in 2001 as a >> way to store, in a data-type-agnostic way, long strings compressed and >> any other long data type, e.g.
long arrays. >> >> In all previous cases, _part_ of the value wasn't useful. JSONB is a >> unique case because it is one of the few types that can be processed >> without reading the entire value, e.g. it has an index. >> >> We are going to be hesitant to do something data-type-specific for >> JSONB. It would be good if we could develop a data-type-agnostic >> approach to how TOAST can be improved. I know of no such work for 9.6, >> and it is unlikely it will be done in time for 9.6. >> > > I'm looking at this from time to time. > > >> >> -- >> Bruce Momjian http://momjian.us >> EnterpriseDB http://enterprisedb.com >> >> + As you are, so once was I. As I am, so you will be. + >> + Roman grave inscription + >> >> >
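The wide-column layout sketched at the top of this message — lexicographically sorted rows packed into compressed blocks, with an index of block start keys — can be illustrated in a few lines. `build_blocks` and `lookup` are invented names, and the JSON-in-zlib block format is purely for demonstration:

```python
import json
import zlib
from bisect import bisect_right

def build_blocks(sorted_items, block_size=2):
    """Pack sorted (key, value) pairs into compressed blocks; keep a sparse
    index holding only each block's start key."""
    index, blocks = [], []
    for i in range(0, len(sorted_items), block_size):
        chunk = sorted_items[i:i + block_size]
        index.append(chunk[0][0])                       # block's start_key
        blocks.append(zlib.compress(json.dumps(chunk).encode()))
    return index, blocks

def lookup(key, index, blocks):
    """Locate the one block that could hold `key`; decompress only that block."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None                                     # before the first block
    chunk = json.loads(zlib.decompress(blocks[pos]))
    return dict((k, v) for k, v in chunk).get(key)
```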
Re: [GENERAL] Happy New Year
Happy Holidays! Let's have automatic sharding and distributed transactions! On Fri, Jan 1, 2016 at 3:51 PM, Melvin Davidson wrote: > Happy New Year to all! > > On Fri, Jan 1, 2016 at 2:40 AM, Michael Paquier > wrote: > >> On Fri, Jan 1, 2016 at 11:36 AM, Joshua D. Drake >> wrote: >> > Welcome to 2016. >> > >> > Let's have another bang up year! >> >> +1. >> -- >> Michael >> > > > > -- > *Melvin Davidson* > I reserve the right to fantasize. Whether or not you > wish to share my fantasy is entirely up to you. >
Re: [GENERAL] Is BDR support distributed table on slave nodes with ACID and join support.
1,2,3: You can't shard with BDR. It's only for multimaster (at least for now). Please read the docs. On Fri, Jul 17, 2015 at 9:02 AM, Amit Bondwal bondwal.a...@gmail.com wrote: Hello everyone, We are working on an application in which we are using postgresql as the database. We are sure that in the future it will grow to the level that we have to scale it horizontally. We have some tables which are going to be more than 90% of the database size. I looked at pg_shard; as per my understanding it is for nosql data, while in our case we are going to use postgresql RDBMS features without nosql, and pg_shard has a lot of limitations for our case. I looked at postgresql-xl; it looks good as per our requirements, but we would have to do a lot of work to make it highly available, and it doesn't have Debian packages. Now I see BDR; it looks good, it has multimaster, so we can start with one master and later add more master servers for failover. Below are my queries: 1. If I start with a postgresql 9.4 database, can I later upgrade it to BDR with these already large tables, and can I distribute or shard them? Do I need to make some extra changes in the database, or will it be complicated later? 2. How can I distribute some of the tables to my slave nodes? If this has any limitations, please let me know. 3. If I start with a 4 or 5 node cluster with one master and the rest slave nodes, how can I take a backup of this distributed database to another server or backup location? Please suggest what would be the best approach for this. -- Thanks Regards, Amit Bondwal
Re: [GENERAL] Database designpattern - product feature
Please do reply-all so you also reply to the list. It's not a good idea to develop with sqlite and deploy on postgresql. You should keep your 'dev' as close to 'prod' as possible. Product_feature is another table in this case? On Tue, Jun 2, 2015 at 11:44 AM, Adrian Stern adrian.st...@unchained.ch wrote: Database changeability is not a requirement. It just comes with django and makes development so much easier, since I can develop on sqlite and deploy wherever I want. The django orm is not great, I agree, but it certainly does not suck; there are alternatives like sqlalchemy which are far more powerful. But yeah, I get what you're trying to tell me. And I agree, these postgresql features for jsonb look really nice. Much easier to use than the whole xquery stuff. The complete EAV pattern described on wikipedia is actually pretty complex and seems not easy at all to implement using django. Therefore, I should maybe run a few tests with the json plugins. So let me sketch another approach. PRODUCT - P - name - type - features (jsonb) PRODUCT_FEATURE - PF - name - description - datatype - validation P now has the features field of type jsonb, which allows keys specified in PF together with a value of the declared datatype, or simply a valid one. PF holds the key name, its datatype for generating the GUI, and some validation pattern for input sanitizing. There is no relation between the tables. Getting the description is not an issue. I could even create a view mapping the jsonb keys to rows. Yes, I like your approach. Is there anything I should be aware of? Some do's and don'ts or known pitfalls?
Re: [GENERAL] Database designpattern - product feature
So product_feature is only 1 row for each product_type, right? Looks good. On Tue, Jun 2, 2015 at 1:15 PM, Adrian Stern adrian.st...@unchained.ch wrote: Sorry. Will do in the future. Product_feature is a table describing the valid keys for product features. With this it is possible to limit keys to specific groups of products. Kind regards Adrian Stern unchained - web solutions adrian.st...@unchained.ch +41 79 292 83 47 On Tue, Jun 2, 2015 at 12:58 PM, Dorian Hoxha dorian.ho...@gmail.com wrote: Please do reply-all so you also reply to the list. It's not a good idea to develop with sqlite and deploy on postgresql. You should keep your 'dev' as close to 'prod' as possible. Product_feature is another table in this case? On Tue, Jun 2, 2015 at 11:44 AM, Adrian Stern adrian.st...@unchained.ch wrote: Database changeability is not a requirement. It just comes with django and makes development so much easier, since I can develop on sqlite and deploy wherever I want. The django orm is not great, I agree, but it certainly does not suck; there are alternatives like sqlalchemy which are far more powerful. But yeah, I get what you're trying to tell me. And I agree, these postgresql features for jsonb look really nice. Much easier to use than the whole xquery stuff. The complete EAV pattern described on wikipedia is actually pretty complex and seems not easy at all to implement using django. Therefore, I should maybe run a few tests with the json plugins. So let me sketch another approach. PRODUCT - P - name - type - features (jsonb) PRODUCT_FEATURE - PF - name - description - datatype - validation P now has the features field of type jsonb, which allows keys specified in PF together with a value of the declared datatype, or simply a valid one. PF holds the key name, its datatype for generating the GUI, and some validation pattern for input sanitizing. There is no relation between the tables. Getting the description is not an issue.
I could even create a view mapping the jsonb keys to rows. Yes I like your approach. Is there anything I should be aware of? Some do's and don'ts or known pitfalls?
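The "view mapping the jsonb keys to rows" idea can be illustrated in application code: flatten each product's features dict into one row per key, much as a SQL view over PostgreSQL's `jsonb_each()` would. The function and field names here are hypothetical:

```python
def features_to_rows(products):
    """Expand each product's jsonb-style features dict into
    (product_name, feature_key, value) rows, like a view over jsonb_each()."""
    rows = []
    for product in products:
        for key, value in sorted(product["features"].items()):
            rows.append((product["name"], key, value))
    return rows
```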
Re: [GENERAL] Database designpattern - product feature
What about keeping all the dynamic columns of each product in a json(b) column? Maybe you can add constraints that check the product_type and the json field types? On Mon, Jun 1, 2015 at 4:35 PM, Adrian Stern adrian.st...@unchained.ch wrote: Hi, I'm new. I've been working as the sole administrator of various postgresql projects for a while now, all of which were django projects. Since a new project is starting and we've found the need for a more generic approach, I would like to ask a few questions. I would like to implement a pattern similar to the product feature pattern explained in the Silverston book - The Data Model Resource Book Vol. 1. It is simply explained. There is a table PRODUCT holding the fields all the products share, then there is the table PRODUCT_FEATURE, both of them in a "many to many" relationship. PRODUCT --- m n --- PRODUCT_FEATURE (a table in between of course) PRODUCT_FEATURE -- PF PRODUCT -- P TABLE IN BETWEEN -- TIB PF defines the feature type while P stands for the product the feature is applied to. Some of these PF can have values of different types (text, numbers, floating point, blob, ...) which would be applied to TIB. I don't like the idea of having numerous empty fields prepared in TIB just to store occasional values of different types, therefore I need to specialize those TIB values. Now how would I do that? I could create some tables solely for the purpose of holding [NUM], [TEXT], [BLOB], [ETC] and reference them with the TIB PK. When using them I could create a view TIBV containing all of [NUM, TEXT, BLOB, ETC] in the same column called Value, and join it with TIB to get the value of a PF. But is this a good idea? Is there a better way? Also, I would have to create a pivot table in order to list all the products with all the features. As this is implemented in C (afaik) I suppose it is rather fast, or at least fast enough, but I do not actually know.
What I know is, there are about 30 Product Types and around 50 possible product features. One product can have up to approximately 25 PF but are mostly around 5 to 10. Do you think a pivot table is a good idea? What alternative do i have? There is room for caching since the dataset is not updated too often. regards, adrian
Re: [GENERAL] Stellar Phoenix File Recovery Software
That's spam. Can an admin ban this user/email ? On Fri, May 1, 2015 at 8:22 AM, recoverdata susanhsulliv...@gmail.com wrote: When a file is deleted from your computer, its contents aren't immediately destroyed. Windows simply marks the hard drive space as being available for use by changing one character in the file table. If you manage to start an undelete process before Windows overwrites that part of hard disk with new files. You can set that flag back to on and get your file using Recover Data for Windows Data Recovery Software. Have a look at:- http://recover-data-for-fat-ntfs-trial-version.software.informer.com http://recover-data-for-fat-ntfs-trial-version.software.informer.com -- View this message in context: http://postgresql.nabble.com/Stellar-Phoenix-File-Recovery-Software-tp5845376p5847588.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Streaming-SQL Database PipelineDB (Based on PostgreSQL 9.4) - Available in Beta
Hi Jeff, Looks good. Some questions: - How is data stored? Btree, lsm, etc.? - Indexes on the views: e.g. with a view counting daily views for each url, I may need to query by day or by url (table scans?) - Distribution / sharding? How will things stay consistent if the view's data and the event row are on different servers? - What is the consistency? Examples: do you write to disk on every insert, or do you bulk-commit different queries to disk? Flipping a bit (and saving this change to disk) for each view-batch may be slow? Wouldn't it be better to pass over the data and compute all views in 1 pass and then group-commit all the view changes? - Acunu used to do the same thing as you and got acquired by apple (but they also kept the events); maybe offer a way to keep the events in different types of storage (like s3, local-disk, hdfs, etc.) - Provide a downloadable db without registering, and post to hn (people have often seen your job-ads there) - What's the pricing model? Free with paid support, or Good luck On Wed, Apr 22, 2015 at 10:09 PM, Jeff Ferguson j...@pipelinedb.com wrote: Hello (PostgreSQL) World, We recently shipped our streaming-SQL analytics database, PipelineDB http://www.pipelinedb.com/, which is based on PostgreSQL 9.4, in beta and are looking for companies to participate as early access partners. You can read more about the product in our technical documentation http://docs.pipelinedb.com/, but essentially our database runs continuous SQL queries on streaming data and stores only the result of these queries in the database, enabling realtime analytics and operations/monitoring capabilities with a very efficient system. Feel free to email me at j...@pipelinedb.com if your company might be interested to participate in the beta. Thanks, Jeff -- Jeff Ferguson President and Co-Founder http://www.pipelinedb.com
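The core idea being discussed — a continuous view keeps only an incrementally updated aggregate and discards the raw events — can be sketched in a few lines. This illustrates the concept only; it is not PipelineDB's implementation:

```python
from collections import defaultdict

class ContinuousDailyViews:
    """Toy continuous view: count daily views per url as events stream in,
    storing only the aggregate, never the events themselves."""
    def __init__(self):
        self.counts = defaultdict(int)          # (url, day) -> view count

    def insert(self, url, day):
        self.counts[(url, day)] += 1            # incremental update per event

    def result(self, url, day):
        return self.counts[(url, day)]
```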
Re: [GENERAL] PostgreSQL-related legal question
I don't see how it could have a negative impact on the postgresql project? It's not like your job will be to find vulnerabilities and not disclose them? On Wed, Mar 11, 2015 at 1:28 PM, Bill Moran wmo...@potentialtech.com wrote: I've been asked to sign a legal document related to a PostgreSQL-related job opening. I have concerns about the document and that signing it could have a negative impact on the PostgreSQL project (in addition to personal concerns). I'm guessing I'm not the first person to go through this. I'm hoping someone on this list can refer me to a lawyer who is familiar with the challenges of NDAs and open source projects. I'm not asking for pro-bono work; I'm willing to pay for services, but I just figured that I might get better results from a referral than by contacting $random_legal_service. -- Bill Moran
Re: [GENERAL] select top_countries and for each country, select top_cities in that country, in 1 query
Thanks John. On Thu, Aug 28, 2014 at 2:35 PM, John McKown john.archie.mck...@gmail.com wrote: On Mon, Aug 18, 2014 at 10:52 AM, John McKown john.archie.mck...@gmail.com wrote: SELECT avg(b.countcountry)::int as CountryCount, b.country, a.city, count(a.city) as CityCount FROM t AS a INNER JOIN (SELECT COUNT(country) AS countcountry, country FROM t GROUP BY country) AS b ON a.country = b.country GROUP BY b.country, a.city ORDER BY 1 DESC, 4 DESC; I am curious that nobody pointed out that the above might work but is really poor code. Given that, I wonder what the people here think of the following code. It seems better to me, even if it is more wordy. WITH CountryCount AS ( SELECT COUNT(country) as countryCount, country FROM t GROUP BY country ), CityCount AS ( SELECT COUNT(city) as cityCount, city, country FROM t GROUP BY country, city ) SELECT b.countryCount, b.country, a.city, a.cityCount FROM CityCount as a INNER JOIN CountryCount AS b ON a.country = b.country ORDER BY b.countryCount DESC, a.city DESC -- There is nothing more pleasant than traveling and meeting new people! Genghis Khan Maranatha! John McKown
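For comparison, the same aggregation (country totals plus per-country city totals, both descending) can be computed in one pass in application code; the sample rows below are invented:

```python
from collections import Counter

def top_countries_and_cities(rows):
    """Country totals and per-country city totals, mirroring the SQL above.
    rows is a sequence (not a one-shot iterator) of (country, city) pairs."""
    country_counts = Counter(country for country, _ in rows)
    city_counts = Counter(rows)                       # counts (country, city)
    result = []
    for country, total in country_counts.most_common():
        cities = [(city, n) for (c, city), n in city_counts.most_common()
                  if c == country]
        result.append((country, total, cities))
    return result
```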
[GENERAL] select top_countries and for each country, select top_cities in that country, in 1 query
I have CREATE TABLE t (country text, city text); I want to get, with one query: select count(country), country GROUP BY country ORDER BY count(country) DESC. And, for each country, to get the same for its cities. Is it possible? Thanks
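Another way to write this, limiting each country to its top cities as the subject line asks: rank the cities inside each country with a window function over the grouped rows, then filter on the rank. A sketch against the same t(country, city) table (the cut-off of 10 cities per country is an assumption):

```sql
-- Top 10 cities per country, with each country's total row count.
-- sum(count(*)) OVER (...) totals the per-city counts within a country;
-- row_number() ranks the cities so only the first 10 are kept.
WITH ranked AS (
    SELECT country,
           city,
           count(*) AS city_count,
           sum(count(*)) OVER (PARTITION BY country) AS country_count,
           row_number() OVER (PARTITION BY country
                              ORDER BY count(*) DESC) AS rn
    FROM t
    GROUP BY country, city
)
SELECT country_count, country, city, city_count
FROM ranked
WHERE rn <= 10
ORDER BY country_count DESC, city_count DESC;
```

Window functions evaluate after GROUP BY, which is why an aggregate like count(*) can appear inside the window expressions here.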
Re: [GENERAL] Psycopg2 : error message.
Try this: don't catch the exception when you make the connection, so you can see the real error, because "I am unable to connect" may mean different things: 1. wrong user 2. wrong password 3. server down etc.

On Fri, May 16, 2014 at 12:12 PM, image lcel...@latitude-geosystems.com wrote: Dear all, I'm writing a python script for a web service. I have to connect to my postgres/postgis databases via Psycopg2. I wrote a small first script just to connect to my pg/postgis db and drop a test db. But when I execute the python file, I get several error messages. Please see the 2 useful files below: pg_test.py http://postgresql.1045698.n5.nabble.com/file/n5804197/pg_test.py SS_dos.JPG http://postgresql.1045698.n5.nabble.com/file/n5804197/SS_dos.JPG In advance, thank you for throwing some light on this for me. Regards. IMAGE

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Psycopg2-error-message-tp5804197.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: [GENERAL] Psycopg2 : error message.
Also remove the first try + remove the space before conn=, so you have this:

#!/Python27/python.exe
import psycopg2
# Try to connect
conn = psycopg2.connect(dbname='busard_test', user='laurent', host='localhost', password='cactus')
cur = conn.cursor()

On Fri, May 16, 2014 at 12:41 PM, image lcel...@latitude-geosystems.com wrote: Hello, Thanks. OK, I deleted the except. Please find below the 2 updated useful files: pg_test.py http://postgresql.1045698.n5.nabble.com/file/n5804203/pg_test.py SS_dos.JPG http://postgresql.1045698.n5.nabble.com/file/n5804203/SS_dos.JPG In advance, thank you. Regards. IMAGE.
Re: [GENERAL] Psycopg2 : error message.
Since I can't understand the language (French?), what does the message say? Probably wrong authentication (password?).

On Fri, May 16, 2014 at 1:19 PM, image lcel...@latitude-geosystems.com wrote: Thanks for your help. So I removed the first try + removed the space before conn=. Indeed I have a new error message in my DOS window: pg_test.py http://postgresql.1045698.n5.nabble.com/file/n5804207/pg_test.py SS_dos.JPG http://postgresql.1045698.n5.nabble.com/file/n5804207/SS_dos.JPG In advance, thank you. IMAGE.
Re: [GENERAL] are analyze statistics synced with replication?
If you don't do read queries on the slave, then it will not have the hot data/pages/rows/tables/indexes in RAM like the primary does. (It smoked weed and was happy doing nothing, but when responsibility came (being promoted to master) it failed hard.)

On Thu, May 15, 2014 at 6:46 AM, Kevin Goess kgo...@bepress.com wrote: We have a master/slave setup with replication. Today we failed over to the slave and saw disk I/O go through the roof. Are the pg_statistic statistics synced along with streaming replication? Are you expected to have to do a vacuum analyze after failing over? That's what we're trying now, to see if it makes a difference. Our next step will be to fall back to the first host and see where this one went wrong (society? lax discipline at home? the wrong sort of friends?)
Re: [GENERAL] Full-Text Search question
Search for a full text search tutorial, plus the json functions: http://www.postgresql.org/docs/9.3/static/functions-json.html

On Wed, May 14, 2014 at 1:00 AM, Jesus Rafael Sanchez Medrano jesusraf...@gmail.com wrote: thanks... could you please be so kind as to post some snippet/code for this? Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up

On Tue, May 13, 2014 at 5:33 PM, Oleg Bartunov obartu...@gmail.com wrote: Easy, you need to extract text fields from json and construct a tsvector from them (use concatenation, for example). On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano jesusraf...@gmail.com wrote: can postgres do FTS (full text search) on a json column? If possible, please be so kind as to give some snippet/example. Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up
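A sketch of what Oleg describes: pull the text fields out of the json column with ->> and concatenate them into a tsvector. The table and key names (docs, body, title, content) are invented for the example:

```sql
-- Full text search over fields of a json column (9.3+)
SELECT *
FROM docs
WHERE to_tsvector('english',
          coalesce(body->>'title', '') || ' ' ||
          coalesce(body->>'content', ''))
      @@ to_tsquery('english', 'postgres');

-- The same expression can back a GIN index,
-- so the tsvector isn't rebuilt on every scan:
CREATE INDEX docs_fts_idx ON docs USING gin (
    to_tsvector('english',
        coalesce(body->>'title', '') || ' ' ||
        coalesce(body->>'content', '')));
```

The coalesce calls matter: a missing key yields NULL, which would otherwise null out the whole concatenation.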
Re: [GENERAL] Log Data Analytics : Confused about the choice of Database
On Wed, May 14, 2014 at 6:48 AM, Peeyush Agarwal peeyushagarwal1...@gmail.com wrote: Hi, Thanks for the reply :) Yes, storing timestamp as timestamp [ (p) ] would be better. I simplified the session in question. It may contain alphabetic characters as well, so I will probably need to store it as a string. The problem with types of events is that the set is not fixed and will keep growing over time (as more event types are added depending on need). Would you still recommend saving a map in another table? Will it have a significant storage benefit, and are there any other benefits? Storing it in the app will require some addition to the code each time a new event type is added, which is not difficult but time consuming and tedious.

Depending on how many types of events there are (store it as a smallint). Integer indexes can be searched faster and require less space than text ones. If some parameters inside the json field show up all the time, maybe put them in their own columns, so you don't store the key each time + indexes build faster.

I am not very familiar with the json data type. Can I query it as effectively as hstore? Also, is it possible to index it partially so as to increase query speeds for certain types of queries? For example, I would need the parameter username in a lot of queries, so I would like a partial index over it.

http://www.postgresql.org/docs/9.3/static/functions-json.html for expression indexes (also make partial indexes, when fields don't exist). (More functions will come with jsonb in 9.4.)

Peeyush Agarwal

On Tue, May 13, 2014 at 3:13 PM, Dorian Hoxha dorian.ho...@gmail.com wrote: Why not store session as an integer? And timestamp as timestamptz? If you know the types of events, also store them as integers, and save a map of them in the app or in another table? And save the parameters as a json column, so you have more data types? Hstore only has strings. Be careful with the mongodb hipster on the stackoverflow post. Elasticsearch is often used for log collection.
So, what really is the problem?

On Tue, May 13, 2014 at 4:11 AM, Peeyush Agarwal peeyushagarwal1...@gmail.com wrote: Hi, I have log data of the following format:

Session  Timestamp  Event              Parameters
1        1          Started Session
1        2          Logged In          Username:user1
2        3          Started Session
1        3          Started Challenge  title:Challenge 1, level:2
2        4          Logged In          Username:user2

Now, a person wants to carry out analytics on this log data (and would like to receive it as a JSON blob after appropriate transformations). For example, he may want to receive a JSON blob where the log data is grouped by Session, and TimeFromSessionStart and CountOfEvents are added before the data is sent so that he can carry out meaningful analysis. Here I should return:

[
  { "session": 1, "CountOfEvents": 3, "Actions": [
      { "TimeFromSessionStart": 0, "Event": "Session Started" },
      { "TimeFromSessionStart": 1, "Event": "Logged In", "Username": "user1" },
      { "TimeFromSessionStart": 2, "Event": "Started Challenge", "title": "Challenge 1", "level": 2 } ] },
  { "session": 2, "CountOfEvents": 2, "Actions": [
      { "TimeFromSessionStart": 0, "Event": "Session Started" },
      { "TimeFromSessionStart": 2, "Event": "Logged In", "Username": "user2" } ] }
]

Here, TimeFromSessionStart, CountOfEvents etc. [let's call it synthetic additional data] will not be hard-coded, and I will make a web interface to allow the person to decide what kind of synthetic data he requires in the JSON blob. I would like to provide a good amount of flexibility to the person to decide what kind of synthetic data he wants in the JSON blob. If I use PostgreSQL, I can store the data in the following manner: Session and Event can be strings, Timestamp can be a date, and Parameters can be hstore (key-value pairs available in PostgreSQL). After that, I can use SQL queries to compute the synthetic (or additional) data, store it temporarily in variables in a Rails application (which will interact with the PostgreSQL database and act as the interface for the person who wants the JSON blob), and create the JSON blob from it. However, I am not sure if PostgreSQL is the best choice for this use case.
I have put the detailed question on SO at http://stackoverflow.com/questions/23544604/log-data-analytics Looking for some help from the community. Peeyush Agarwal -- Peeyush Agarwal IIT Kanpur +91 8953453689
Re: [GENERAL] Log Data Analytics : Confused about the choice of Database
Why not store session as an integer? And timestamp as timestamptz? If you know the types of events, also store them as integers, and save a map of them in the app or in another table? And save the parameters as a json column, so you have more data types? Hstore only has strings. Be careful with the mongodb hipster on the stackoverflow post. Elasticsearch is often used for log collection. So, what really is the problem?

On Tue, May 13, 2014 at 4:11 AM, Peeyush Agarwal peeyushagarwal1...@gmail.com wrote: Hi, I have log data of the following format:

Session  Timestamp  Event              Parameters
1        1          Started Session
1        2          Logged In          Username:user1
2        3          Started Session
1        3          Started Challenge  title:Challenge 1, level:2
2        4          Logged In          Username:user2

Now, a person wants to carry out analytics on this log data (and would like to receive it as a JSON blob after appropriate transformations). For example, he may want to receive a JSON blob where the log data is grouped by Session, and TimeFromSessionStart and CountOfEvents are added before the data is sent so that he can carry out meaningful analysis. Here I should return:

[
  { "session": 1, "CountOfEvents": 3, "Actions": [
      { "TimeFromSessionStart": 0, "Event": "Session Started" },
      { "TimeFromSessionStart": 1, "Event": "Logged In", "Username": "user1" },
      { "TimeFromSessionStart": 2, "Event": "Started Challenge", "title": "Challenge 1", "level": 2 } ] },
  { "session": 2, "CountOfEvents": 2, "Actions": [
      { "TimeFromSessionStart": 0, "Event": "Session Started" },
      { "TimeFromSessionStart": 2, "Event": "Logged In", "Username": "user2" } ] }
]

Here, TimeFromSessionStart, CountOfEvents etc. [let's call it synthetic additional data] will not be hard-coded, and I will make a web interface to allow the person to decide what kind of synthetic data he requires in the JSON blob. I would like to provide a good amount of flexibility to the person to decide what kind of synthetic data he wants in the JSON blob.
If I use PostgreSQL, I can store the data in the following manner: Session and Event can be string, Timestamp can be date and Parameters can be hstore(key value pairs available in PostgreSQL). After that, I can use SQL queries to compute the synthetic (or additional) data, store it temporarily in variables in a Rails Application (which will interact with PostgreSQL database and act as interface for the person who wants the JSON blob) and create JSON blob from it. However I am not sure if PostgreSQL is the best choice for this use case. I have put the detailed question on SO at http://stackoverflow.com/questions/23544604/log-data-analytics Looking for some help from the community. Peeyush Agarwal
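As a sketch of the grouped blob described above, assuming the table is logs(session int, ts int, event text) (names invented; the parameters column is left out for brevity, and json_build_object needs 9.4): TimeFromSessionStart falls out of a window min() per session, and json_agg rolls each session's events into one array.

```sql
-- One JSON object per session, with CountOfEvents and the
-- relative-time Actions array computed server-side.
SELECT json_build_object(
           'session',       session,
           'CountOfEvents', count(*),
           'Actions',       json_agg(json_build_object(
                                'TimeFromSessionStart', t_rel,
                                'Event',                event)
                            ORDER BY t_rel))
FROM (SELECT session,
             event,
             ts - min(ts) OVER (PARTITION BY session) AS t_rel
      FROM logs) s
GROUP BY session;
```

The synthetic fields the user picks in the web interface would then just change which expressions are built into the objects.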
Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions
So: 1. drop function, 2. alter type: add column, 3. create the function again with the new default argument, all in a transaction?

On Tue, Apr 29, 2014 at 4:22 PM, Merlin Moncure mmonc...@gmail.com wrote: On Sun, Apr 27, 2014 at 4:57 PM, Dorian Hoxha dorian.ho...@gmail.com wrote: Since my alternative is using json, that is heavier (need to store keys in every row) than composite-types. Updating an element on a specific composite_type inside an array of them is done by UPDATE table SET composite[2].x = 24; So last standing question, is it possible to insert an array of composite_types by not specifying all of the columns for each composite_type? So if I later add other columns to the composite_type, the insert query doesn't break?

One way to do it is via a 'type constructor function'.

postgres=# create type foo_t as (a int, b int);
postgres=# create function foo_t(a int, b int) returns foo_t as $$
               select row(a,b)::foo_t; $$ language sql stable;
postgres=# create table bar(f foo_t);
postgres=# insert into bar VALUES ((1,2)::foo_t); -- not tolerant to changes
postgres=# insert into bar VALUES (foo_t(1,2));   -- works fine

Now, to extend the field, we can overload the function, making sure to default the 3rd argument:

alter type foo_t add attribute c text;
drop function foo_t(int, int); -- must do this to make function unambiguous
create function foo_t(a int, b int, c text = null) returns foo_t as $$
    select row(a,b,c)::foo_t; $$ language sql stable;

postgres=# INSERT INTO bar VALUES ((1,2)::foo_t);
postgres=# insert into bar VALUES (foo_t(1,2,'test')); -- works fine

This technique is somewhat dubious, but if for whatever reason you absolutely must preserve client sql in the face of server changes it might work. merlin
[GENERAL] Postgresql the right tool (queue using advisory_locks + long transactions)
Hi list, I am trying to use PostgreSQL as a queue for long jobs (max ~4 hours) using advisory locks. I can't separate a long job into sub-jobs.

1. In the ultimate best-case scenario there will be ~100 workers, so no web-scale performance required. Is there a problem with 100 open sessions (behind a connection pooler?) for hours, as in my case? The session will commit the progress of the job, but will stay open and hold the advisory lock till the job is done or it expires (see 2).

2. Is it possible to set a time limit to auto-close an open session that hasn't made a query in a while? So if a worker fails and the session times out, postgresql/pgbouncer will close the session and release the lock?
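For reference, the session-level advisory lock flow described here is roughly the following (job id 42 is a placeholder):

```sql
-- Worker: try to claim the job; returns true only if no other
-- session holds the lock. Non-blocking, unlike pg_advisory_lock().
SELECT pg_try_advisory_lock(42);

-- ...work on the job, committing progress in ordinary transactions,
--    while this same session stays open and keeps the lock...

-- Release explicitly when done (the lock is also released
-- automatically if the session/connection dies).
SELECT pg_advisory_unlock(42);
```

Two caveats worth stating: session-level advisory locks only work through a pooler if the same server connection stays pinned to the worker for the whole job, so pgbouncer would need session pooling rather than transaction pooling; and PostgreSQL of this era has no idle-session timeout of its own (statement_timeout does not cover idle time), so timing out dead workers has to happen in the pooler or the application.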
Re: [GENERAL] Re: Postgresql the right tool (queue using advisory_locks + long transactions)
I'll probably ask the @pgbouncer mailing list whether I can use it with advisory locks per session. If not, even raw sessions will be enough. Some comments inline. Thanks

On Sun, Apr 27, 2014 at 10:07 PM, David G Johnston david.g.johns...@gmail.com wrote: Dorian Hoxha wrote: Hi list, I am trying to use postgresql as a queue for long-jobs (max ~4 hours) using advisory_locks. I can't separate the long-job into sub-jobs. 1. At ultimate-best-case scenario there will be ~100 workers, so no web-scale performance required. Is there a problem with 100 open sessions (behind a connection pooler?) for hours like in my case? The session will commit the progress of the job, but will stay opened and hold the advisory_lock till the job is done or it expires (look 2). 2. Is it possible to set a time limit to auto-close an opened session that hasn't made a query in a while? So if a worker fails, and the session time-outs, postgresql/pgbouncer will close the session and release the lock?

Do you understand the difference between a session and a transaction? Your subject and body indicate either you don't or you have a thinko/typo somewhere. Long-running transactions (idle or otherwise) are a major problem, while long-running idle sessions are simply marginally wasteful (in moderation).

I'm talking about session advisory locks. That's why I chose advisory instead of SELECT FOR SHARE (a shared row lock), because that looks like it must be inside a transaction, from reading the docs. I'll keep the session open for the duration of the whole job, and create/commit transactions to update the progress. In the end I close the connection and the lock will be released.

Why not just update the job as dispatched when it is being worked on and then completed when finished? You would still need to handle partial completion somehow, but this way you don't waste live resources during the long waiting period simply monitoring a lock. Though probably at your volume this is not that big an issue.
PostgreSQL proper does not have session timeouts that I am aware of. If a worker fails, it should release its connection, and all advisory locks would be released...

I thought that if a worker failed the session was left open. So maybe I don't need the session-timeout thing.

Your reference to a connection pooler above doesn't make sense to me; you'd need persistent connections for this to work (they can be from a pool, but the pool size would have to be 100+). The main decision is whether job process state is part of your data model or simply an artifact. I'd suggest that it should be part of the model, so state should be managed directly through row-level locking and status fields instead of indirectly through temporary system advisory locks.

I maintain job status and job progress in the same row. But I also want users to be able to cancel the job, by updating a 'canceled' column. So every time I report the progress, I also use RETURNING canceled to see if the job has been canceled by the user, and abort it if so.

If you want to keep track of active workers you should set up some kind of heartbeat update query; and maybe, depending on how important this is, attach advisory lock info to that heartbeat record so a monitoring process can check both the pulse table and the system advisory lock for a two-data-point confirmation of activity.

Will probably do it like this.

David J.
Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions
Since my alternative is using json, which is heavier (it needs to store the keys in every row) than composite types. Updating an element of a specific composite_type inside an array of them is done by UPDATE table SET composite[2].x = 24; So, last standing question: is it possible to insert an array of composite_types without specifying all of the columns for each composite_type? So that if I later add other columns to the composite_type, the insert query doesn't break? Thanks

On Mon, Apr 21, 2014 at 1:46 PM, Dorian Hoxha dorian.ho...@gmail.com wrote: Maybe the char array link is wrong? I don't think an array of arrays is good for my case. I'll probably go for json or a separate table, since it looks like it's not possible to use composite types. On Mon, Apr 21, 2014 at 4:02 AM, Rob Sargent robjsarg...@gmail.com wrote: Sorry, I should not have top-posted (dang iPhone). Continued below: On 04/20/2014 05:54 PM, Dorian Hoxha wrote: Because I always query the whole row, and in the other way (many tables) I will always join + have other indexes. On Sun, Apr 20, 2014 at 8:56 PM, Rob Sargent robjsarg...@gmail.com wrote: Why do you think you need an array of theType vs. a dependent table of theType? This tack is of course immune to most future type changes. Sent from my iPhone. Interesting. Of course any decent mapper will return the whole row. And would it be less disk-intensive as an array of struct (where struct is implemented as an array)? From other threads [1] [2] I've come to understand the datatype overhead per native type is applied per type instance per array element. [1] 30K floats: http://postgresql.1045698.n5.nabble.com/Is-it-reasonable-to-store-double-arrays-of-30K-elements-td5790562.html [2] char array: http://postgresql.1045698.n5.nabble.com/COPY-v-java-performance-comparison-tc5798389.html
Re: [GENERAL] hstore binary representation of keys
Currently hstore works like mongodb in this respect: it writes the keys every time (and values as strings! it's mostly for dynamic keys or very sparse keys, in my opinion). You can shorten the keys, or put them in dedicated columns. I haven't read of any plan to compress the strings.

On Tue, Apr 22, 2014 at 2:01 PM, Tim Kane tim.k...@gmail.com wrote: Hi all, I've been using hstore to record various key/val pairs, but I've noticed it consumes a lot more disk than I would have expected. I don't have any hard figures to illustrate, but empirical testing has shown that if I record these pairs as traditional column-based fields, I can save a significant amount of disk. What I think I'm seeing here is that the hstore representation needs to store the entirety of the *key* alongside each *value*. Let's say I have a table of 1000 records, and 900 of them have a key named 'A_REALLY_REALLY_REALLY_LONG_KEY'; then this key will be written to disk 900 times, along with the appropriate values. I guess there are two options open to me here. 1. I could transpose these values into a dedicated field 2. I could use shorter key names. Does hstore2 go any way to improving this situation? Some kind of enumerated key based system? Cheers, Tim
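The per-row key overhead is easy to measure directly with pg_column_size(); a rough sketch, assuming the hstore extension is installed (the long key is from Tim's example):

```sql
-- The hstore value carries the entire key text in every row,
-- versus a few bytes for a plain integer column.
SELECT pg_column_size('A_REALLY_REALLY_REALLY_LONG_KEY=>42'::hstore) AS hstore_bytes,
       pg_column_size(42::integer) AS column_bytes;
```

Multiply the difference by the 900 rows in the example and the disk gap observed above follows.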
Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions
Maybe the char array link is wrong ? I don't think an array of arrays is good for my case. I'll probably go for json or separate table since it looks it's not possible to use composite-types. On Mon, Apr 21, 2014 at 4:02 AM, Rob Sargentg robjsarg...@gmail.com wrote: Sorry, I should not have top-posted (Dang iPhone). Continued below: On 04/20/2014 05:54 PM, Dorian Hoxha wrote: Because i always query the whole row, and in the other way(many tables) i will always join + have other indexes. On Sun, Apr 20, 2014 at 8:56 PM, Rob Sargent robjsarg...@gmail.comwrote: Why do you think you need an array of theType v. a dependent table of theType. This tack is of course immune to to most future type changess. Sent from my iPhone Interesting. Of course any decent mapper will return the whole row. And would it be less disk intensive as an array of struct ( where struct is implemented as an array). From other threads [1] [2] I've come to understand the datatype overhead per native type will be applied per type instance per array element. [1] 30K floatshttp://postgresql.1045698.n5.nabble.com/Is-it-reasonable-to-store-double-arrays-of-30K-elements-td5790562.html [2] char arrayhttp://postgresql.1045698.n5.nabble.com/COPY-v-java-performance-comparison-tc5798389.html
[GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions
Hi list, I have:

create type thetype as (width integer, height integer);
create table mytable (thetype thetype[]);

How can I make an insert statement so that if I later add fields to the composite type, the code/query doesn't break? Maybe by specifying the fields of the composite type in the query? This can be done for normal inserts (non-arrays):

CREATE TABLE mytable (t thetype);
INSERT INTO mytable (t.width, t.height) VALUES (11, 22);

Also, how do I update a whole element of an array of composites? And how do I update an attribute of a specific element in an array of composites (so that when I add columns to the composite later, my old code doesn't break)? How much overhead do composite types have besides the values and nulls? Thanks
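For the element-update parts of the question, the syntax looks like this (a sketch against the thetype/mytable definitions above; array subscripts are 1-based):

```sql
-- Replace a whole element of the array:
UPDATE mytable SET thetype[1] = ROW(11, 22)::thetype;

-- Update one attribute of one element in place:
UPDATE mytable SET thetype[1].width = 24;
```

Note that the positional ROW(...) constructor has exactly the fragility the question worries about: it breaks as soon as the type grows a column, which is what the constructor-function approach discussed in this thread works around.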
Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions
Was just curious about the overhead. I know the columns, but I may need to add other columns in the future. Yeah, json is the alternative if this doesn't work.

On Sun, Apr 20, 2014 at 7:54 PM, Fede Martinez federicoemarti...@gmail.com wrote: If you don't know the columns your type will have, you could consider using json or hstore if the data is unstructured. On 20/04/2014 14:04, Dorian Hoxha dorian.ho...@gmail.com wrote: Hi list, I have a create type thetype(width integer, height integer); create table mytable(thetype thetype[]); How can I make an insert statement so that if I later add fields to the composite type, the code/query doesn't break? Maybe by specifying the fields of the composite type in the query? This can be done for normal inserts (non-arrays): CREATE TABLE mytable (t thetype); INSERT INTO mytable(t.width, t.height) VALUES (11,22); Also, how to update a whole element of an array of composites? Also, how to update an attribute of a specific element in an array of composites? (so when I add columns later to the composite, my old code doesn't break) How much overhead do composite types have besides the values and nulls? Thanks
Re: [GENERAL] Altering array(composite-types) without breaking code when inserting them and similar questions
Because i always query the whole row, and in the other way(many tables) i will always join + have other indexes. On Sun, Apr 20, 2014 at 8:56 PM, Rob Sargent robjsarg...@gmail.com wrote: Why do you think you need an array of theType v. a dependent table of theType. This tack is of course immune to to most future type changess. Sent from my iPhone On Apr 20, 2014, at 11:57 AM, Dorian Hoxha dorian.ho...@gmail.com wrote: Was just curious about the overhead. I know the columns, but i may need to add other columns in the future. Yeah, json is the alternative if this doesn't work. On Sun, Apr 20, 2014 at 7:54 PM, Fede Martinez federicoemarti...@gmail.com wrote: If you don't know the columns your type will have, you could consider using json or hstore if the data is unstructured. El 20/04/2014 14:04, Dorian Hoxha dorian.ho...@gmail.com escribió: Hi list, I have a create type thetype(width integer, height integer); create table mytable(thetype thetype[]); How can i make an insert statement so if i later add fields to the composite type, the code/query doesn't break ? Maybe by specifying the fields of the composite type in the query ? This can be done for normal inserts(non arrays): CREATE TABLE mytable (t thetype); INSERT INTO mytable(t.width, t.height) VALUES (11,22); Also how to update an whole element of an array of composites ? Also, how to update an attribute in a specific element in an array of composites? (so when i add columns later to the composite, my old code doesn't break) How much overhead have the composite types beside the values and nulls? Thanks
Re: [GENERAL] Could use some advice on search architecture
Postgresql has 2 column stores, 1 in-memory (can't remember the name) and http://www.citusdata.com/blog/76-postgresql-columnar-store-for-analytics

On Sat, Apr 19, 2014 at 2:10 PM, Robin robin...@live.co.uk wrote: bottom post

On 19/04/2014 12:46, R. Pasch wrote: On 19-4-2014 9:38, Robin wrote: Well, given that there are known limited attributes, this is the type of application that really suits a column-oriented database, such as Sybase IQ (now sold by SAP). It's a neat product that scales. Great performance with drag'n'drop analytics. Unless you can charm IQ out of SAP (it has been known to happen), you might have to look at some other techniques. So consider some binary data representation:

Red    - 1  (0000 0001)
Orange - 2  (0000 0010)
Yellow - 4  (0000 0100)
Green  - 8  (0000 1000)
Blue   - 16 (0001 0000)
Indigo - 32 (0010 0000)
Violet - 64 (0100 0000)

This way, you can encode several colours in 1 value: Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001. Robin

I stopped reading when I heard the words "sold by SAP" ;-) This project is built solely with open-source and freely available software. I've been thinking about using a binary data representation but haven't come to a solution to this specific problem quite yet. Per property of a product, only one bit would be 1 and the rest would be 0. What would a query look like to match all products that have a bit in the correct position? Say for instance these are a couple of records (and yes, property values can be null as well):

title,    property1, property2, property3
product1, 0000 0001, 0000 0010, NULL
product2, 0000 0100, 0100 0000, 0000 0010
product3, 0010 0000, 0010 0000, 0000 0100

Say that I would like to retrieve the products that have property1 as either 0010 0000, 0000 1000 or 0000 0001. Combined that would be 0010 1001, and it would have to match product1 and product3, as they both have their individual bit matching one of the bits being asked for. What would a WHERE statement look like using this type of binary representation?
If that would be fairly simple to do and fast (most important factor), then I could do an OR construction on all property columns and have something count the number of properties that actually matched. Is that something you can do with a binary operator of some sort as well? Count the number of overlapping bits? Say for instance I have a binary value of 0110 0101 and another binary value of 1100 0100; how could I find out how many bits matched? (In this case the number of matching bits would be 2.)

I understand the reluctance to pay SAP-style rates; as a longtime DB user, I have learned some 'charm' techniques. However, I poked around a bit for alternatives, as I do like the column-oriented approach, and found something called MonetDB http://www.monetdb.com/Home/Features - it apparently has a column-store db kernel, and is open source. I suggest you have a look; if it does what it says on the label, then it looks like a find. There is a discussion of bitmask trickiness, also dealing with colours: http://postgresql.1045698.n5.nabble.com/Bitmask-trickiness-td1865258.html

Robin
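PostgreSQL's bit-string type can express both operations discussed above directly; a sketch assuming the properties are stored as bit(8) columns (table and column names from the example):

```sql
-- Products whose property1 has at least one of the wanted bits set
-- (wanted mask = 0010 0000 | 0000 1000 | 0000 0001 = 0010 1001):
SELECT title
FROM products
WHERE property1 & B'00101001' <> B'00000000';

-- Counting overlapping bits: AND the two masks, then count the 1s.
SELECT length(replace((B'01100101' & B'11000100')::text, '0', ''));  -- 2
```

NULL properties simply drop out: NULL & mask is NULL, so the comparison is not true and the row is not matched.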
Re: [GENERAL] Pagination count strategies
Cache the total?

On Thu, Apr 3, 2014 at 3:34 PM, Leonardo M. Ramé l.r...@griensu.com wrote: Hi, in one of our systems we added a kind of pagination feature that shows N records of Total records. To do this, we added a count(*) over() as Total field to our queries, in place of doing two queries: one for fetching the records and another for getting the count. This improved the performance, but we aren't happy with the results yet: without the count, the query takes 200ms vs 2000ms with it. We are thinking of removing the count, but if we do that the system will lack an interesting feature. What strategy for showing the total number of records returned do you recommend? Regards, -- Leonardo M. Ramé Medical IT - Griensu S.A. Av. Colón 636 - Piso 8 Of. A X5000EPT -- Córdoba Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19 Cel.: +54 9 (011) 40871877
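One cheap version of "cache the total": for an unfiltered table count, the planner's own estimate is nearly free and is kept reasonably fresh by autovacuum/ANALYZE (the table name is a placeholder):

```sql
-- Approximate row count without scanning the table
SELECT reltuples::bigint AS estimated_total
FROM pg_class
WHERE relname = 'mytable';
```

For filtered result sets this doesn't apply; there, the usual options are caching the count in the application or maintaining a counter in a side table with triggers, paying the exact count(*) only when the cache is cold.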
[GENERAL] New OpenSource columnstore from CitusData for Postgresql
Link to hackernews which also has some comments from the devs https://news.ycombinator.com/item?id=7523950 Very interesting: They use foreign data tables as an abstraction to separate the storage layer from the rest of the database.
Re: [GENERAL] Complex sql, limit-for-each group by, arrays, updates
I'll probably go with using 3 queries and putting them in a transaction. Thanks

On Wed, Nov 27, 2013 at 5:38 PM, David Johnston pol...@yahoo.com wrote: Dorian Hoxha wrote: Hi, So i have (table where data will be read): CREATE TABLE data (vid,cid,pid,number); Tables where data will be written/updated: CREATE TABLE pid_top_vids (pid, vid[]) CREATE TABLE pid_top_cids (pid, cid[]) CREATE TABLE cid_top_vids (cid, vid[]) I need to, possibly in 1 query (this will run once in a while): Get top(10) vids, sorted by 'number', grouped by pid, and update the row in TABLE(pid_top_vids). Get top(10) vids, sorted by 'number', grouped by cid, and update the row in TABLE(cid_top_vids). Get top(10) cids, sorted by 'number', where number is the SUM() of each vid GROUP_BY(cid), and update the row in TABLE(cid_top_vids). So: get data, create sorted array, and update the rows (they exist, so only the arrays have to be updated). Possible? Thanks

I would not bother trying to do it in a single SQL statement, but each query is possible, and fairly basic, using UPDATE with a FROM clause containing the desired sub-query. David J.
[GENERAL] Complex sql, limit-for-each group by, arrays, updates
Hi, So i have (table where data will be read) : CREATE TABLE data (vid,cid,pid,number); Tables where data will be writen/updated: CREATE TABLE pid_top_vids (pid, vid[]) CREATE TABLE pid_top_cids (pid, cid[]) CREATE TABLE cid_top_vids (cid, vid[]) I need to , possibly in 1 query, this will run once in a while: Get top(10) vids , sorted by 'number',grouped by pid and update the row in TABLE(pid_top_vids). Get top(10) vids, sorted by 'number', grouped by cid and update the row in TABLE(cid_top_vids). Get top(10) cids, sorted by 'number', where number is the SUM() of each vid GROUP_BY(cid) and update the row in TABLE (cid_top_vids); So, get data, create sorted array, and update the rows (they exist, so only the arrays have to be updated). Possible ? Thanks
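Each of the three refreshes can be done as one UPDATE ... FROM, as David suggests; a sketch of the first (pid_top_vids), with the other two following the same shape (the third just aggregates SUM(number) per cid before ranking):

```sql
-- Rank vids per pid by number, keep the top 10,
-- and write them into the array column in one statement.
UPDATE pid_top_vids p
SET vid = t.top_vids
FROM (
    SELECT pid, array_agg(vid ORDER BY number DESC) AS top_vids
    FROM (SELECT pid, vid, number,
                 row_number() OVER (PARTITION BY pid
                                    ORDER BY number DESC) AS rn
          FROM data) ranked
    WHERE rn <= 10
    GROUP BY pid
) t
WHERE p.pid = t.pid;
```

Run the three statements inside one transaction, as suggested in the reply, so readers never see a half-refreshed state.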
[GENERAL] Multiple aggs,sums in 1 query
I have: create table tbl (a,b,c,d,e,f,g,h); And I need to select, in 1 query or the most performant way: top 5(a); top 5(b); top 5(c), and for each top5(c): top 5(d); count(f) GROUP BY f. I can make these as separate queries, but that means PostgreSQL would read the table multiple times? Is it possible/doable/sane? Thanks
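One way to make sure the table is read only once: materialize it in a CTE (CTEs are materialized in these versions) and let each aggregate branch read the CTE. A sketch covering the first three pieces; the nested top-5(d)-per-top-5(c) part would need an extra window-function step:

```sql
WITH scanned AS (SELECT a, b, f FROM tbl)  -- tbl itself is scanned once
(SELECT 'top_a' AS which, a::text AS val, count(*) AS n
 FROM scanned GROUP BY a ORDER BY n DESC LIMIT 5)
UNION ALL
(SELECT 'top_b', b::text, count(*)
 FROM scanned GROUP BY b ORDER BY count(*) DESC LIMIT 5)
UNION ALL
(SELECT 'count_f', f::text, count(*)
 FROM scanned GROUP BY f);
```

The parentheses around each branch are what allow a per-branch ORDER BY/LIMIT inside the UNION ALL.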
[GENERAL] Return only non-null columns
Is it possible to: SELECT * FROM table, but to return only the non-null columns? Since I use psycopg2 with DictCursor (a hashtable), it's better for me to not have the column at all than to have it as NULL. Thanks
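Not with a plain SELECT *, since a result set has a fixed column list; but each row can be collapsed server-side into an object that omits its NULLs. json_strip_nulls arrives in 9.5; on earlier versions row_to_json alone works but keeps the nulls. The table name is a placeholder:

```sql
-- One JSON object per row, NULL columns removed
SELECT json_strip_nulls(row_to_json(t))
FROM mytable AS t;
```

Recent psycopg2 versions (2.5+) deserialize json columns into Python dicts automatically, so the dict-style access pattern still works on the client side.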