Re: [GENERAL] looking for a globally unique row ID
On Sep 14, Rafal Pietrak modulated:

> My main objective is the "guarantee". Which I've tried to indicate
> referring to "future UPDATEs".

With a well-behaved application, it is sufficient to define each ID
column as:

   id int8 UNIQUE NOT NULL DEFAULT nextval('sharedsequence')

and ensure that you always use defaults for INSERT and never set the
column during UPDATE.

You can also use a BEFORE/FOR EACH ROW trigger to enforce these
conventions, if you are worried about accidental violations in your
SQL. In that case, leave the DEFAULT as NULL in the table definition,
but make the trigger do this for INSERT:

   NEW.id := nextval('sharedsequence');

and this enforcement check for UPDATE:

   IF OLD.id != NEW.id THEN RAISE EXCEPTION 'id is immutable'; END IF;

If that's not a strong enough guarantee, I'm not sure I understand
your problem statement...

Karl
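P.S. For concreteness, a minimal sketch of such a trigger, assuming
the shared sequence above and a hypothetical table t with an id
column:

   CREATE FUNCTION enforce_shared_id() RETURNS trigger AS $$
   BEGIN
     IF TG_OP = 'INSERT' THEN
       NEW.id := nextval('sharedsequence');  -- always assign from the sequence
     ELSIF OLD.id != NEW.id THEN
       RAISE EXCEPTION 'id is immutable';    -- reject UPDATEs that rewrite it
     END IF;
     RETURN NEW;
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER enforce_shared_id
     BEFORE INSERT OR UPDATE ON t FOR EACH ROW
     EXECUTE PROCEDURE enforce_shared_id();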
Re: [GENERAL] looking for a globally unique row ID
On Sep 14, vinny modulated:

> If it is only one database, on one server, then couldn't you just
> use one sequence? If you prefix the value with some identifier of
> the current table then you cannot get duplicates across tables even
> if you reset the sequence.

I didn't follow the whole thread, so I apologize if I'm repeating
earlier suggestions.

We use a shared sequence to issue the new identifiers, and in fact
limited the sequence to 43 bits so they can serialize as JSON numbers
for the benefit of our clients. We disabled wrapping, so it will fail
if we exhaust the range.

If you rapidly churn through identifiers and could envision exhausting
64 bits in your database's lifetime, you should probably just use
UUIDs instead of a sequence. A timestamp-based UUID still has
reasonable sorting and indexing properties.

To "guarantee" uniqueness with a shared sequence or UUID generator,
you can use a trigger to prevent override of identifiers from SQL. As
long as you always use the correct value generator during INSERT and
disallow mutation of identifiers during UPDATE, the rows will not
share identifiers.

Karl
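P.S. A minimal sketch of such a sequence definition (the 43-bit
ceiling is 2^43 - 1; the sequence name is illustrative):

   CREATE SEQUENCE sharedsequence
     MINVALUE 1
     MAXVALUE 8796093022207  -- 2^43 - 1
     NO CYCLE;               -- nextval() errors out instead of wrapping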
Re: [GENERAL] PostgreSQL with PowerBuilder, and Identity keys (serials)
> Select currval('GEN_')

From the above, I am assuming you did something like:

   CREATE SEQUENCE "GEN_" ...;

and are trying to access this sequence? If so, you actually have to
include the SQL quoted identifier syntax within the text argument to
currval() or nextval(), e.g.:

   SELECT nextval('"GEN_"');

With these sorts of identifier-as-argument parameters in Postgres, you
can also include schema-qualification syntax:

   SELECT nextval('"My Schema"."GEN_"');

Karl
Re: [GENERAL] jsonb case insensitive search
On Jun 01, armand pirvu modulated:

> Overall could it be that the optimizer blatantly ignores a scan
> index which is cheaper than a table scan, or jsonb implementation
> still has a long way to come up or the way it is used in my case is
> not the one designed for ?

If I remember correctly, isn't a compound index always just using
btree? In general, I have found better luck using several smaller
btree indices than one large compound one.

Unless your entire query can be answered from an index-only lookup,
the extra columns just bloat the btree index. So, you might as well
use a simpler compound index for the regular scalar row keys, and this
index will be much smaller without the baggage of the jsonb values at
its leaves. The planner can use the jsonb from the actual candidate
rows if it is going to have to visit them anyway for other WHERE or
SELECT clauses.

If the sparseness of your query is due to the content within the jsonb
values rather than the other scalar row keys, I think you'd need some
kind of GIN index over the contents of the jsonb documents to find the
small subset of candidate rows by these sparse criteria. Trigram is
just one example of a GIN indexing scheme.

If your jsonb documents are "flat", i.e. just a bag of key-value pairs
and not arbitrary nested jsonb structures, you might also explode them
into arrays of keys or values as separate indexed expressions? Then,
you could GIN index the arrays and quickly find the subset of rows
with certain unusual keys or unusual values, but would still have to
follow up with a more exact check for the combination of key and
value.

Karl
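P.S. A sketch of the "exploded keys" idea for flat documents, with a
hypothetical table t(doc jsonb); an analogous function could collect
the values instead:

   -- wrap the set-returning call so it can be used in an index
   -- expression; assumes every doc is a JSON object, not a scalar
   CREATE FUNCTION jsonb_keys(j jsonb) RETURNS text[]
   IMMUTABLE LANGUAGE sql AS
   $$ SELECT array_agg(k) FROM jsonb_object_keys(j) AS k $$;

   CREATE INDEX t_keys_idx ON t USING gin (jsonb_keys(doc));

   -- find candidate rows by an unusual key, then re-check exactly
   SELECT * FROM t
   WHERE jsonb_keys(doc) @> ARRAY['rare_key']
     AND doc -> 'rare_key' = '"rare value"'::jsonb;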
Re: [GENERAL] jsonb case insensitive search
On May 31, armand pirvu modulated:

> The idea is that I would like to avoid having an index for each key
> possibly wanted to search, but rather say have the whole record then
> search by the key and get the key value, thus having one index
> serving multiple purposes so to speak

First, benchmarking would be important to figure out if any proposed
indexing actually speeds up the kinds of queries you want to perform.
With the recently added parallel query features, a simpler indexing
scheme with some brute-force search might be adequate?

But, you could use a search idiom like this:

   (lower(json_column::text)::jsonb) -> lower('key') = '"value"'::jsonb

(Note the jsonb casts: the plain json type has no equality operator.)
This will down-convert the case on all values and keys. The left-hand
parenthetic expression could be precomputed in an expression index to
avoid repeated case conversion. But, typical searches will still have
to scan the whole index to perform the projection and match the final
value tests on the right-hand side.

If you want to do things like substring matching on field values, you
might stick with text and use regexp matches:

   (lower(json_column::text)) ~ 'valuepattern'

or more structural searches:

   (lower(json_column::text)) ~ '"key": "[^"]*substring[^"]*"'

Here, the left-hand expression could be trigram indexed to help with
sparse, substring matching without a full index scan. We've had good
luck using trigram indexing with regexp matching, though I've honestly
never used it for the purpose sketched above...

Karl
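P.S. A sketch of the trigram expression index mentioned above,
assuming the pg_trgm extension and a hypothetical table t, and
assuming the json-to-text cast is accepted in an index expression:

   CREATE EXTENSION IF NOT EXISTS pg_trgm;

   -- index the case-folded text rendering of the whole document
   CREATE INDEX t_json_trgm_idx ON t
     USING gin ((lower(json_column::text)) gin_trgm_ops);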
Re: [GENERAL] import CSV file to a table
On Mar 08, Rob Sargent modulated:

> Yes Karl, I agree. I admitted as much. But if it's clean, as in
> free of quoted commas, life is much more simple. I've lost sight of
> whether or not the OP knows his situation w.r.t. this. The awk
> line will tell him and for a one-off load this can make a world of
> difference in complexity - two bash lines and a COPY.

Maybe I didn't understand your awk? I thought it was counting commas
in lines. This isn't the same as counting commas in records.

   this,is,record,one
   "this,,","is
   ,,record","two
   ,,,"

This has three commas on each line and definitely is not suitable for
naive CSV handling.

Karl
Re: [GENERAL] import CSV file to a table
I believe that in its full glory, you cannot reliably locate CSV
record boundaries except by parsing each field in order, including
quote processing. Individual records may contain arbitrary numbers of
field and record separator characters within the values.

Karl

On Mar 08, Rob Sargent modulated:

> Since bash has been bandied about in this thread I presume awk is
> available. Here's how I would check just how 'csv'ish the incoming
> file is.
> ...
Re: [GENERAL] import CSV file to a table
On Mar 08, John McKown modulated:

> I agree. I went with a "pure BASH" approach because it is what the
> user asked for & I wasn't sure what language she might be comfortable
> with. I use PERL a lot. Or maybe I should say that I abuse PERL a
> lot. Such as a PERL script which writes out another PERL script,
> based on some input files & parameters, then runs the just written
> PERL script, which does the load into a PostgreSQL database (multiple
> tables). Ya, a bit perverted.

Well, you could follow a similar meta-programming/code-generating
pattern to have the BASH script output a single SQL file to run with
psql. You could even generate PL/pgSQL code to defer more data
processing to the database itself.

I think the only robust "pure BASH" approach is to use a temporary
table, so you aren't trying to parse CSV content in BASH. Using
csvkit sounds good if you can introduce these third-party
dependencies.

With the temporary table, you can use SQL for most validation or data
interrogation, but you need to know at least enough schema information
in advance to form the COPY statement. Parsing the CSV header row to
plan your work puts you right back to requiring a robust CSV parser,
unless you can constrain your input scenarios to only handle very
trivial headers.

If you play games with a defaulting serial column and fixed column
names like "id, c1, c2, ..., cN" for the temporary table, you might
use the id column as a DB-assigned "row number" during COPY and
validation. In this case, you could even tell Postgres there is no
header, and then let it parse the header as another data record so you
can use SQL statements to determine the actual header names and
ordering in the input. But this still requires knowing the column
count in advance of the COPY.

I also think using something like Python with structured data
processing would be wiser, unless you know enough about the schema in
advance to avoid any CSV parsing on the client side.

Karl
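P.S. A minimal sketch of that temporary-table idiom from psql,
assuming a three-column input (the file path and column count are
illustrative):

   CREATE TEMPORARY TABLE raw_input (
     id serial,                  -- DB-assigned "row number" in load order
     c1 text, c2 text, c3 text   -- column count must be known in advance
   );

   -- load everything, header included, as plain data rows
   \copy raw_input (c1, c2, c3) from 'input.csv' with (format csv, header false)

   -- the header row is now the record with the smallest id
   SELECT c1, c2, c3 FROM raw_input ORDER BY id LIMIT 1;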
[GENERAL] Re: Are new connection/security features in order, given connection pooling?
On Jan 14, Tomas Vondra modulated:

> Sure, a lot of systems generate queries on the fly. Also, if the
> main problem is poor plan choice due to RLS, I'm not sure how
> prepared statements could help with that.

Sorry for being unclear, I was trying to respond to too many
sub-topics at once. The performance comment was just a tangential
aside from my experiments using RLS for web client authorization.
Indeed, I use the current_setting() function in RLS policy statements
to access ephemeral web client context stored via set_config().

In the rest of my message, I was trying to respond to the topic of
being able to "set role" and limit the connection privileges to
reflect a web client's identity for RLS or other fine-grained
authorization. I thought this was the main topic of the thread: the
desire to push down web client authorization decisions into the normal
Postgres mechanisms, and all the problems arising from this sort of
approach.

The concern was raised about the risk of the subsequent queries being
able to reverse the "set role" to subvert such authorization. Others
in the thread provided the common advice of parametric queries to
prevent query injection, but I think you need more than that. I think
one would like a stronger isolation between the connection
setup/security config layer and the fancier (and possibly flawed)
application layer. We struggle with this because all these mechanisms
are multiplexed as commands in one SQL connection, instead of having
an in-band versus out-of-band protocol layering available to the web
application, so the more risky data-dependent SQL could flow in-band
while the basic request-handler lifecycle configures the security
context out-of-band.

> That might be an interesting feature, but it's also significantly
> more complex than the topic of implementing a safe context for
> secrets, making RLS less problematic with connection pools.

I was thinking that this mechanism would support the usual shared
connection pool to support all web users. The web server would
repeatedly redefine the restricted security context and execute
application queries under that context. The pooling problems come
from the suggestion to open connections w/ different per-user
authentication and then not have any user-switching that can easily
reconfigure the connection, right?

karl
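P.S. For concreteness, the set_config()/current_setting() pattern I
mean looks roughly like this (the parameter name, table, and policy
are illustrative, not our actual schema):

   -- once per web request, on the pooled connection; the final 'true'
   -- scopes the setting to the current transaction
   SELECT set_config('app.client_id', '42', true);

   -- an RLS policy consuming that ephemeral context
   CREATE POLICY client_select ON documents FOR SELECT
     USING (owner_id = current_setting('app.client_id')::int);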
[GENERAL] Re: Are new connection/security features in order, given connection pooling?
I can relate to the original plea from my own exploration of this
topic. Before I get into that, I will mention as an aside that to
date we have found RLS to be really slow for web client authorization,
given that we have to use session parameters to store web client
context and there doesn't seem to be a way to get the query planner to
treat these settings as constants, nor to lift static parts of RLS
policy out of per-row loops. It's so slow that I wonder whether
future releases will improve it before we are forced to move
authorization back into the client-side application logic, making our
generated SQL much more complex but better tuned to the query
planner's capabilities.

As background, our web applications essentially compile new SQL
queries on the fly and send them through to PostgreSQL. There is a
lot of metaprogramming involved in generating SQL statements to embody
the work we want done on behalf of our clients, and this doesn't fit
well into traditional prepared statements etc. The web service needs
to assert the client context, and the client-level statements we
generate and execute should not be able to override this. The service
logic to determine and assert client context (once per web request) is
dramatically simpler than the service logic producing all the
client-level SQL statements, and it would be nice to have a restricted
execution context to isolate the two kinds.

We also like the idea that policy enforcement mechanisms could be
applied orthogonally to the generated client-level SQL statements, as
it feels like a better defense-in-depth architecture and is also
easier to reason about. To do so in our service logic would mean
replicating more and more of the query parsing and engine to do
general query rewriting.

So, it would be nice to have a tunneling mechanism where I can
distinguish the control-plane operations I am performing from the
application operations I am translating and pushing down to PostgreSQL
on behalf of the web client. In my case, I might want to perform a
mixture of service-level and client-level statements within the same
transaction. The client-level statements cannot control transactions.

I could imagine something like a two-level feature set. At the
top level in the connection, we can statefully manipulate our security
contexts, set configurable privilege masks for the second-level
execution context, set effective roles (reversibly, as today), manage
transactions, etc. With some kind of nested execution block, we could
submit less trusted statements to run within the second-level
execution context:

   EXECUTE RESTRICTED $guard$ app_query... $guard$ ;

This would take the guarded query string, restart a safe parser on it,
and only on successful parse go forward with planning and executing it
in the restricted mode that has been configured for the connection.

Eventually, I could see wanting more resource management controls on
this restricted context too, i.e. setting limits on CPU/IO/RAM
consumption or execution time: both planning-time limits (reject based
on estimates) and runtime limits (abort the query if the limit is
reached).

Karl
Re: [GENERAL] resolution order for foreign key actions?
On Nov 08, David G. Johnston modulated:

> ON DELETE starts with the "one" side of a one-to-many relationship.
> When deleting the one row the question is what should be done with
> the many rows which refer to it. If ALL of the many rows agree to
> be deleted then the one row in question can go away and no error is
> raised. If ANY of the many rows refuse to die then the one row in
> question must remain in order to maintain referential integrity -
> thus an error will be raised.

I think I had the same intuition going into this. However, I am
testing with an artificial scenario to focus on the
ordering/precedence behavior. I was even hoping PostgreSQL might
raise an error when I created apparently conflicting constraints, but
unfortunately it does something much more confusing...

You might consider this to explore what happens if someone
accidentally redefines constraints with conflicting actions. I just
redefine the same constraint with only the constraint name and ON
DELETE clause varying. Here, I have assigned constraint names to
demonstrate that the rules are NOT applied based on a lexicographic
sort of constraint names but rather on order of definition (perhaps
there is another internally-generated name that sorts in order of
definition?):

   ALTER TABLE refs ADD CONSTRAINT z FOREIGN KEY (t_id)
     REFERENCES targets (id) ON DELETE NO ACTION;
   ALTER TABLE
   ALTER TABLE refs ADD CONSTRAINT y FOREIGN KEY (t_id)
     REFERENCES targets (id) ON DELETE SET NULL;
   ALTER TABLE
   ALTER TABLE refs ADD CONSTRAINT x FOREIGN KEY (t_id)
     REFERENCES targets (id) ON DELETE CASCADE;
   ALTER TABLE

This test will show that ON DELETE NO ACTION is in effect due to
constraint z:

                          Table "public.refs"
    Column |  Type   | Modifiers
   --------+---------+----------------------------------------------------
    id     | integer | not null default nextval('refs_id_seq'::regclass)
    t_id   | integer | not null
   Indexes:
       "refs_pkey" PRIMARY KEY, btree (id)
   Foreign-key constraints:
       "x" FOREIGN KEY (t_id) REFERENCES targets(id) ON DELETE CASCADE
       "y" FOREIGN KEY (t_id) REFERENCES targets(id) ON DELETE SET NULL
       "z" FOREIGN KEY (t_id) REFERENCES targets(id)

Content of refs table before deletion of target:

   SELECT * FROM refs;
    id | t_id
   ----+------
     1 |    1
     2 |    2
     3 |    3
   (3 rows)

Attempting to delete a target:

   DELETE FROM targets WHERE name = 'foo1' RETURNING *;
   ERROR:  update or delete on table "targets" violates foreign key
     constraint "z" on table "refs"
   DETAIL:  Key (id)=(2) is still referenced from table "refs".

The attached BASH script will perform a sequence of tests defining the
constraints in different orders and showing the results. The excerpt
above is from the first test scenario. It accepts optional arguments
which are passed to 'psql' and can run with no arguments if you can
talk to your default DB with 'psql' absent of any arguments, i.e. with
Unix domain socket authentication. It only creates and destroys
tables public.targets and public.refs...

Thanks,
Karl

[Attachment: fkr-test.sh -- Bourne shell script]
[GENERAL] resolution order for foreign key actions?
Hi,

Is there a formal definition for the order in which constraint actions
(i.e. the ON DELETE or ON UPDATE rules) are applied when there are
multiple overlapping/relevant constraints? I have struggled to find
an answer in the manual, but my experiments suggest that they are
interpreted in the order in which the constraints were defined: the
first rule in this order is applied while subsequent rules are
ignored. This can be very confusing if one rule says CASCADE and
another NO ACTION, and you need to understand this order of definition
to know whether a delete will cascade or raise an error.

Is there a definitive way to introspect the information_schema or
pg_catalog to determine which behaviors will effectively apply to a
given "DELETE FROM ..." or "UPDATE ..." statement?

Thanks,
Karl
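P.S. The closest I have found to introspection is querying pg_catalog
directly; a sketch, under my unverified assumption that the lowest-oid
constraint (the first defined) is the one whose action takes effect:

   SELECT conname,
          confupdtype,  -- ON UPDATE action
          confdeltype   -- ON DELETE action: a = no action, r = restrict,
                        -- c = cascade, n = set null, d = set default
   FROM pg_constraint
   WHERE contype = 'f'
     AND confrelid = 'public.targets'::regclass
   ORDER BY oid;        -- appears to match order of definition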
Re: [GENERAL] parallel query and row-level security?
On Oct 27, Karl Czajkowski modulated:

> Hi, I am struggling to find information on how the new parallel
> query features intersect with row-level security. I am wondering
> whether this is expected to work at this time?

Further experimentation shows that I can get parallel plans with
row-level security, if I remove our ugly policy with a scalar subquery
and instead denormalize the decision into another column of the same
table (i.e. one managed by a trigger).

However, our policies are still adding huge overhead. If I lie and
say our stored procedure is IMMUTABLE rather than STABLE, the
optimizer lifts our first (row-independent) policy check out of the
per-row filter and the query is 10x faster. But it seems to execute
it per row when it is marked as STABLE, even though the function takes
no arguments and so does not vary by row.

But, our procedure calls current_setting(), which has STABLE
volatility. Does the session/connection cache plans? Would declaring
our function as IMMUTABLE accidentally allow reuse of (stale) plans
across boundaries where we have modified the session parameter? My
conservative reading of the docs suggests it might, which would make
this cheat unsafe for us...

Thanks,
Karl
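P.S. One idiom that might help without lying about volatility is to
wrap the row-independent call in a scalar subquery, so the planner can
evaluate it once as an InitPlan instead of per row. A hedged sketch
against our first policy (I have not verified how this interacts with
parallel plans, and as noted above, subqueries in policies seemed to
block them in my experiments):

   CREATE POLICY "p1" ON schema1.table1 FOR SELECT
     USING (ARRAY['x', 'y'] && (SELECT schema1.our_procedure()));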
[GENERAL] parallel query and row-level security?
Hi,

I am struggling to find information on how the new parallel query
features intersect with row-level security. I am wondering whether
this is expected to work at this time?

My cursory experiments indicate that we can get a parallel plan when
querying as the table owner or superuser, but not when querying as a
role that is subject to the row-level security policies on the same
table. Our policies on the table are something like:

   CREATE POLICY "p1" ON schema1.table1 FOR SELECT
     USING (ARRAY['x', 'y'] && schema1.our_procedure());

   CREATE POLICY "p2" ON schema1.table1 FOR SELECT
     USING ((SELECT True FROM schema2.foo_authz p
             WHERE p.sec_class = table1.sec_class LIMIT 1));

We've defined our_procedure as STABLE PARALLEL SAFE using plpgsql and
it makes a call to current_setting().

Should we ever expect to see parallel query with policies such as
these? Are there any documented limits where we might adjust our
policy and gain parallel plans?

Thanks,
Karl

p.s. In case you are wondering about these odd policies, we are using
them to inject web client authorization decisions into our data access
layer... We are running under a single application PostgreSQL role
and our app passes web client authentication attributes as a session
parameter, which we look up with current_setting() and convert back
into an array to intersect with the ACL content ['x', 'y'] in the
first policy above. The second policy permits access to rows in a
certain security class managed by dynamic content in another mapping
table (which of course has limits as to how the web application will
modify it). This allows some subset of rows to be exposed to less
trusted clients, while the first policy reveals all rows to a more
trusted group of clients. We also have some tables where the ACL is
stored in a column and intersected for each row, but that doesn't
exist in this first case where I tried to get a parallel plan.
Re: [GENERAL] Is there a way to fix this ugliness
On Sep 10, Tim Uckun modulated:

> I am trying to get the child elements of a one to many table to be
> rolled up into a json field in the parent table. The query I am
> running is...

The problem is aggregating over the results of the left-outer join,
which introduces NULLs. You can try pushing that down into a
sub-query to create one aggregated images row per observation prior to
joining:

   SELECT ob.id, im.images
   FROM observations ob
   LEFT OUTER JOIN (
     SELECT observation_id, json_agg(row_to_json(im.*)) AS images
     FROM images im
     GROUP BY observation_id
   ) im ON (ob.id = im.observation_id);

You might use COALESCE in the top-level SELECT if you want to replace
any NULL im.images with a different empty value constant...

Karl
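P.S. For example, the COALESCE variant, substituting an empty JSON
array for observations with no images:

   SELECT ob.id, COALESCE(im.images, '[]'::json) AS images
   FROM observations ob
   LEFT OUTER JOIN (
     SELECT observation_id, json_agg(row_to_json(im.*)) AS images
     FROM images im
     GROUP BY observation_id
   ) im ON (ob.id = im.observation_id);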
Re: [GENERAL] Memory usage per session
On Jul 08, John McKown modulated:

> I think the "problem" that he is having is fixable only by changing
> how PostgreSQL itself works. His problem is a PL/pgSQL function
> which is 11K lines in length. When invoked, this function is
> "compiled" into a large tokenized parse tree. This parse tree is
> only usable in the session which invoked the function. Apparently
> this parse tree takes a lot of memory.

Right. I'm not entirely sure the original poster wants to hear
practical solutions, but I think there are three ways that someone
familiar with Postgres would avoid this type of problem:

1. Connection pooling. Set the max connection limit and other tuning
   parameters appropriately for your workload and available resources.
   Don't get into the state that was described (thousands of open
   connections and only hundreds "active"). Make your client
   applications more disciplined.

2. Buy more RAM. You can easily grow to 512GB in just basic
   dual-socket servers these days. This hardware cost may well be
   worth it to avoid human labor costs.

3. Rewrite or refactor such complex stored procedures in a different
   programming language such as C or Python, so your PL/pgSQL stored
   procedures remain small glue around libraries of code. Postgres
   makes it very trivial to extend the system with such procedural
   libraries. (A tiny sketch of this follows below.)

Karl
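P.S. To illustrate option (3), a minimal sketch of wrapping external
logic as a procedural-language function, assuming the PL/Python
extension is installed (the function body is a trivial placeholder):

   CREATE EXTENSION plpythonu;

   CREATE FUNCTION heavy_transform(x integer) RETURNS integer AS $$
     # the complex logic lives here in Python, outside any
     # large per-session PL/pgSQL parse tree
     return x * 2
   $$ LANGUAGE plpythonu;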
Re: [GENERAL] Slow query when the select list is big
On May 09, Sterpu Victor modulated:

> I tested it now, EXPLAIN ANALYZE is not showing the LEFT JOIN at
> all if I don't select from the joined tables. Now it is clear why
> the query is so much more efficient when I select less data.
>
> Thank you

With so many joins, you may want to experiment with PostgreSQL
parameter tuning. These parameters in particular can have a
significant impact on the plan choice and execution time:

   work_mem
   effective_cache_size
   from_collapse_limit
   join_collapse_limit
   geqo_threshold
   geqo_effort

Setting these to appropriately large values can make analytic queries
run much faster. Of course, setting them too high can also make for
very bad plans which cause the DB server to oversubscribe its memory
and start swapping... it requires a bit of reading and a bit of
experimentation to find ideal settings for your environment.

Karl
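P.S. For instance, a per-session experiment might look like the
following (the values are illustrative starting points, not
recommendations):

   SET work_mem = '128MB';            -- per sort/hash node, budget carefully
   SET effective_cache_size = '8GB';  -- planner hint only, not an allocation
   SET from_collapse_limit = 12;
   SET join_collapse_limit = 12;
   SET geqo_threshold = 14;           -- defer genetic planner to larger joins
   SET geqo_effort = 7;
   -- then re-run EXPLAIN ANALYZE on the query after each adjustment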
Re: [GENERAL] Slow join over three tables
On Apr 27, Tim van der Linden modulated:

> I'm joining three fairly large tables together, and it is slow. The
> tables are:
>
> - "reports": 6 million rows
> - "report_drugs": 20 million rows
> - "report_adverses": 20 million rows
> ...
> All tables have indexes on the "id"/"rid" columns and on the
> "drug"/"adverse" columns.
>
> The query:
>
> SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug
> FROM reports r
> JOIN report_drugs d ON d.rid = r.id
> JOIN report_adverses a ON a.rid = r.id
> WHERE a.adverse = ANY (ARRAY['back pain - nonspecific',
>   'nonspecific back pain', 'back pain'])
> AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363])
> ORDER BY r.created;

I would suggest a few experiments to see how you can modify the plans
available to the optimizer:

1. CREATE INDEX ON report_drugs (drug, rid)
2. CREATE INDEX ON report_adverses (adverse, rid)
3. CREATE INDEX ON reports (id, created)

Re-run EXPLAIN ANALYZE of your query after each of these steps to see
how each one affects planning. You might also try two variants of the
query at each step, with and without the ORDER BY.

Note, the index column order in (1) and (2) above is very important.

Karl
Re: [GENERAL] RLS policy dump/restore failure due to elided type-casts
On Apr 20, Tom Lane modulated:

> Karl Czajkowski <kar...@isi.edu> writes:
> >    CREATE POLICY delete_stuff ON stuff
> >      FOR DELETE USING ('example attribute value' = ANY (
> >        ((SELECT current_attributes()))::text[] ));
>
> Just out of curiosity, why are you doing it like that, and not simply
>
>    USING ('example attribute value' = ANY (current_attributes()))
>
> It seems like you're going out of your way to complicate matters.
>
>                       regards, tom lane

Going out of my way to complicate matters is my specialty. :-)

I spend more of my time writing programs that generate and transform
SQL from domain-specific languages, rather than writing SQL by hand.
In this case, I think I composed the subquery reflexively, without
thinking that there is a more direct idiom for this case...

Karl
[GENERAL] RLS policy dump/restore failure due to elided type-casts
Our row-level policies require very explicit casts in order to be
accepted by the DB, but those casts are discarded in the dumped policy
statements. Thus, an attempt to reload the dump file fails. I'm not
sure if the underlying problem is that the cast shouldn't be required
in the first place, or if the normalization applied to the policy
expression is just incorrect. Below is a trivialized example that
shows the problem while removing complexity found in our real
policies.

We are doing this to implement differentiated policies in a web
application, where the web server stores authenticated web client
context into session parameters and our policies check against those
attributes rather than DB roles.

To work around this, we have to maintain our policies in a separate
SQL file, manually edit the dumps, and reapply our working policies.
This is obviously undesirable in the long run, where dumps taken as
periodic backups are not directly usable to reconstruct the DB...

Karl

Create a test database with these contents:

   CREATE FUNCTION current_attributes() RETURNS text[] STABLE AS $$
   BEGIN
     RETURN (
       SELECT array_agg(value)
       FROM json_array_elements_text(
         current_setting('ourapp.attributes')::json)
     );
   EXCEPTION WHEN OTHERS THEN
     RETURN NULL::text[];
   END;
   $$ LANGUAGE plpgsql;

   CREATE TABLE stuff (
     value text PRIMARY KEY
   );

   CREATE POLICY delete_stuff ON stuff
     FOR DELETE USING ('example attribute value' = ANY (
       ((SELECT current_attributes()))::text[] ));

The output of pg_dump (and similarly the '\d' command in psql) drops
the cast:

   CREATE POLICY delete_stuff ON stuff FOR DELETE TO PUBLIC
     USING (('example attribute value'::text = ANY (( SELECT
     current_attributes() AS current_attributes;

And this causes an error when executing the dump file on a new
database:

   ERROR:  operator does not exist: text = text[]
   HINT:  No operator matches the given name and argument type(s).
   You might need to add explicit type casts.
[GENERAL] row-level security, views and materialized views?
Hi, has there been any discussion as to supporting row-level security
policies on materialized views? I didn't have much luck searching on
this topic. I see that PostgreSQL 9.5 refuses to create a policy on a
materialized view, consistent with the documentation focusing only on
tables.

First off, am I correct in understanding that access to regular views
will experience the same row-level security as if the client posed the
view-definition SQL directly in their query? I.e. the view is
expanded like a macro, and the row-level security on constituent
tables is applied during query evaluation with respect to the
consuming client's security context?

In an ideal but impractical world, I would use materialized views as
strictly a performance optimization, while getting the same row-level
security as described above. However, I suspect in practice that the
security context of the client who defines or refreshes the view will
be applied instead, while the view is being materialized. Is that
also correct?

So, in a more practical but secure world, I suppose I would want the
materialized view to work the same as "CREATE TABLE AS ..." so that at
least I could retain security properties from source tables in my
transformed output, and define morally equivalent row-level security
policies on my materialized view. Then, other clients querying my
extract could have the same security applied as if they queried the
source tables, so I don't necessarily leak rows to clients less
trusted than myself.

Thanks for any insights,
Karl
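P.S. To make the last idea concrete, a minimal sketch of the "CREATE
TABLE AS" workaround, with hypothetical names (extract, source_table,
owner):

   -- materialize the extract as a plain table
   CREATE TABLE extract AS
     SELECT id, owner, payload FROM source_table;

   ALTER TABLE extract ENABLE ROW LEVEL SECURITY;

   -- re-state a policy morally equivalent to the source table's
   CREATE POLICY extract_select ON extract
     FOR SELECT USING (owner = current_user);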
Re: [GENERAL] Check old and new tuple in row-level policy?
On Dec 18, Stephen Frost modulated:

> I agree that it would be a nice addition, as I've said before.

OK, sorry I misunderstood earlier and thought you were dismissing the
idea as redundant with triggers. Thanks for your patience!

> ... We certainly won't be adding it into 9.5 and it's getting pretty
> late for 9.6 too, but I'm anxious to see just how RLS is used in the
> field

If we want to start testing with 9.5 and emulate an environment
supporting NEW and OLD in the CHECK policy for UPDATE, can you
recommend how we should do that? E.g., is there a particular trigger
idiom that would most closely replicate the RLS extension we've been
discussing (so that we'd know that a working policy set + triggers
could be translated to just RLS policies if that feature is added)?

We'd be happy to experiment with these sorts of policies in our
applications. My intuition is that with this extension, we'd be able
to push down nearly all of our important policy enforcement into
PostgreSQL and guard against a slew of potential application
programming errors (compared to handling all this enforcement in our
application data access code)!

Karl
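P.S. In case it helps the discussion, here is the rough shape of the
trigger emulation I have in mind, using the ACL-in-row example from
earlier in the thread (table and column names are hypothetical):

   CREATE FUNCTION check_acl_transition() RETURNS trigger AS $$
   BEGIN
     -- emulate an UPDATE policy comparing OLD and NEW: only members
     -- of the existing ACL may change the ACL itself
     IF NEW.acl IS DISTINCT FROM OLD.acl
        AND NOT (current_user::text = ANY (OLD.acl)) THEN
       RAISE EXCEPTION 'not authorized to modify acl';
     END IF;
     RETURN NEW;
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER check_acl_transition
     BEFORE UPDATE ON stuff FOR EACH ROW
     EXECUTE PROCEDURE check_acl_transition();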
Re: [GENERAL] Check old and new tuple in row-level policy?
On Dec 18, Stephen Frost modulated:

> Any UPDATE which requires SELECT rights on the table will require
> expr1 to pass AND expr4 (the UPDATE's USING clause) to pass. This
> is modeled directly off of our existing GRANT/ACL system. The same
> is true for the other commands. Note that we explicitly want an
> independent USING clause for expr4 as you may wish to reduce the set
> of rows which may be UPDATE'd to be less than the set which are
> visible via SELECT.
>
> I anticipate adding the ability to have "restrictive" policies also,
> in the future.

Right, I think I understand that. However, my argument was (and
remains) that I think the update conditions need to be able to access
OLD and NEW row values to decide whether the existing row security
context allows the new update content. This update decision is
inherently different from select, insert, and delete decisions. I
don't think that two separate decisions are sufficient:

1. OLD row can be updated (the USING condition?)
2. NEW row is acceptable (the CHECK condition?)

when considering the row lifecycle as having continuity of identity
and security context. I think that the second decision needs to have
the option to compare OLD and NEW to decide that the new row is an
acceptable transform of the existing row, preserving whatever identity
and/or security context is important in a particular system of
policies.

As I understand this discussion, you are telling me to just use an
update trigger for this. I can understand that as a workaround given
the current row-security options, but I do think that this kind of
decision is essential to row security in realistically complex
applications that allow row mutation.

I don't think it is particularly esoteric to suggest that the existing
row security context should limit possible future configurations of
the row by different parties. That effect spans security context, row
identities, and regular non-identifying content. Right now, it seems
like the row-security model assumes the only security context is an
"owner" field carrying a user name, and that it is never mutated
(perhaps protected by a column-level privilege).

I think security context can be richer than that, including more
abstract roles, classes, or attributes, with users able to mutate that
context but only under the control of row-security policy. It is not
as simple as the context being system-managed only, or open to
arbitrary changes by admins and no change by others. Rather, the
current context may grant certain classes of user the ability to make
only certain coherent changes to that context.

Where mutation is different here is that a set of collaborating
parties can interpret one mutable row as a long-lived resource that
has an ongoing identity (or set of identities) and a coherent series
of row states protected by policy. Without the ability for the OLD
row to constrain how the NEW row develops in an update policy, we lose
that coherence and have a fugue state for the application, with no
ability to trust the identity of content over time.

Karl
Re: [GENERAL] Check old and new tuple in row-level policy?
I feel that the fundamental question here is about the semantics of
"row" in row security. The core question is whether UPDATE has a
different semantics with respect to row identity and security context
than a sequence of DELETE, INSERT.

At the moment, the documented solution feels like "tuple security": it
supports an idea of security context encoded into a tuple such that a
policy condition can decide whether its context and data configuration
is consistent with the session context. This makes sense in a pure
relational calculus, but ignores the practical deviation of SQL where
there are mutable records.

I am hoping for "record security" that supports an idea of security
context encoded into a record such that the context in the existing
record can constrain how the update of the record is formed. This
includes policies that might differentiate updates to key versus
non-key fields in the record, deciding in application terms whether
the record has an identity and context that carries forward through
the UPDATE, or whether it is merely sugar for a DELETE followed by
INSERT.

Ideally, I'd be able to write a policy that has conditions for each
category of operation:

   POLICY FOR SELECT WITH expr1
   POLICY FOR INSERT WITH expr2
   POLICY FOR DELETE WITH expr3
   POLICY FOR UPDATE WITH expr4

where expr1 would always be used to decide whether the current tuple
is visible to the query engine, expr2 would always be used to validate
new rows, expr3 would always be used to authorize row deletion, and
expr4 would be able to authorize row replacement using NEW and OLD
value comparisons.

An actual SQL UPDATE for a row visible according to expr1 could be
authorized if expr4 allows it *or* if expr3 and expr2 would allow a
DELETE followed by INSERT. Where this becomes interesting is when a
session context with insufficient privilege to do the DELETE and
INSERT sequence is still allowed to do the UPDATE, because their
request fits the narrower confines of the expr4 policy.

Karl
[GENERAL] Check old and new tuple in row-level policy?
I've been trying to learn more about the row-security policies but
coming up short in my searches. Was there any consideration to
allowing access to both old and new row tuples in a POLICY ... CHECK
expression? This idiom I've seen in the lower-level rule and trigger
systems seems like it would allow for much more powerful row-security
policies.

The simple illustrations of row-security policy always seem to
consider an 'owner' field in the tuple compared to current_user. If
you could consult both old and new values, you could generalize to
storing application ACLs in rows and using those ACLs to decide row
access, while also ensuring that ACLs cannot be changed in ways
inconsistent with the privilege level of the current user. For
example, if the current user is in the old ACL value, allow them to
modify the ACL; otherwise, require that the new ACL value be equal to
the old ACL value. This would allow a user to be given write access
to some columns while restricting others, but on a row-by-row basis.

Right now, as I understand it, you can only compare the old values to
session state in the USING condition and new values to the session
state in the CHECK condition, but never consider old and new values
simultaneously. This excludes a wide and useful gray area between no
trust and full trust to amend row content.

Karl
Re: [GENERAL] Check old and new tuple in row-level policy?
On Dec 16, Stephen Frost modulated:

> There is still a need to refer back to some kind of state that is
> external to the table under consideration to determine what the
> session level access is, no? Even if the ACLs are in a table
> somewhere, how do you know who the current user is?

Yes, I was assuming session variables of some sort.

Right now, I am looking to use the row-security policies for a web
application. We would just have the application server authenticate
as a service role and set session variables holding the web client's
authentication context. The USING and CHECK conditions would compare
row content against these session variables to enforce web client
authorization decisions underneath our application queries. The row
content includes ownership and ACL-like content, as well as other
application content subject to special access rules expressed in
terms of the ownership and ACLs.

If we authenticated users to the database, we would want to consult
current_user and something like current_roles (an array of all roles
granted to the current_user). Instead, we'll be using analogous
session context asserted to us by the web service.

We've done previous systems where we compile all the application
policy checks into the SQL queries generated by the application, but I
think it would be more appropriate to split these out and have
generalized enforcement at the database level. It feels like a close
but not perfect fit already. It's fraught with peril to handle all
the data visibility rules while generating any application-level data
filtering expressions, joins, etc.!

I'm trying to boil out some simple illustrations. Discussing an
entire cohesive system is difficult and probably counter-productive...
The technical idea is to have policies that consider the relationship
between old data, new data, and session context to only allow
particular state transitions for row UPDATE. For INSERT, SELECT, and
DELETE, I think the current policy model is already sufficient.

Just a few possible use cases to illustrate mixed tests of old and new
row values:

1. Interlocks between record states and supplemental access rights.
   A community might not allow records to be marked readable until
   they have been given a passing QA grade. A subsequent consumer
   might revise that to a failing grade, but not revoke the current
   access rights, due to transparency rules.

2. State-transition rules for specific values. Enforce that regular
   users can only move a workflow state along normal edges, while an
   admin user may be able to intervene and make abnormal transitions.
   Or, allow users to fill in "missing" data, such as replacing NULL
   or other defaults with better values, while only administrators can
   erase data back to NULL states.

3. Classification systems or other quasi-monotonic permission models
   where a user may advance the access class of a record in one
   direction, but only special administrators can reverse the
   direction.

   A. A publishing system might make it easy to draft data in
      smaller, private groups, but once published it is hard to
      retract things from the public record.

   B. Confidentiality systems might do the opposite, allowing things
      to be flagged as sensitive and locked down more easily than
      relaxing access restrictions.

   C. Community-based delegation systems might make it easy to
      "share" records with additional consumers by adding to an ACL,
      but only the more privileged owner of the row can remove
      entries from the ACL to "unshare".

4. Custody or provenance records. Certain unusual state transitions
   of data values may only be allowed if an explanatory record is
   appended to a small log array stored in the row.

I think that there is significant overlap between authorization,
state-transition models, and data integrity constraints once you start
considering collaborative applications with mutable records.

The next big leap beyond considering NEW and OLD values during
condition checks would be to use scalar subqueries to examine the row
within the context of other existing rows in the same or different
tables. I have not looked to see if this is possible in the current
policy system, but I imagine we would try hard to avoid doing this due
to performance implications, even if it is allowed...

Karl