[
https://issues.apache.org/jira/browse/NIFI-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
mgerbig updated NIFI-10685:
---------------------------
Description:
Nifi's PutDatabaseRecord does not quote columns in the conflict clause when
using PostgreSQLDatabaseAdapter. This prevents the usage of upsert (and
possibly insert ignore) statements as soon as at least one column of the
primary key contains at least one uppercase character.
h2. Root Cause
Postgres lowercases all unquoted keys, as described in the manual, which causes
this bug:
{quote}Quoting an identifier also makes it case-sensitive, whereas unquoted
names are always folded to lower case. For example, the identifiers
{{{}FOO{}}}, {{{}foo{}}}, and {{"foo"}} are considered the same by PostgreSQL,
but {{"Foo"}} and {{"FOO"}} are different from these three and each other. (The
folding of unquoted names to lower case in PostgreSQL is incompatible with the
SQL standard, which says that unquoted names should be folded to upper case.
Thus, {{foo}} should be equivalent to {{"FOO"}} not {{"foo"}} according to the
standard. If you want to write portable applications you are advised to always
quote a particular name or never quote it.)
{quote}
see
[https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS]
h2. Reproduce behaviour
Create a table in your *PostgreSQL* database
{code:java}
CREATE TABLE test_table
(
"TIME_PERIOD" date NOT NULL,
"R0102" numeric,
"R0203" numeric,
"R0304" numeric,
"R0405" numeric,
"R0506" numeric,
"R0607" numeric,
"R0708" numeric,
"R0809" numeric,
"R0910" numeric,
modified_at_utc date DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP(0)),
CONSTRAINT test_table PRIMARY KEY ("TIME_PERIOD")
) {code}
Use attached avro to write to the table using a PutDatabaseRecord Processor
configured with UPSERT.
Nifi will throw following Exception in the Bulletin Board
{code:java}
PutDatabaseRecord[id=...] Failed to put Records to database for
FlowFile[filename=redacted.avro]. Routing to failure.:
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO
"public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304", "R0405",
"R0506", "R0607", "R0708", "R0809", "R0910") VALUES ('2022-10-21 +00',
'2.97'::numeric, '3.2'::numeric, '3.19'::numeric, '3.19'::numeric,
'3.22'::numeric, '3.33'::numeric, '3.33'::numeric, '3.4'::numeric,
'3.38'::numeric) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD",
"R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809",
"R0910") = (EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203",
EXCLUDED."R0304", EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607",
EXCLUDED."R0708", EXCLUDED."R0809", EXCLUDED."R0910") was aborted: ERROR:
column "time_period" does not exist
Position: 347 Call getNextException to see other errors in the batch.
- Caused by: org.postgresql.util.PSQLException: ERROR: column "time_period"
does not exist
Position: 347 {code}
h3. Generated SQL Statement
Nifi generates and issues following SQL Statement to the database:
{code:java}
INSERT INTO "public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304",
"R0405", "R0506", "R0607", "R0708", "R0809", "R0910") VALUES (?, ?, ?, ?, ?, ?,
?, ?, ?, ?) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD", "R0102",
"R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809", "R0910") =
(EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203", EXCLUDED."R0304",
EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607", EXCLUDED."R0708",
EXCLUDED."R0809", EXCLUDED."R0910") {code}
This statement only works with quoted _TIME_PERIOD_ in the _ON CONFLICT_ clause
due to the behaviour of postgres described in Section _Root Cause_
was:
Nifi's PutDatabaseRecord does not quote columns in the conflict clause when
using PostgreSQLDatabaseAdapter. This prevents the usage of upsert (and
possibly insert ignore) statements as soon as at least one column of the
primary key contains at least one uppercase character.{*}{*}
h2. Root Cause
Postgres lowercases all unquoted keys, as described in the manual, which causes
this bug:
{quote}Quoting an identifier also makes it case-sensitive, whereas unquoted
names are always folded to lower case. For example, the identifiers
{{{}FOO{}}}, {{{}foo{}}}, and {{"foo"}} are considered the same by PostgreSQL,
but {{"Foo"}} and {{"FOO"}} are different from these three and each other. (The
folding of unquoted names to lower case in PostgreSQL is incompatible with the
SQL standard, which says that unquoted names should be folded to upper case.
Thus, {{foo}} should be equivalent to {{"FOO"}} not {{"foo"}} according to the
standard. If you want to write portable applications you are advised to always
quote a particular name or never quote it.)
{quote}
see
[https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS]
h2. Reproduce behaviour
Create a table in your *PostgreSQL* database
{code:java}
CREATE TABLE test_table
(
"TIME_PERIOD" date NOT NULL,
"R0102" numeric,
"R0203" numeric,
"R0304" numeric,
"R0405" numeric,
"R0506" numeric,
"R0607" numeric,
"R0708" numeric,
"R0809" numeric,
"R0910" numeric,
modified_at_utc date DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP(0)),
CONSTRAINT test_table PRIMARY KEY ("TIME_PERIOD")
) {code}
Use attached avro to write to the table using a PutDatabaseRecord Processor
configured with UPSERT.
Nifi will throw following Exception in the Bulletin Board
{code:java}
PutDatabaseRecord[id=...] Failed to put Records to database for
FlowFile[filename=redacted.avro]. Routing to failure.:
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO
"public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304", "R0405",
"R0506", "R0607", "R0708", "R0809", "R0910") VALUES ('2022-10-21 +00',
'2.97'::numeric, '3.2'::numeric, '3.19'::numeric, '3.19'::numeric,
'3.22'::numeric, '3.33'::numeric, '3.33'::numeric, '3.4'::numeric,
'3.38'::numeric) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD",
"R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809",
"R0910") = (EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203",
EXCLUDED."R0304", EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607",
EXCLUDED."R0708", EXCLUDED."R0809", EXCLUDED."R0910") was aborted: ERROR:
column "time_period" does not exist
Position: 347 Call getNextException to see other errors in the batch.
- Caused by: org.postgresql.util.PSQLException: ERROR: column "time_period"
does not exist
Position: 347 {code}
h3. Generated SQL Statement
Nifi generates and issues following SQL Statement to the database:
{code:java}
INSERT INTO "public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304",
"R0405", "R0506", "R0607", "R0708", "R0809", "R0910") VALUES (?, ?, ?, ?, ?, ?,
?, ?, ?, ?) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD", "R0102",
"R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809", "R0910") =
(EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203", EXCLUDED."R0304",
EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607", EXCLUDED."R0708",
EXCLUDED."R0809", EXCLUDED."R0910") {code}
This statement only works with quoted _TIME_PERIOD_ in the _ON CONFLICT_ clause
due to the behaviour of postgres described in Section _Root Cause_
> PutDatabaseRecord Upsert does not quote columns in conflict clause (postgres)
> -----------------------------------------------------------------------------
>
> Key: NIFI-10685
> URL: https://issues.apache.org/jira/browse/NIFI-10685
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.18.0
> Reporter: mgerbig
> Priority: Major
> Attachments: putdatabase-test.avro
>
>
> Nifi's PutDatabaseRecord does not quote columns in the conflict clause when
> using PostgreSQLDatabaseAdapter. This prevents the usage of upsert (and
> possibly insert ignore) statements as soon as at least one column of the
> primary key contains at least one uppercase character.
> h2. Root Cause
> Postgres lowercases all unquoted keys, as described in the manual, which
> causes this bug:
> {quote}Quoting an identifier also makes it case-sensitive, whereas unquoted
> names are always folded to lower case. For example, the identifiers
> {{{}FOO{}}}, {{{}foo{}}}, and {{"foo"}} are considered the same by
> PostgreSQL, but {{"Foo"}} and {{"FOO"}} are different from these three and
> each other. (The folding of unquoted names to lower case in PostgreSQL is
> incompatible with the SQL standard, which says that unquoted names should be
> folded to upper case. Thus, {{foo}} should be equivalent to {{"FOO"}} not
> {{"foo"}} according to the standard. If you want to write portable
> applications you are advised to always quote a particular name or never quote
> it.)
> {quote}
> see
> [https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS]
> h2. Reproduce behaviour
> Create a table in your *PostgreSQL* database
> {code:java}
> CREATE TABLE test_table
> (
> "TIME_PERIOD" date NOT NULL,
> "R0102" numeric,
> "R0203" numeric,
> "R0304" numeric,
> "R0405" numeric,
> "R0506" numeric,
> "R0607" numeric,
> "R0708" numeric,
> "R0809" numeric,
> "R0910" numeric,
> modified_at_utc date DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP(0)),
> CONSTRAINT test_table PRIMARY KEY ("TIME_PERIOD")
> ) {code}
> Use attached avro to write to the table using a PutDatabaseRecord Processor
> configured with UPSERT.
> Nifi will throw following Exception in the Bulletin Board
> {code:java}
> PutDatabaseRecord[id=...] Failed to put Records to database for
> FlowFile[filename=redacted.avro]. Routing to failure.:
> java.sql.BatchUpdateException: Batch entry 0 INSERT INTO
> "public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304", "R0405",
> "R0506", "R0607", "R0708", "R0809", "R0910") VALUES ('2022-10-21 +00',
> '2.97'::numeric, '3.2'::numeric, '3.19'::numeric, '3.19'::numeric,
> '3.22'::numeric, '3.33'::numeric, '3.33'::numeric, '3.4'::numeric,
> '3.38'::numeric) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD",
> "R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809",
> "R0910") = (EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203",
> EXCLUDED."R0304", EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607",
> EXCLUDED."R0708", EXCLUDED."R0809", EXCLUDED."R0910") was aborted: ERROR:
> column "time_period" does not exist
> Position: 347 Call getNextException to see other errors in the batch.
> - Caused by: org.postgresql.util.PSQLException: ERROR: column "time_period"
> does not exist
> Position: 347 {code}
> h3. Generated SQL Statement
> Nifi generates and issues following SQL Statement to the database:
> {code:java}
> INSERT INTO "public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304",
> "R0405", "R0506", "R0607", "R0708", "R0809", "R0910") VALUES (?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD",
> "R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809",
> "R0910") = (EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203",
> EXCLUDED."R0304", EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607",
> EXCLUDED."R0708", EXCLUDED."R0809", EXCLUDED."R0910") {code}
> This statement only works with quoted _TIME_PERIOD_ in the _ON CONFLICT_
> clause due to the behaviour of postgres described in Section _Root Cause_
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)