[
https://issues.apache.org/jira/browse/PHOENIX-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766662#comment-16766662
]
Hieu Nguyen commented on PHOENIX-5136:
--------------------------------------
Looking at the underlying HBase data, we noticed that UPSERT .. ON DUPLICATE
KEY UPDATE will actually set an empty value for a column that should contain a
Phoenix NULL value:
{noformat}
UPSERT INTO "testView"("groupKey", "colA", "colB") VALUES ('file_a', NULL, 1)
ON DUPLICATE KEY UPDATE "colB" = 2;
1 row affected (0.018 seconds)
hbase(main):016:0> scan 'test'
ROW COLUMN+CELL
file_a column=0:\x00\x00\x00\x00,
timestamp=1550021698069, value=x
file_a column=0:\x80\x0B,
timestamp=1550021698069, value=
file_a column=0:\x80\x0C,
timestamp=1550021698069, value=\x80\x00\x00\x00\x00\x00\x00\x01
{noformat}
Regular UPSERT will not set any value for that column:
{noformat}
UPSERT INTO "testView"("groupKey", "colA", "colB") VALUES ('file_a', NULL, 1);
1 row affected (0.017 seconds)
ROW COLUMN+CELL
file_a column=0:\x00\x00\x00\x00,
timestamp=1550021776831, value=x
file_a column=0:\x80\x0C,
timestamp=1550021776831, value=\x80\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0110 seconds
{noformat}
Note that this was done through a view, but we can re-run the experiment
directly against a table if it helps.
> Rows with null values inserted by UPSERT .. ON DUPLICATE KEY UPDATE are
> included in query results when they shouldn't be
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5136
> URL: https://issues.apache.org/jira/browse/PHOENIX-5136
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0
> Reporter: Hieu Nguyen
> Priority: Major
>
> Rows with null values inserted using UPSERT .. ON DUPLICATE KEY UPDATE will
> be selected in queries when they should not be.
> Here is a failing test that demonstrates the issue:
> {noformat}
> @Test
> public void
> testRowsCreatedViaUpsertOnDuplicateKeyShouldNotBeReturnedInQueryIfNotMatched()
> throws Exception {
> Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
> Connection conn = DriverManager.getConnection(getUrl(), props);
> String tableName = generateUniqueName();
> String ddl = " create table " + tableName + "(pk varchar primary key,
> counter1 bigint, counter2 smallint)";
> conn.createStatement().execute(ddl);
> createIndex(conn, tableName);
> // The data has to be specifically starting with null for the first
> counter to fail the test. If you reverse the values, the test passes.
> String dml1 = "UPSERT INTO " + tableName + " VALUES('a',NULL,2) ON
> DUPLICATE KEY UPDATE " +
> "counter1 = CASE WHEN (counter1 IS NULL) THEN NULL ELSE counter1
> END, " +
> "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2
> END";
> conn.createStatement().execute(dml1);
> conn.commit();
> String dml2 = "UPSERT INTO " + tableName + " VALUES('b',1,2) ON DUPLICATE
> KEY UPDATE " +
> "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1
> END, " +
> "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2
> END";
> conn.createStatement().execute(dml2);
> conn.commit();
> // Using this statement causes the test to pass
> //ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " +
> tableName + " WHERE counter2 = 2 AND counter1 = 1");
> // This statement should be equivalent to the one above, but it selects
> both rows.
> ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " +
> tableName + " WHERE counter2 = 2 AND (counter1 = 1 OR counter1 = 1)");
> assertTrue(rs.next());
> assertEquals("b",rs.getString(1));
> assertEquals(1,rs.getLong(2));
> assertEquals(2,rs.getLong(3));
> assertFalse(rs.next());
> conn.close();
> }{noformat}
> The conditions are fairly specific:
> * Must use ON DUPLICATE KEY UPDATE. Inserting rows using UPSERT by itself
> will have correct results
> * The "counter2 = 2 AND (counter1 = 1 OR counter1 = 1)" condition caused the
> test to fail, as opposed to the equivalent but simpler "counter2 = 2 AND
> counter1 = 1". I tested a similar "counter2 = 2 AND (counter1 = 1 OR
> counter1 < 1)", which also caused the test to fail.
> * If the NULL value for row 'a' is instead in the last position (counter2),
> then row 'a' is not selected in the query as expected. The below test
> demonstrates this behavior (it passes as expected):
> {noformat}
> @Test
> public void
> testRowsCreatedViaUpsertOnDuplicateKeyShouldNotBeReturnedInQueryIfNotMatched()
> throws Exception {
> Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
> Connection conn = DriverManager.getConnection(getUrl(), props);
> String tableName = generateUniqueName();
> String ddl = " create table " + tableName + "(pk varchar primary key,
> counter1 bigint, counter2 smallint)";
> conn.createStatement().execute(ddl);
> createIndex(conn, tableName);
> String dml1 = "UPSERT INTO " + tableName + " VALUES('a',1,NULL) ON
> DUPLICATE KEY UPDATE " +
> "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1
> END, " +
> "counter2 = CASE WHEN (counter1 IS NULL) THEN NULL ELSE counter2
> END";
> conn.createStatement().execute(dml1);
> conn.commit();
> String dml2 = "UPSERT INTO " + tableName + " VALUES('b',1,2) ON DUPLICATE
> KEY UPDATE " +
> "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1
> END, " +
> "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2
> END";
> conn.createStatement().execute(dml2);
> conn.commit();
> ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " +
> tableName + " WHERE counter1 = 1 AND (counter2 = 2 OR counter2 = 2)");
> assertTrue(rs.next());
> assertEquals("b",rs.getString(1));
> assertEquals(1,rs.getLong(2));
> assertEquals(2,rs.getLong(3));
> assertFalse(rs.next());
> conn.close();
> }
> {noformat}
> We also noticed this behavior when upserting and selecting manually against a
> View.
> Any ideas on where to look to fix this issue?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)