[ 
https://issues.apache.org/jira/browse/PHOENIX-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766662#comment-16766662
 ] 

Hieu Nguyen commented on PHOENIX-5136:
--------------------------------------

Looking at the underlying HBase data, we noticed that UPSERT .. ON DUPLICATE 
KEY UPDATE will actually set an empty value for a column that should contain a 
Phoenix NULL value:
{noformat}
UPSERT INTO "testView"("groupKey", "colA", "colB") VALUES ('file_a', NULL, 1) 
ON DUPLICATE KEY UPDATE "colB" = 2;
1 row affected (0.018 seconds)


hbase(main):016:0> scan 'test'
ROW                                         COLUMN+CELL
 file_a                                     column=0:\x00\x00\x00\x00, 
timestamp=1550021698069, value=x
 file_a                                     column=0:\x80\x0B, 
timestamp=1550021698069, value=
 file_a                                     column=0:\x80\x0C, 
timestamp=1550021698069, value=\x80\x00\x00\x00\x00\x00\x00\x01
{noformat}

Regular UPSERT will not set any value for that column:
{noformat}
UPSERT INTO "testView"("groupKey", "colA", "colB") VALUES ('file_a', NULL, 1);
1 row affected (0.017 seconds)

ROW                                         COLUMN+CELL
 file_a                                     column=0:\x00\x00\x00\x00, 
timestamp=1550021776831, value=x
 file_a                                     column=0:\x80\x0C, 
timestamp=1550021776831, value=\x80\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0110 seconds
{noformat}

Note that this was done through a view, but we can re-run the experiment 
directly against a table if it helps.

> Rows with null values inserted by UPSERT .. ON DUPLICATE KEY UPDATE are 
> included in query results when they shouldn't be
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5136
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5136
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>            Reporter: Hieu Nguyen
>            Priority: Major
>
> Rows with null values inserted using UPSERT .. ON DUPLICATE KEY UPDATE will 
> be selected in queries when they should not be.
> Here is a failing test that demonstrates the issue:
> {noformat}
> @Test
> public void 
> testRowsCreatedViaUpsertOnDuplicateKeyShouldNotBeReturnedInQueryIfNotMatched()
>  throws Exception {
>     Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
>     Connection conn = DriverManager.getConnection(getUrl(), props);
>     String tableName = generateUniqueName();
>     String ddl = " create table " + tableName + "(pk varchar primary key, 
> counter1 bigint, counter2 smallint)";
>     conn.createStatement().execute(ddl);
>     createIndex(conn, tableName);
>     // The data has to be specifically starting with null for the first 
> counter to fail the test. If you reverse the values, the test passes.
>     String dml1 = "UPSERT INTO " + tableName + " VALUES('a',NULL,2) ON 
> DUPLICATE KEY UPDATE " +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN NULL ELSE counter1 
> END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2 
> END";
>     conn.createStatement().execute(dml1);
>     conn.commit();
>     String dml2 = "UPSERT INTO " + tableName + " VALUES('b',1,2) ON DUPLICATE 
> KEY UPDATE " +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1 
> END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2 
> END";
>     conn.createStatement().execute(dml2);
>     conn.commit();
>     // Using this statement causes the test to pass
>     //ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + 
> tableName + " WHERE counter2 = 2 AND counter1 = 1");
>     // This statement should be equivalent to the one above, but it selects 
> both rows.
>     ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + 
> tableName + " WHERE counter2 = 2 AND (counter1 = 1 OR counter1 = 1)");
>     assertTrue(rs.next());
>     assertEquals("b",rs.getString(1));
>     assertEquals(1,rs.getLong(2));
>     assertEquals(2,rs.getLong(3));
>     assertFalse(rs.next());
>     conn.close();
> }{noformat}
> The conditions are fairly specific:
>  * Must use ON DUPLICATE KEY UPDATE.  Inserting rows using UPSERT by itself 
> will have correct results
>  * The "counter2 = 2 AND (counter1 = 1 OR counter1 = 1)" condition caused the 
> test to fail, as opposed to the equivalent but simpler "counter2 = 2 AND 
> counter1 = 1".  I tested a similar "counter2 = 2 AND (counter1 = 1 OR 
> counter1 < 1)", which also caused the test to fail.
>  * If the NULL value for row 'a' is instead in the last position (counter2), 
> then row 'a' is not selected in the query as expected.  The below test 
> demonstrates this behavior (it passes as expected):
> {noformat}
> @Test
> public void 
> testRowsCreatedViaUpsertOnDuplicateKeyShouldNotBeReturnedInQueryIfNotMatched()
>  throws Exception {
>     Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
>     Connection conn = DriverManager.getConnection(getUrl(), props);
>     String tableName = generateUniqueName();
>     String ddl = " create table " + tableName + "(pk varchar primary key, 
> counter1 bigint, counter2 smallint)";
>     conn.createStatement().execute(ddl);
>     createIndex(conn, tableName);
>     String dml1 = "UPSERT INTO " + tableName + " VALUES('a',1,NULL) ON 
> DUPLICATE KEY UPDATE " +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1 
> END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN NULL ELSE counter2 
> END";
>     conn.createStatement().execute(dml1);
>     conn.commit();
>     String dml2 = "UPSERT INTO " + tableName + " VALUES('b',1,2) ON DUPLICATE 
> KEY UPDATE " +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1 
> END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2 
> END";
>     conn.createStatement().execute(dml2);
>     conn.commit();
>     ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + 
> tableName + " WHERE counter1 = 1 AND (counter2 = 2 OR counter2 = 2)");
>     assertTrue(rs.next());
>     assertEquals("b",rs.getString(1));
>     assertEquals(1,rs.getLong(2));
>     assertEquals(2,rs.getLong(3));
>     assertFalse(rs.next());
>     conn.close();
> }
> {noformat}
> We also noticed this behavior when upserting and selecting manually against a 
> View.
> Any ideas on where to look to fix this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to