[
https://issues.apache.org/jira/browse/PHOENIX-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067361#comment-14067361
]
Jan Fernando commented on PHOENIX-1100:
---------------------------------------
[~jamestaylor] As promised here is an IT test we can use to repro. Add this to
UpsertSelectIT. If I remove SALT_BUCKETS= 64 from the DDL for
DUMMY_SEQ_TEST_DATA, the tests pass which corroborates my theory that
parallelization is causing this.
{code}
@Test
public void testUpsertSelectWithSequenceAndOrderByWithSalting() throws
Exception {
int numOfRecords = 2000;
long ts = nextTimestamp();
Properties props = new Properties();
props.setProperty(QueryServices.THREAD_POOL_SIZE_ATTRIB,
Integer.toString(64));
props.setProperty(QueryServices.QUEUE_SIZE_ATTRIB,
Integer.toString(500));
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB,
Long.toString(ts));
Connection conn = DriverManager.getConnection(getUrl(), props);
String ddl = "CREATE TABLE IF NOT EXISTS DUMMY_CURSOR_STORAGE ("
+ "ORGANIZATION_ID CHAR(15) NOT NULL,
QUERY_ID CHAR(15) NOT NULL, CURSOR_ORDER BIGINT NOT NULL, K1 INTEGER, V1
INTEGER "
+ "CONSTRAINT MAIN_PK PRIMARY KEY
(ORGANIZATION_ID, QUERY_ID, CURSOR_ORDER) "
+ ") SALT_BUCKETS = 64";
conn.createStatement().execute(ddl);
conn.createStatement().execute("CREATE TABLE DUMMY_SEQ_TEST_DATA " +
"(ORGANIZATION_ID CHAR(15) NOT
NULL, k1 integer NOT NULL, v1 integer NOT NULL " +
"CONSTRAINT PK PRIMARY KEY
(ORGANIZATION_ID, k1, v1) ) VERSIONS=1, SALT_BUCKETS = 64");
conn.createStatement().execute("create sequence s cache " +
Integer.MAX_VALUE);
conn.close();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts +
10));
conn = DriverManager.getConnection(getUrl(), props);
for (int i = 0; i < numOfRecords; i++) {
conn.createStatement().execute("UPSERT INTO DUMMY_SEQ_TEST_DATA
values ('00Dxx0000001gEH'," + i + "," + (i + 2) + ")");
}
conn.commit();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts +
15));
conn = DriverManager.getConnection(getUrl(), props);
conn.setAutoCommit(true);
conn.createStatement().execute("UPSERT INTO DUMMY_CURSOR_STORAGE SELECT
'00Dxx0000001gEH', 'MyQueryId', NEXT VALUE FOR S, k1, v1 FROM
DUMMY_SEQ_TEST_DATA ORDER BY K1, V1");
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts +
20));
conn = DriverManager.getConnection(getUrl(), props);
ResultSet rs = conn.createStatement().executeQuery("select count(*)
from DUMMY_CURSOR_STORAGE");
assertTrue(rs.next());
assertEquals(numOfRecords, rs.getLong(1));
conn.close();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts +
25));
ResultSet rs2 = conn.createStatement().executeQuery("select
cursor_order, k1, v1 from DUMMY_CURSOR_STORAGE order by cursor_order");
long seq = 1;
while (rs2.next()) {
assertEquals(seq, rs2.getLong("cursor_order"));
// This value should be the sequence - 1 as we said order by k1 in
the UPSERT...SELECT, but is not because of sequence processing.
assertEquals(seq - 1, rs2.getLong("k1"));
seq++;
}
conn.close();
}
{code}
> Upsert Select with Sequence and Order By doesn't result in sequence getting
> assigned based on requested ordering
> -----------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-1100
> URL: https://issues.apache.org/jira/browse/PHOENIX-1100
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 3.0.0, 4.0.0, 5.0.0
> Reporter: Jan Fernando
>
> I working with Query More implementation I noticed that once we had
> sufficient parallelization in the SELECT portion of UPSERT...SELECT due to
> Salting or region splits the results were not returned in the correct order.
> The root cause appeared to be due to the fact that we are using sequences to
> build a cursor of data for our Query More implementation. What appears to be
> happening is that as we get the next sequence value from the SequenceManager
> during result processing parallel threads all increment the sequence and
> therefore the sequence values reflect the processing order of each iterator
> versus the actual order specified by the order by. The expectation is that
> the sequence value should reflect the requested order specified by the ORDER
> BY versus the processing order.
--
This message was sent by Atlassian JIRA
(v6.2#6252)