[
https://issues.apache.org/jira/browse/KUDU-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274136#comment-15274136
]
Jean-Daniel Cryans commented on KUDU-1440:
------------------------------------------
Hi Martin,
Results are not guaranteed to be returned in order. They are instead returned
per DiskRowSet; each DiskRowSet is internally sorted, and once you stop
inserting, RowSet compactions will eventually sort the DiskRowSets relative to
one another.
See this related patch that's up for review to add a way to get rows in order
(per tablet) in the Java client: http://gerrit.cloudera.org:8080/#/c/2951/
But the main problem here is that, with hash partitioning, we still scan one
tablet at a time, so a full table scan would still not return rows in total
order.
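To illustrate why internally sorted DiskRowSets do not add up to a sorted
result, and what a client would need to do to get total order today, here is
a minimal sketch of a k-way merge over sorted runs. It assumes the runs are
already available as separate ascending lists, which the scanner does not
expose directly; SortedRunMerger and merge are hypothetical names, not Kudu
client API, and the keys are plain long timestamps for simplicity. Sorting the
collected rows on the client is the simpler alternative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of the Kudu client.
final class SortedRunMerger {

    // One heap entry per run: the run's current smallest key plus the
    // iterator that yields the remainder of that run.
    private static final class Head {
        final long key;
        final Iterator<Long> rest;

        Head(long key, Iterator<Long> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    // Merges runs that are each sorted ascending into one ascending list.
    // Assumes no run contains null keys.
    static List<Long> merge(List<List<Long>> runs) {
        PriorityQueue<Head> heap =
            new PriorityQueue<>(Comparator.comparingLong((Head h) -> h.key));

        // Seed the heap with the first key of every non-empty run.
        for (List<Long> run : runs) {
            Iterator<Long> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }

        // Repeatedly emit the smallest head and refill from the same run.
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.key);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

Concatenating the runs [1, 4, 9] and [2, 3, 10] gives 1, 4, 9, 2, 3, 10, which
is ordered locally but not globally; SortedRunMerger.merge on the same two
runs gives 1, 2, 3, 4, 9, 10.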
> Wrong result ordering for scanning a table with millions of rows
> ----------------------------------------------------------------
>
> Key: KUDU-1440
> URL: https://issues.apache.org/jira/browse/KUDU-1440
> Project: Kudu
> Issue Type: Bug
> Components: client, master, tablet
> Affects Versions: 0.8.0
> Environment: CentOS 7
> Reporter: Martin Weindel
> Priority: Critical
> Attachments: CreateTableTimeSeriesBug.java
>
>
> I have the following simple table with two columns:
> {code}
> time TIMESTAMP,
> value FLOAT
> {code}
> The time column is used as the range partition key.
> If I have understood the architecture of Kudu correctly, the rows should then
> be returned in ascending order of the time column.
> This works as long as no more than about 600,000 rows are inserted.
> If more than 1 million rows are inserted, the order is messed up
> globally. At the micro level it is still correct for 99.9% of successive
> rows.
> My setup is single master / single tablet server on a Linux server. The table
> is created, filled and read with the Kudu Java client version 0.8.0.
> See attached Java code to reproduce the problem.