[
https://issues.apache.org/jira/browse/PHOENIX-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888282#comment-15888282
]
Arthur commented on PHOENIX-3689:
---------------------------------
Hum, I have no problem with this DDL (it uses dynamic column). I changed the
description with a generic example (and a script to produce enough data)
> Not determinist order by with limit
> -----------------------------------
>
> Key: PHOENIX-3689
> URL: https://issues.apache.org/jira/browse/PHOENIX-3689
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.7.0
> Reporter: Arthur
>
> The following request does not return the last value of table TT:
> select * from TT order by dt desc limit 1;
> Adding a 'group by dt' clause gets back the good result.
> I noticed that an order by with 'limit 10' returns a merge of 10 results from
> each region and not 10 results of the whole request.
> So 'order by' is not determinist. It is a bug or a feature ?
> Here is my DDL:
> {noformat}
> CREATE TABLE TT (dt timestamp NOT NULL, message bigint NOT NULL, id
> varchar(20) NOT NULL, version varchar CONSTRAINT PK PRIMARY KEY (dt, message,
> id));
> {noformat}
> The issue occurs with a lot of data. I think the 'order by' clause is done by
> region and not for the whole result, so limit 1 returns the first region that
> answers and phoenix cache it. With only one region, this does not occur.
> This script generate enough data to throw the issue:
> {code}
> #!/usr/bin/python
> import string
> from datetime import datetime, timedelta
> dt = datetime(2017, 1, 1, 3)
> with open('data.csv', 'w') as file:
> for i in range(0, 10000000):
> newdt = dt + timedelta(microseconds=i*10000)
> file.write("{};{};{};\n".format(datetime.strftime(newdt,
> "%Y-%m-%d %H:%M:%S.%f"), 91 if i % 10 == 0 else 100, str(i).zfill(20)))
> {code}
> With this data set, the last data is : 2017-01-02 06:46:39.990000
> Result with order by clause is not the last value:
> {noformat}
> select dt from TT order by dt desc limit 1;
> +--------------------------+
> | DT |
> +--------------------------+
> | 2017-01-01 07:54:40.730 |
> {noformat}
> Correct result is given when using group by, but I need to get all columns.
> {noformat}
> select dt from TT group by dt order by dt desc limit 1;
> +--------------------------+
> | DT |
> +--------------------------+
> | 2017-01-02 06:46:39.990 |
> +--------------------------+
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)