[jira] [Updated] (PHOENIX-3689) Not determinist order by with limit

Arthur (JIRA) Tue, 28 Feb 2017 07:47:12 -0800

     [ 
https://issues.apache.org/jira/browse/PHOENIX-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Arthur updated PHOENIX-3689:
----------------------------
    Description: 
The following request does not return the last value of table TT:
select * from TT order by dt desc limit 1;
Adding a 'group by dt' clause gets back the good result.

I noticed that an order by with 'limit 10' returns a merge of 10 results from 
each region and not 10 results of the whole request.

So 'order by' is not determinist. It is a bug or a feature ?

Here is my DDL:
{noformat}
CREATE TABLE TT (dt timestamp NOT NULL, message bigint NOT NULL, id varchar(20) 
NOT NULL, version varchar CONSTRAINT PK PRIMARY KEY (dt, message, id));
{noformat}

The issue occurs with a lot of data. I think the 'order by' clause is done by 
region and not for the whole result. This script generate enough data to throw 
the issue:
{code:python}
#!/usr/bin/python

import string
from datetime import datetime, timedelta

dt = datetime(2017, 1, 1, 3)
with open('data.csv', 'w') as file:
        for i in range(0, 10000000):
                newdt = dt + timedelta(microseconds=i*10000)
                file.write("{};{};{};\n".format(datetime.strftime(newdt, 
"%Y-%m-%d %H:%M:%S.%f"), 91 if i  % 10  == 0 else 100, str(i).zfill(20)))
{code}

  was:
The following request does not return the last value of myTable:
select * from myTable order by myKey desc limit 1;
Adding a 'group by myKey' clause gets back the good result.

I noticed that an order by with 'limit 10' returns a merge of 10 results from 
each region and not 10 results of the whole request.

So 'order by' is not determinist. It is a bug or a feature ?

Here is my DDL:
CREATE TABLE TT (dt timestamp NOT NULL, message bigint NOT NULL, id varchar(20) 
NOT NULL, version varchar CONSTRAINT PK PRIMARY KEY (dt, message, id));

And some data with a dynamic column (I have 2 millions of similar rows sorted 
by time) :
UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
03:31:00.3730',91,'00000000000000000000','POUR','S_052303');
UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
03:31:00.7170',91,'00000000000000000001','PO','S_052303');
UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
03:31:01.9030',91,'00000000000000000002','POUR','S_052303');
UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
03:31:02.7330',91,'00000000000000000003','POUR','S_052303');
UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
03:31:03.5470',91,'00000000000000000004','POUR','S_052303');
UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
03:31:04.7330',91,'00000000000000000005','POUR','S_052305');



> Not determinist order by with limit
> -----------------------------------
>
>                 Key: PHOENIX-3689
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3689
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>            Reporter: Arthur
>
> The following request does not return the last value of table TT:
> select * from TT order by dt desc limit 1;
> Adding a 'group by dt' clause gets back the good result.
> I noticed that an order by with 'limit 10' returns a merge of 10 results from 
> each region and not 10 results of the whole request.
> So 'order by' is not determinist. It is a bug or a feature ?
> Here is my DDL:
> {noformat}
> CREATE TABLE TT (dt timestamp NOT NULL, message bigint NOT NULL, id 
> varchar(20) NOT NULL, version varchar CONSTRAINT PK PRIMARY KEY (dt, message, 
> id));
> {noformat}
> The issue occurs with a lot of data. I think the 'order by' clause is done by 
> region and not for the whole result. This script generate enough data to 
> throw the issue:
> {code:python}
> #!/usr/bin/python
> import string
> from datetime import datetime, timedelta
> dt = datetime(2017, 1, 1, 3)
> with open('data.csv', 'w') as file:
>         for i in range(0, 10000000):
>                 newdt = dt + timedelta(microseconds=i*10000)
>                 file.write("{};{};{};\n".format(datetime.strftime(newdt, 
> "%Y-%m-%d %H:%M:%S.%f"), 91 if i  % 10  == 0 else 100, str(i).zfill(20)))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (PHOENIX-3689) Not determinist order by with limit

Reply via email to