[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anoop Sam John updated PHOENIX-1203: ------------------------------------ Affects Version/s: (was: 4.1) (was: 4.0.0) 3.0.0 > Uable to work for count (distinct col) queries via phoenix table with > secondary indexes > --------------------------------------------------------------------------------------- > > Key: PHOENIX-1203 > URL: https://issues.apache.org/jira/browse/PHOENIX-1203 > Project: Phoenix > Issue Type: Bug > Affects Versions: 3.0.0 > Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 > Reporter: Sun Fulin > Assignee: Anoop Sam John > Labels: distinct, secondaryIndex, test > Attachments: PHOENIX-1203.patch > > > I build the latest 4.1 rc0 from here: > https://github.com/apache/phoenix/releases > And examine the count (distinct col) query within the new environment. > However, the problem still exists with index queries as the following while > the correct distinct query result is expected to be 536 for my project: > 0: jdbc:phoenix:zookeeper1> select count (distinct t.imsi) from ranapsignal t > where t.pkttime>=1404964800000 and t.pkttime<=1404965699999 and t.sac=32351 > and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; > +-------------+ > | COUNT(IMSI) | > +-------------+ > | 2322 | > +-------------+ > 1 row selected (70.572 seconds) > As James suggests, I conduct the query adding group by t.imsi with /without > secondary indexes. And the result seems to be fine as they both got the > correct 536 distinct groups. > Here are some considerations: > 1. count (distinct col) query over index table did not work as expectation. > 2. only distinct query over index table works fine. > 3. If the phoenix version got some wrong configuration, correct me. > Thanks and Best Regards, > Sun > ------------------------------- > Hi Sun, > Thanks for the detailed description. Yes, your syntax is correct, and > it's definitely true that the count distinct query should return the > same result with and without the index. Would you mind trying this on > our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to > file a JIRA? > > One thing that make make it easier for your testing: do you know about > our NO_INDEX hint which forces the query *not* to use an index, like > this: > > select /*+ NO_INDEX */ ... > > Another question too. What about this query with and with/out the index: > > select count(*) from ranapsignal t > where t.pkttime>=1404964800000 and t.pkttime<=1404965699999 > and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 > and t.ranapsubmsgtype=0 > group by t.imsi; > > Thanks, > James > > On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn > <su...@certusnet.com.cn> wrote: > > > > > > Hi James, > > > > Recently I got trouble while trying to conduct some query performance test > > in my phoenix tables with secondary indexes. > > I created a table called RANAPSIGNAL for my projects in phoenix via > > sqlline client and load data into the table. Then I create > > an index on the specific column PKTTIME for the table RANAPSIGNAL while > > including other more columns for adjusting my > > index query, like the following DDL: > > create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include > > (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); > > The index creation worked successfully without any errors. So, when I am > > trying to conduct such query as: > > select count (distinct t.imsi) from ranapsignal t where > > t.pkttime>=1404964800000 and t.pkttime<=1404965699999 > > and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and > > t.ranapsubmsgtype=0 ; > > Without secondary indexes, the final result got 536 distinct imsi, wihch > > is the right distinct count results. However, after I create the above > > secondary index PKT_IDX and reconducting the above count (distinct imsi) > > query, I got 2322 imsi rows which obviously are not the expected > > distinct counts results. I used the explain grammar to observe the scan of > > the above select query and found that it definitely scaned over > > the index table PKT_IDX. I then tried to conduct the following query with > > no count function: > > select distinct t.imsi from ranapsignal t where > > t.pkttime>=1404964800000 and t.pkttime<=1404965699999 > > and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and > > t.ranapsubmsgtype=0 ; > > And the result is right 536 distinct imsi over scanning the index table. > > By the way, imsi is one of the primary key when creating the table > > RANAPSIGNAL. > > Here are several considerations for my trouble and practice: > > 1. Did you guys ever practice such count (distinct) queries over phoenix > > table via secondary index? > > 2. I am not sure whether this problem was due to the index table, but my > > practice may assume that conclusion. > > 3. Corrects my if I am wrong with my previous operations (index creation, > > DDL grammar, etc..) > > 4. Any available hints or reply are best appreciated. > > > > Thanks and Best Regards > > Sun > -- This message was sent by Atlassian JIRA (v6.3.4#6332)