[jira] [Updated] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

Anoop Sam John (JIRA) Sun, 31 Aug 2014 12:07:03 -0700

     [ 
https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anoop Sam John updated PHOENIX-1203:
------------------------------------
    Affects Version/s:     (was: 4.1)
                           (was: 4.0.0)
                       3.0.0

> Uable to work for count (distinct col) queries via phoenix table with 
> secondary indexes
> ---------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1203
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
>            Reporter: Sun Fulin
>            Assignee: Anoop Sam John
>              Labels: distinct, secondaryIndex, test
>         Attachments: PHOENIX-1203.patch
>
>
> I build the latest 4.1 rc0 from here:  
> https://github.com/apache/phoenix/releases
> And examine the count (distinct col) query within the new environment. 
> However, the problem still exists with index queries as the following while 
> the correct distinct query result is expected to be 536 for my project:
> 0: jdbc:phoenix:zookeeper1> select count (distinct t.imsi) from ranapsignal t 
> where t.pkttime>=1404964800000 and t.pkttime<=1404965699999 and t.sac=32351 
> and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
> +-------------+
> | COUNT(IMSI) |
> +-------------+
> | 2322        |
> +-------------+
> 1 row selected (70.572 seconds)
> As James suggests, I conduct the query adding group by t.imsi with /without 
> secondary indexes. And the result seems to be fine as they both got the 
> correct 536 distinct groups. 
> Here are some considerations:
> 1. count (distinct col) query over index table did not work as expectation.
> 2. only distinct query over index table works fine.
> 3. If the phoenix version got some wrong configuration, correct me.
> Thanks and Best Regards,
> Sun
> -------------------------------
> Hi Sun,
> Thanks for the detailed description. Yes, your syntax is correct, and
> it's definitely true that the count distinct query should return the
> same result with and without the index. Would you mind trying this on
> our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
> file a JIRA?
>  
> One thing that make make it easier for your testing: do you know about
> our NO_INDEX hint which forces the query *not* to use an index, like
> this:
>  
> select /*+ NO_INDEX */ ...
>  
> Another question too. What about this query with and with/out the index:
>  
> select count(*) from ranapsignal t
> where t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
> and t.ranapsubmsgtype=0
> group by t.imsi;
>  
> Thanks,
> James
>  
> On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn
> <su...@certusnet.com.cn> wrote:
> >
> >
> > Hi James,
> >
> >   Recently I got trouble while trying to conduct some query performance test
> > in my phoenix tables with secondary indexes.
> >   I created a table called RANAPSIGNAL for my projects in phoenix via
> > sqlline client and load data into the table. Then I create
> >   an index on the specific column PKTTIME for the table RANAPSIGNAL while
> > including other more columns for adjusting my
> >   index query, like the following DDL:
> >          create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
> > (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
> >    The index creation worked successfully without any errors. So, when I am
> > trying to conduct such query as:
> >            select count (distinct t.imsi) from ranapsignal t where
> > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> >            and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> > t.ranapsubmsgtype=0 ;
> >    Without secondary indexes, the final result got 536 distinct imsi, wihch
> > is the right distinct count results. However, after I  create the above
> >   secondary index PKT_IDX and reconducting the above count (distinct imsi)
> > query, I got 2322 imsi rows which obviously are not the expected
> >   distinct counts results. I used the explain grammar to observe the scan of
> > the above select query and found that it definitely scaned over
> >   the index table PKT_IDX. I then tried to conduct the following query with
> > no count function:
> >             select distinct t.imsi from ranapsignal t where
> > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> >            and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> > t.ranapsubmsgtype=0 ;
> >    And the result is right 536 distinct imsi over scanning the index table.
> > By the way, imsi is one of the primary key when creating the table
> > RANAPSIGNAL.
> >    Here are several considerations for my trouble and practice:
> >    1. Did you guys ever practice such count (distinct) queries over phoenix
> > table via secondary index?
> >    2. I am not sure whether this problem was due to the index table, but my
> > practice may assume that conclusion.
> >    3. Corrects my if I am wrong with my previous operations (index creation,
> > DDL grammar, etc..)
> >    4. Any available hints or reply are best appreciated.
> >
> >   Thanks and Best Regards
> >   Sun
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

Reply via email to