I haven't applied any filters on the high-cardinality dimension. The account
number (acct_no) is the high-cardinality one, with about 240w (2.4 million) values.
What can I do when I need to retain the high-cardinality dimension (account no.)?

Are the filters you mention the ones applied when creating the model?

2016-09-18 9:54 GMT+08:00 hongbin ma <mahong...@apache.org>:

> Hi Mars
>
> the query is using cuboid 15 (binary 00001111), which means your query involves
> the last four dimensions in the row key? From "Total scanned row:
> 100245548" we can see that this cuboid is barely pre-aggregated. Can you check
> the last four dimensions of the row key? Do they have very high
> cardinality?
>
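
A minimal sketch of reading cuboid ID 15 as a bitmask over the row key columns, as described in the quoted paragraph above. It assumes the first row key column maps to the highest bit, and the eight column names are hypothetical placeholders, since the thread does not list the cube's actual row key.

```python
def cuboid_dimensions(cuboid_id, rowkey_columns):
    """Return the row key columns whose bit is set in cuboid_id.

    Assumption: the first row key column corresponds to the highest bit,
    so the last column corresponds to bit 0.
    """
    n = len(rowkey_columns)
    return [
        col
        for i, col in enumerate(rowkey_columns)
        if (cuboid_id >> (n - 1 - i)) & 1
    ]


# Hypothetical row key order; the real cube's columns are not given in the thread.
ROWKEY = ["DIM_1", "DIM_2", "DIM_3", "DIM_4", "DIM_5", "DIM_6", "DIM_7", "DIM_8"]

mask = format(15, "08b")                  # cuboid 15 written as an 8-bit mask
selected = cuboid_dimensions(15, ROWKEY)  # columns covered by that cuboid
print(mask, selected)
```

With this layout the four low bits select the last four row key columns, which is why the reply asks about the cardinality of those dimensions.
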
> BTW, do you apply filters on the high-cardinality dimension? If so, you should think
> about redesigning the row key: usually high-card dimensions should precede
> low-card dimensions for filter effectiveness.
>
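
A rough illustration of why row key order matters for filters, as the advice above suggests. This is a simplified model of a sorted key-value store, not Kylin's actual scan logic: it assumes a filter on the leading key column can be served by a contiguous range scan, while a filter on a trailing column has to visit every row. The data and the helper name `rows_scanned` are made up for the example.

```python
import bisect


def rows_scanned(sorted_keys, position, value):
    """Estimate rows visited for an equality filter on one key column.

    Simplified assumption: only a filter on the leading column (position 0)
    narrows the scan to a contiguous range; otherwise all rows are visited.
    """
    if position == 0:
        lo = bisect.bisect_left(sorted_keys, (value,))
        hi = bisect.bisect_left(sorted_keys, (value + 1,))
        return hi - lo
    return len(sorted_keys)


# Made-up data: 1000 accounts (high cardinality), 5 branches (low cardinality).
acct_first = sorted((acct, acct % 5) for acct in range(1000))
branch_first = sorted((acct % 5, acct) for acct in range(1000))

# Same filter (account 42) against both row key orders.
cost_acct_first = rows_scanned(acct_first, 0, 42)
cost_branch_first = rows_scanned(branch_first, 1, 42)
print(cost_acct_first, cost_branch_first)
```

In this toy model, putting the filtered high-cardinality column first turns a full scan into a scan of a single row.
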
> On Sun, Sep 18, 2016 at 9:32 AM, Mars J <xujiao.myc...@gmail.com> wrote:
>
> > OK. Kylin version is 1.5.2.1 and the fact table has 200,000,000 records. My cube
> > is very simple: a fact table of transactions, a dimension table of
> > accounts, and a dimension table of branches. The account table has
> > 240w (2.4 million) records. The tables are left-joined on the acct_no column.
> >
> >
> > 2016-09-16 23:56 GMT+08:00 Luke Han <luke...@gmail.com>:
> >
> > > Hi Mars,
> > >     You are trying to query data without group by; Kylin may not perform
> > > very well without tuning your cube.
> > >
> > >     And we can't help you with just "log as below..."; please offer more
> > > detailed information about your Kylin version, source data, metadata, and
> > > so on.
> > >
> > >    Thanks.
> > > Luke
> > >
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >
> > > On Fri, Sep 9, 2016 at 5:40 PM, Mars J <xujiao.myc...@gmail.com> wrote:
> > >
> > > > Hello all,
> > > >     My query 'SELECT
> > > > A.ACCT_NO,F.BRAN_CODE,F.SET_DATE,F.ACCT_NO,F.DC_FLAG,F.TRANS_AMT
> > > > FROM NY.TRANS_FACT F LEFT JOIN NY.ACCOUNT_DIM A ON F.ACCT_NO=A.ACCT_NO
> > > > LIMIT 100' against a cube (size: 3.6G; the fact table has 200,000,000
> > > > records) fails.
> > > >
> > > >     The Kylin log is as follows:
> > > >
> > > > Using project: TRANS_NO_DATE
> > > > 2016-09-09 17:32:15,705 INFO  [http-bio-7070-exec-7]
> > > > controller.QueryController:175 : The original query:  SELECT
> > > > A.ACCT_NO,F.BRAN_CODE,F.SET_DATE,F.ACCT_NO,F.DC_FLAG,F.TRANS_AMT
> > > > FROM NY.TRANS_FACT F LEFT JOIN NY.ACCOUNT_DIM A ON
> F.ACCT_NO=A.ACCT_NO
> > > > LIMIT 100
> > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:48
> > > > : The project manager's reference is
> > > > org.apache.kylin.metadata.project.ProjectManager@1aa81aff
> > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:60
> > > > : Find candidates by table NY.TRANS_FACT and project=TRANS_NO_DATE :
> > > > org.apache.kylin.query.routing.Candidate@62e0ac94
> > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:49
> > > > : Applying rule: class
> > > > org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRul
> e,
> > > > realizations before: [TND1(CUBE)], realizations after: [TND1(CUBE)]
> > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:49
> > > > : Applying rule: class
> > > > org.apache.kylin.query.routing.rules.RealizationSortRule,
> realizations
> > > > before: [TND1(CUBE)], realizations after: [TND1(CUBE)]
> > > > 2016-09-09 17:32:15,746 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:72
> > > > : The realizations remaining: [TND1(CUBE)] And the final chosen one
> is
> > > the
> > > > first one
> > > > 2016-09-09 17:32:15,756 DEBUG [http-bio-7070-exec-7]
> > > > enumerator.OLAPEnumerator:107 : query storage...
> > > > 2016-09-09 17:32:15,756 INFO  [http-bio-7070-exec-7]
> > > > enumerator.OLAPEnumerator:181 : No group by and aggregation found in
> > this
> > > > query, will hack some result for better look of output...
> > > > 2016-09-09 17:32:15,757 INFO  [http-bio-7070-exec-7]
> > > > v2.CubeStorageQuery:239 : exactAggregation is true
> > > > 2016-09-09 17:32:15,757 INFO  [http-bio-7070-exec-7]
> > > > v2.CubeStorageQuery:357 : Enable limit 100
> > > > 2016-09-09 17:32:15,757 INFO  [http-bio-7070-exec-7]
> > > > dict.DictionaryManager:393 : DictionaryManager(1238461247) loading
> > > > DictionaryInfo(loadDictObj:true) at
> > > > /dict/NY.TRANS_FACT/SET_DATE/d8379c72-dfc6-44d1-b429-
> 9922cbd21091.dict
> > > > 2016-09-09 17:32:15,759 INFO  [http-bio-7070-exec-7]
> > > > dict.DictionaryManager:393 : DictionaryManager(1238461247) loading
> > > > DictionaryInfo(loadDictObj:true) at
> > > > /dict/NY.TRANS_FACT/DC_FLAG/e7cdf373-2379-4313-89da-
> 0d9b44954cd6.dict
> > > > 2016-09-09 17:32:15,761 DEBUG [http-bio-7070-exec-7]
> > > > v2.CubeHBaseEndpointRPC:257 : New scanner for current segment
> > > > TND1[19700101000000_20161001000000] will use
> SCAN_FILTER_AGGR_CHECKMEM
> > > as
> > > > endpoint's behavior
> > > > 2016-09-09 17:32:15,762 DEBUG [http-bio-7070-exec-7]
> > > > v2.CubeHBaseEndpointRPC:313 : Serialized scanRequestBytes 684 bytes,
> > > > rawScanBytesString 50 bytes
> > > > 2016-09-09 17:32:15,762 INFO  [http-bio-7070-exec-7]
> > > > v2.CubeHBaseEndpointRPC:315 : The scan 38504673 for segment
> > > > TND1[19700101000000_20161001000000] is as below with 1 separate raw
> > > scans,
> > > > shard part of start/end key is set to 0
> > > > 2016-09-09 17:32:15,762 INFO  [http-bio-7070-exec-7]
> > v2.CubeHBaseRPC:271
> > > :
> > > > Visiting hbase table KYLIN_SL43718YJF: cuboid exact match, from 15 to
> > 15
> > > > Start: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\x00\x00\x00\x00\
> > x00\x00
> > > > (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\x00\x00\x00\x00\x00\x00)
> > Stop:
> > > >  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\xFF\xFF\xFF\xFF\
> xFF\xFF\x00
> > > > (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\xFF\xFF\xFF\xFF\
> > xFF\xFF\x00),
> > > No
> > > > Fuzzy Key
> > > > 2016-09-09 17:32:15,762 DEBUG [http-bio-7070-exec-7]
> > > > v2.CubeHBaseEndpointRPC:320 : Submitting rpc to 2 shards starting
> from
> > > > shard 0, scan range count 1
> > > > 2016-09-09 17:32:15,763 INFO  [http-bio-7070-exec-7]
> > > > v2.CubeHBaseEndpointRPC:103 : Timeout for ExpectedSizeIterator is:
> > > 9900000
> > > > 2016-09-09 17:32:15,763 DEBUG [http-bio-7070-exec-7]
> > > > enumerator.OLAPEnumerator:127 : return TupleIterator...
> > > > 2016-09-09 17:33:01,574 INFO  [pool-4-thread-1]
> > > > threadpool.DefaultScheduler:106 : Job Fetcher: 0 running, 0 actual
> > > > running,
> > > > 0 ready, 58 others
> > > > 2016-09-09 17:33:48,867 INFO  [BadQueryDetector]
> > > > service.BadQueryDetector:104 : Slow query has been running 93.161
> > seconds
> > > > (project:TRANS_NO_DATE, thread: 0xc1) -- SELECT
> > > > A.ACCT_NO,F.BRAN_CODE,F.SET_DATE,F.ACCT_NO,F.DC_FLAG,F.TRANS_AMT
> > > > FROM NY.TRANS_FACT F LEFT JOIN NY.ACCOUNT_DIM A ON
> F.ACCT_NO=A.ACCT_NO
> > > > LIMIT 100
> > > > 2016-09-09 17:33:48,875 DEBUG [BadQueryDetector]
> > > > badquery.BadQueryHistoryManager:84 : Loaded 10 Bad Query(s)
> > > > 2016-09-09 17:33:48,916 DEBUG [BadQueryDetector]
> > > > hbase.HBaseResourceStore:262 : Update row
> /bad_query/TRANS_NO_DATE.json
> > > > from oldTs: 1473411958909, to newTs: 1473413628875, operation result:
> > > true
> > > > 2016-09-09 17:33:48,916 INFO  [BadQueryDetector]
> > > > service.BadQueryDetector:230 : Problematic thread 0xc1
> > > > at sun.misc.Unsafe.park(Native Method)
> > > > at java.util.concurrent.locks.LockSupport.parkNanos(
> > > LockSupport.java:215)
> > > > at
> > > > java.util.concurrent.locks.AbstractQueuedSynchronizer$
> > > > ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> > > > at java.util.concurrent.ArrayBlockingQueue.poll(
> > > > ArrayBlockingQueue.java:418)
> > > > at
> > > > org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$
> > > > ExpectedSizeIterator.next(CubeHBaseEndpointRPC.java:125)
> > > > at
> > > > org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$
> > > > ExpectedSizeIterator.next(CubeHBaseEndpointRPC.java:81)
> > > > at
> > > > com.google.common.collect.TransformedIterator.next(
> > > > TransformedIterator.java:48)
> > > > at com.google.common.collect.Iterators$6.hasNext(Iterators.java:583)
> > > > at
> > > > org.apache.kylin.storage.hbase.cube.v2.SequentialCubeTupleIterator.
> > > > hasNext(SequentialCubeTupleIterator.java:96)
> > > > at
> > > > org.apache.kylin.query.enumerator.OLAPEnumerator.
> > > > moveNext(OLAPEnumerator.java:74)
> > > >
> > > > 2016-09-09 17:34:01,572 INFO  [pool-4-thread-1]
> > > > threadpool.DefaultScheduler:106 : Job Fetcher: 0 running, 0 actual
> > > > running,
> > > > 0 ready, 58 others
> > > > 2016-09-09 17:34:12,198 INFO  [pool-6-thread-1] v2.CubeHBaseEndpointRPC:351
> > > > : <sub-thread for GTScanRequest 38504673> Endpoint RPC returned from HTable
> > > > KYLIN_SL43718YJF Shard
> > > > \x4B\x59\x4C\x49\x4E\x5F\x53\x4C\x34\x33\x37\x31\x38\x59\x4A\x46\x2C\x00\x01\x2C\x31\x34\x37\x33\x33\x38\x37\x30\x39\x36\x34\x30\x37\x2E\x36\x39\x33\x61\x32\x39\x61\x33\x62\x63\x63\x35\x66\x35\x66\x31\x32\x33\x64\x64\x30\x63\x32\x38\x63\x39\x39\x34\x64\x38\x38\x31\x2E
> > > > on host: slave5.Total scanned row: 100245548. Total filtered/aggred row: 0.
> > > > Time elapsed in EP: 107063(ms). Server CPU usage: 0.21609751440119665,
> > > > server physical mem left: 4.769349632E9, server swap mem left:8.131039232E9.Etc
> > > > message: start latency: 17@1,agg done@72715,compress done@107063,server stats done@107063,
> > > > debugGitTag:cf4d2940b67d622eacd2ac9a913b221091a35c2e;.Normal Complete: true.
> > > >
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
>
