Running a query with no filter on a dimension with such high cardinality is
not a good fit for Kylin.

This query pulls all of the data back, which looks more like an ETL or
export job than an OLAP query.

Kylin is designed for low-latency aggregation queries, which means the user
should narrow down the query result with enough filter conditions.

Please add more filters and try again.
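
For example, a filtered aggregation query might look like the sketch below.
It reuses the table and column names from your original query; the date
range, the branch code, the format of the SET_DATE literal, and the
assumption that SUM(TRANS_AMT) is defined as a measure in your cube are all
hypothetical, so adjust them to your model.

```sql
-- Sketch only: the literal values and the SUM(TRANS_AMT) measure are assumptions.
SELECT F.BRAN_CODE,
       F.SET_DATE,
       F.DC_FLAG,
       SUM(F.TRANS_AMT) AS TOTAL_AMT,   -- aggregate instead of raw detail rows
       COUNT(*) AS TRANS_CNT
FROM NY.TRANS_FACT F
LEFT JOIN NY.ACCOUNT_DIM A ON F.ACCT_NO = A.ACCT_NO   -- same join as the original query
WHERE F.SET_DATE >= '2016-09-01'        -- hypothetical date range
  AND F.SET_DATE <  '2016-09-08'
  AND F.BRAN_CODE = '0001'              -- hypothetical branch code
GROUP BY F.BRAN_CODE, F.SET_DATE, F.DC_FLAG
```

If you need account-level results, the same shape should work with ACCT_NO
added to the GROUP BY and a filter on a single account or a small set of
accounts, so that only a small part of the cuboid has to be scanned.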

Thanks.


Best Regards!
---------------------

Luke Han

On Mon, Sep 19, 2016 at 8:48 AM, Mars J <xujiao.myc...@gmail.com> wrote:

> I haven't applied filters on the high-cardinality column. The account
> number (acct_no) is a high-cardinality column with about 2.4 million
> (240w) values.
> What can I do when I need to keep the high-cardinality column (acct_no)?
>
> Are those filters applied when creating the model?
>
> 2016-09-18 9:54 GMT+08:00 hongbin ma <mahong...@apache.org>:
>
> > Hi Mars
> >
> > the query is using cuboid 15 (binary 00001111), which means your query
> > involves the last four dimensions in the row key? From "Total scanned
> > row: 100245548" we can see the cuboid is hardly pre-aggregated at all.
> > Can you check the last four dimensions of the row key? Do they have very
> > high cardinality?
> >
> > BTW, do you apply filters on the high-cardinality dimensions? If so, you
> > should think about redesigning the row key: usually high-cardinality
> > dimensions should precede low-cardinality dimensions for filter
> > effectiveness.
> >
> > On Sun, Sep 18, 2016 at 9:32 AM, Mars J <xujiao.myc...@gmail.com> wrote:
> >
> > > OK. The Kylin version is 1.5.2.1 and the fact table has 200,000,000
> > > records. My cube is very simple: a fact table of transactions, a
> > > dimension table of accounts, and a dimension table of branches. The
> > > account table has about 2.4 million (240w) records. They are left
> > > joined on an acct_no column.
> > >
> > >
> > > 2016-09-16 23:56 GMT+08:00 Luke Han <luke...@gmail.com>:
> > >
> > > > Hi Mars,
> > > >     You are trying to query data without a GROUP BY; Kylin may not
> > > > perform very well on such a query without tuning your cube.
> > > >
> > > >     Also, we can't help you with just "log as below...". Please
> > > > provide more detailed information about your Kylin version, source
> > > > data, metadata and so on.
> > > >
> > > >    Thanks.
> > > > Luke
> > > >
> > > >
> > > > Best Regards!
> > > > ---------------------
> > > >
> > > > Luke Han
> > > >
> > > > On Fri, Sep 9, 2016 at 5:40 PM, Mars J <xujiao.myc...@gmail.com>
> > wrote:
> > > >
> > > > > Hello all,
> > > > >     My SQL query 'SELECT
> > > > > A.ACCT_NO,F.BRAN_CODE,F.SET_DATE,F.ACCT_NO,F.DC_FLAG,F.TRANS_AMT
> > > > > FROM NY.TRANS_FACT F LEFT JOIN NY.ACCOUNT_DIM A ON F.ACCT_NO=A.ACCT_NO
> > > > > LIMIT 100' against a cube (size: 3.6 GB; the fact table has
> > > > > 200,000,000 records) fails.
> > > > >
> > > > >     The Kylin log is as follows:
> > > > >
> > > > > Using project: TRANS_NO_DATE
> > > > > 2016-09-09 17:32:15,705 INFO  [http-bio-7070-exec-7]
> > > > > controller.QueryController:175 : The original query:  SELECT
> > > > > A.ACCT_NO,F.BRAN_CODE,F.SET_DATE,F.ACCT_NO,F.DC_FLAG,F.TRANS_AMT
> > > > > FROM NY.TRANS_FACT F LEFT JOIN NY.ACCOUNT_DIM A ON
> > F.ACCT_NO=A.ACCT_NO
> > > > > LIMIT 100
> > > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > > routing.QueryRouter:48
> > > > > : The project manager's reference is
> > > > > org.apache.kylin.metadata.project.ProjectManager@1aa81aff
> > > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > > routing.QueryRouter:60
> > > > > : Find candidates by table NY.TRANS_FACT and project=TRANS_NO_DATE
> :
> > > > > org.apache.kylin.query.routing.Candidate@62e0ac94
> > > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > > > routing.QueryRouter:49 : Applying rule: class
> > > > > org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
> > > > > realizations before: [TND1(CUBE)], realizations after: [TND1(CUBE)]
> > > > > 2016-09-09 17:32:15,745 INFO  [http-bio-7070-exec-7]
> > > > routing.QueryRouter:49
> > > > > : Applying rule: class
> > > > > org.apache.kylin.query.routing.rules.RealizationSortRule,
> > realizations
> > > > > before: [TND1(CUBE)], realizations after: [TND1(CUBE)]
> > > > > 2016-09-09 17:32:15,746 INFO  [http-bio-7070-exec-7]
> > > > routing.QueryRouter:72
> > > > > : The realizations remaining: [TND1(CUBE)] And the final chosen one
> > is
> > > > the
> > > > > first one
> > > > > 2016-09-09 17:32:15,756 DEBUG [http-bio-7070-exec-7]
> > > > > enumerator.OLAPEnumerator:107 : query storage...
> > > > > 2016-09-09 17:32:15,756 INFO  [http-bio-7070-exec-7]
> > > > > enumerator.OLAPEnumerator:181 : No group by and aggregation found
> in
> > > this
> > > > > query, will hack some result for better look of output...
> > > > > 2016-09-09 17:32:15,757 INFO  [http-bio-7070-exec-7]
> > > > > v2.CubeStorageQuery:239 : exactAggregation is true
> > > > > 2016-09-09 17:32:15,757 INFO  [http-bio-7070-exec-7]
> > > > > v2.CubeStorageQuery:357 : Enable limit 100
> > > > > 2016-09-09 17:32:15,757 INFO  [http-bio-7070-exec-7]
> > > > > dict.DictionaryManager:393 : DictionaryManager(1238461247) loading
> > > > > DictionaryInfo(loadDictObj:true) at
> > > > > /dict/NY.TRANS_FACT/SET_DATE/d8379c72-dfc6-44d1-b429-9922cbd21091.dict
> > > > > 2016-09-09 17:32:15,759 INFO  [http-bio-7070-exec-7]
> > > > > dict.DictionaryManager:393 : DictionaryManager(1238461247) loading
> > > > > DictionaryInfo(loadDictObj:true) at
> > > > > /dict/NY.TRANS_FACT/DC_FLAG/e7cdf373-2379-4313-89da-0d9b44954cd6.dict
> > > > > 2016-09-09 17:32:15,761 DEBUG [http-bio-7070-exec-7]
> > > > > v2.CubeHBaseEndpointRPC:257 : New scanner for current segment
> > > > > TND1[19700101000000_20161001000000] will use
> > SCAN_FILTER_AGGR_CHECKMEM
> > > > as
> > > > > endpoint's behavior
> > > > > 2016-09-09 17:32:15,762 DEBUG [http-bio-7070-exec-7]
> > > > > v2.CubeHBaseEndpointRPC:313 : Serialized scanRequestBytes 684
> bytes,
> > > > > rawScanBytesString 50 bytes
> > > > > 2016-09-09 17:32:15,762 INFO  [http-bio-7070-exec-7]
> > > > > v2.CubeHBaseEndpointRPC:315 : The scan 38504673 for segment
> > > > > TND1[19700101000000_20161001000000] is as below with 1 separate
> raw
> > > > scans,
> > > > > shard part of start/end key is set to 0
> > > > > 2016-09-09 17:32:15,762 INFO  [http-bio-7070-exec-7]
> > > > > v2.CubeHBaseRPC:271 : Visiting hbase table KYLIN_SL43718YJF: cuboid
> > > > > exact match, from 15 to 15
> > > > > Start: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\x00\x00\x00\x00\x00\x00
> > > > > (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\x00\x00\x00\x00\x00\x00)
> > > > > Stop: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\xFF\xFF\xFF\xFF\xFF\xFF\x00
> > > > > (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0F\xFF\xFF\xFF\xFF\xFF\xFF\x00),
> > > > > No Fuzzy Key
> > > > > 2016-09-09 17:32:15,762 DEBUG [http-bio-7070-exec-7]
> > > > > v2.CubeHBaseEndpointRPC:320 : Submitting rpc to 2 shards starting
> > from
> > > > > shard 0, scan range count 1
> > > > > 2016-09-09 17:32:15,763 INFO  [http-bio-7070-exec-7]
> > > > > v2.CubeHBaseEndpointRPC:103 : Timeout for ExpectedSizeIterator is:
> > > > 9900000
> > > > > 2016-09-09 17:32:15,763 DEBUG [http-bio-7070-exec-7]
> > > > > enumerator.OLAPEnumerator:127 : return TupleIterator...
> > > > > 2016-09-09 17:33:01,574 INFO  [pool-4-thread-1]
> > > > > threadpool.DefaultScheduler:106 : Job Fetcher: 0 running, 0 actual
> > > > > running,
> > > > > 0 ready, 58 others
> > > > > 2016-09-09 17:33:48,867 INFO  [BadQueryDetector]
> > > > > service.BadQueryDetector:104 : Slow query has been running 93.161
> > > seconds
> > > > > (project:TRANS_NO_DATE, thread: 0xc1) -- SELECT
> > > > > A.ACCT_NO,F.BRAN_CODE,F.SET_DATE,F.ACCT_NO,F.DC_FLAG,F.TRANS_AMT
> > > > > FROM NY.TRANS_FACT F LEFT JOIN NY.ACCOUNT_DIM A ON
> > F.ACCT_NO=A.ACCT_NO
> > > > > LIMIT 100
> > > > > 2016-09-09 17:33:48,875 DEBUG [BadQueryDetector]
> > > > > badquery.BadQueryHistoryManager:84 : Loaded 10 Bad Query(s)
> > > > > 2016-09-09 17:33:48,916 DEBUG [BadQueryDetector]
> > > > > hbase.HBaseResourceStore:262 : Update row
> > /bad_query/TRANS_NO_DATE.json
> > > > > from oldTs: 1473411958909, to newTs: 1473413628875, operation
> result:
> > > > true
> > > > > 2016-09-09 17:33:48,916 INFO  [BadQueryDetector]
> > > > > service.BadQueryDetector:230 : Problematic thread 0xc1
> > > > > at sun.misc.Unsafe.park(Native Method)
> > > > > at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> > > > > at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> > > > > at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> > > > > at org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$ExpectedSizeIterator.next(CubeHBaseEndpointRPC.java:125)
> > > > > at org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$ExpectedSizeIterator.next(CubeHBaseEndpointRPC.java:81)
> > > > > at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
> > > > > at com.google.common.collect.Iterators$6.hasNext(Iterators.java:583)
> > > > > at org.apache.kylin.storage.hbase.cube.v2.SequentialCubeTupleIterator.hasNext(SequentialCubeTupleIterator.java:96)
> > > > > at org.apache.kylin.query.enumerator.OLAPEnumerator.moveNext(OLAPEnumerator.java:74)
> > > > >
> > > > > 2016-09-09 17:34:01,572 INFO  [pool-4-thread-1]
> > > > > threadpool.DefaultScheduler:106 : Job Fetcher: 0 running, 0 actual
> > > > > running,
> > > > > 0 ready, 58 others
> > > > > 2016-09-09 17:34:12,198 INFO  [pool-6-thread-1]
> > > > > v2.CubeHBaseEndpointRPC:351 : <sub-thread for GTScanRequest 38504673>
> > > > > Endpoint RPC returned from HTable KYLIN_SL43718YJF Shard
> > > > > \x4B\x59\x4C\x49\x4E\x5F\x53\x4C\x34\x33\x37\x31\x38\x59\x4A\x46
> > > > > \x2C\x00\x01\x2C\x31\x34\x37\x33\x33\x38\x37\x30\x39\x36\x34\x30
> > > > > \x37\x2E\x36\x39\x33\x61\x32\x39\x61\x33\x62\x63\x63\x35\x66\x35
> > > > > \x66\x31\x32\x33\x64\x64\x30\x63\x32\x38\x63\x39\x39\x34\x64\x38
> > > > > \x38\x31\x2E
> > > > > on host: slave5.Total scanned row: 100245548. Total filtered/aggred
> > > > > row: 0. Time elapsed in EP: 107063(ms). Server CPU usage:
> > > > > 0.21609751440119665, server physical mem left: 4.769349632E9, server
> > > > > swap mem left:8.131039232E9.Etc message: start latency: 17@1,agg
> > > > > done@72715,compress done@107063,server stats done@107063,
> > > > > debugGitTag:cf4d2940b67d622eacd2ac9a913b221091a35c2e;.Normal
> > > > > Complete: true.
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> >
>
