Hi An Lan,

Please confirm the following.

1. Is dynamic executor allocation enabled? If it is enabled, can you
disable it and try again (this is to check whether dynamic allocation has
any impact)? To disable it, fix the executor count in spark-defaults.conf
(--num-executors=100).

2. Can you please set the below property and execute the query? (A sketch
of points 1 and 2 together follows after this list.)

   enable.blocklet.distribution=false

3. What is the cardinality of each column?

4. Is there any reason why you are setting "NO_INVERTED_INDEX"="a"?

5. Can you keep all the columns that appear in the filter on the left
side of the schema? Fewer blocks will then be identified during block
pruning, which will improve query performance.
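For points 1 and 2, a minimal sketch, assuming Spark 1.6 with
CarbonContext (the app name and store path below are illustrative; the
same two spark.* keys can go directly into spark-defaults.conf):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.CarbonContext
    import org.apache.carbondata.core.util.CarbonProperties

    // Point 1: disable dynamic executor allocation and pin the executor
    // count (the conf-file equivalent of --num-executors=100).
    val conf = new SparkConf()
      .setAppName("carbon-query")                      // illustrative
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.executor.instances", "100")

    val sc = new SparkContext(conf)
    val cc = new CarbonContext(sc, "hdfs:///carbon/store") // illustrative path

    // Point 2: switch off blocklet distribution before running the query.
    CarbonProperties.getInstance()
      .addProperty("enable.blocklet.distribution", "false")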
-Regards
Kumar Vishal

On Fri, Nov 11, 2016 at 11:59 AM, An Lan <lanan...@gmail.com> wrote:

> Hi Kumar Vishal,
>
> 1. Create table DDL:
>
> CREATE TABLE IF NOT EXISTS Table1 (
>   h Int, g Int, d String, f Int, e Int,
>   a Int, b Int,
>   ...(nearly 300 more columns)
> )
> STORED BY 'org.apache.carbondata.format'
> TBLPROPERTIES (
>   "NO_INVERTED_INDEX"="a",
>   "NO_INVERTED_INDEX"="b",
>   ...(nearly 300 more columns)
>   "DICTIONARY_INCLUDE"="a",
>   "DICTIONARY_INCLUDE"="b",
>   ...(nearly 300 more columns)
> )
>
> 2, 3. There are more than a hundred nodes in the cluster, but the
> cluster is shared with other applications. Sometimes, when enough nodes
> are free, we do get 100 distinct nodes.
>
> 4. Below are statistics of task time during one query, with the
> distinct nodes marked:
>
> [image: inline image - task-time statistics for one query, distinct
> nodes marked]
>
> 2016-11-10 23:52 GMT+08:00 Kumar Vishal <kumarvishal1...@gmail.com>:
>
>> Hi Anning Luo,
>>
>> Can you please provide the below details?
>>
>> 1. Create table DDL.
>> 2. Number of nodes in your cluster setup.
>> 3. Number of executors per node.
>> 4. Query statistics.
>>
>> Please find my comments in bold.
>>
>> Problem:
>>
>> 1. GC problem. We suffer 20%~30% GC time for some tasks in the first
>> stage, even after a lot of parameter refinement. We now use G1 GC on
>> Java 8; GC time doubles if we use CMS. The main GC time is spent on
>> young generation GC, and almost half the memory of the young
>> generation is copied to the old generation. It seems many objects live
>> longer than one GC period, so their space cannot be reused (the
>> concurrent GC releases it later). When we use a large Eden (>=1G, for
>> example), a single GC takes seconds. If Eden is small (256M, for
>> example), a single GC takes hundreds of milliseconds but runs more
>> frequently, so the total is still seconds. Is there any way to lessen
>> the GC time? (We do not consider the first and second queries in this
>> case.)
>>
>> *How many nodes are present in your cluster setup? If nodes are few,
>> please reduce the number of executors per node.*
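While experimenting, the GC settings above can also be pinned in the
Spark conf. A minimal sketch, assuming Spark on YARN; the values are only
the ones from the 256M-Eden experiment and the 10G/executor setup
described in this thread, not tuned recommendations:

    import org.apache.spark.SparkConf

    // Sketch only: G1 with a fixed young generation (the 256M case
    // above) plus GC logging, so the volume copied to the old
    // generation is visible per collection.
    val conf = new SparkConf()
      .set("spark.executor.memory", "10g")
      .set("spark.executor.extraJavaOptions",
        "-XX:+UseG1GC -Xmn256m -XX:+PrintGCDetails -XX:+PrintGCDateStamps")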
>>
>> 2. Performance refinement problem. The row count after filtering is
>> not uniform; some nodes may be heavily loaded and spend more time than
>> the others. A single task takes 4s ~ 16s. Is there any method to
>> refine this?
>>
>> 3. Too long a time for the first and second queries. I know the
>> dictionary and some indexes need to be loaded the first time, but even
>> after trying the query below to preheat them, the first queries still
>> take a lot of time. How can I preheat correctly?
>>
>> select Aarray, a, b, c... from Table1 where Aarray is not null
>> and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
>>
>> *Currently we are working on first-time query improvement. For now you
>> can run select count(*) or count(column), so that all the blocks get
>> loaded, and then run the actual query.*
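For example, a minimal warm-up along those lines, assuming the
CarbonContext from the sketch near the top of this mail is named cc, and
using a, one of the dictionary columns from your DDL:

    // Warm-up sketch: a full count(column) scan forces every block and
    // the column's dictionary to load once, so later filter queries
    // skip that cost. Run it once after startup, before real queries.
    cc.sql("SELECT count(a) FROM Table1").collect()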
>>
>> 4. Any other suggestion to lessen the query time?
>>
>> Some suggestions:
>>
>> The log from the QueryStatisticsRecorder class gives me a good means
>> of finding the bottleneck, but it is not enough. There are still some
>> metrics I think would be very useful:
>>
>> 1. Filter ratio, i.e. not only result_size but also the original
>> size, so we can know how much data was filtered out.
>>
>> 2. IO time. scan_blocks_time is not enough. If it is high, we know
>> something is wrong, but not what caused the problem. The real IO time
>> for the data is not provided. As there may be several files for one
>> partition, knowing whether the slowness comes from the datanode or
>> from the executor itself would give us intuition for finding the
>> problem.
>>
>> 3. The TableBlockInfo for each task. I log it myself when debugging;
>> it tells me how many blocklets are local. The Spark web monitor only
>> gives a locality level, while perhaps only one blocklet is local.
>>
>> -Regards
>> Kumar Vishal
>>
>> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <lanan...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > We are using carbondata to build our table and run queries through
>> > CarbonContext. We have some performance problems while refining the
>> > system.
>> >
>> > Background:
>> >
>> > cluster: 100 executors, 5 tasks/executor, 10G memory/executor
>> >
>> > data: 60+GB (per replica) in carbon data format, 600+MB/file * 100
>> > files, 300+ columns, 300+ million rows
>> >
>> > sql example:
>> >
>> > select A,
>> > sum(a),
>> > sum(b),
>> > sum(c),
>> > ...(100 more aggregations like sum(column))
>> > from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
>> > where A is not null and d > "ab:c-10" and d < "h:0f3s" and e != 10
>> > and f = 22 and g = 33 and h = 44 GROUP BY A
>> >
>> > target query time: <10s
>> >
>> > current query time: 15s ~ 25s
>> >
>> > scene: OLAP system, <100 queries every day. Concurrency is not
>> > high. Most of the time the CPU is idle, so this service runs
>> > together with other programs. The service runs for a long time, and
>> > we cannot occupy a very large amount of memory for every executor.
>> >
>> > refine: I have built an index and dictionary on d, e, f, g and h,
>> > and built dictionaries on all the other aggregation columns (i.e.
>> > a, b, c, ...100+ columns), and made sure there is one segment for
>> > the total data. I have enabled speculation (quantile=0.5,
>> > interval=250, multiplier=1.2).
>> >
>> > Time is mainly spent on the first stage, before shuffling. As 95% of
>> > the data is filtered out, the shuffle process takes little time. In
>> > the first stage most tasks complete in less than 10s, but there are
>> > still nearly 50 tasks longer than 10s. The max task time in one
>> > query may be 12~16s.
>> >
>> > [... problems 1-4 and the suggestions, quoted in full with comments
>> > above ...]
>> >
>> > ---------------------
>> >
>> > Anning Luo
>> >
>> > HULU
>> >
>> > Email: anning....@hulu.com
>> >
>> > lanan...@gmail.com