Hi An Lan,

For detailed logging you need to add a log4j configuration in spark-defaults.conf for both the driver and the executors:

* spark.driver.extraJavaOptions = -Dlog4j.configuration=file:/<filepath>
* spark.executor.extraJavaOptions = -Dlog4j.configuration=file:/<filepath>

Please also make sure *log4j.rootCategory* is set to *INFO* in *log4j.properties*.
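For reference, a minimal sketch of the two files together (the conf path and the console appender layout are only examples, not from this thread):

    # spark-defaults.conf -- point driver and executor JVMs at the same file
    spark.driver.extraJavaOptions    -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
    spark.executor.extraJavaOptions  -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties

    # log4j.properties -- INFO level so the block distribution messages appear
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n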
-Regards
Kumar Vishal

On Wed, Nov 16, 2016 at 2:06 PM, An Lan <lanan...@gmail.com> wrote:

> Hi Kumar Vishal,
>
> Thanks for your suggestion.
> The driver log does not contain the block distribution log by default. How can I enable it?
> And how is the order of the dimensions decided?
>
> 2016-11-16 15:14 GMT+08:00 Kumar Vishal <kumarvishal1...@gmail.com>:
>
> > Hi An Lan,
> >
> > The data is already distributed. In this case one blocklet may be returning more rows and another fewer, which is why some tasks take more time.
> >
> > The block distribution log is not present in the driver log, so it is not clear whether it is going for block distribution or blocklet distribution. Can you please attach the detailed driver log?
> >
> > *Some suggestions:*
> >
> > 1. If a column is a String and you are not applying a filter on it, add it to the dictionary exclude list in the create statement; this avoids the dictionary lookup. If the number of String columns is small, it is better to add them all to the dictionary exclude list. (See the sketch after this list.)
> >
> > 2. If a column is numeric and you are not applying a filter on it, there is no need to add it to the dictionary include or exclude list; it will then be a measure column.
> >
> > 3. Currently in carbon, creating a measure column with the double data type gives better compression, as value compression mode is currently supported only for the double data type.
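> > A minimal DDL sketch of suggestions 1 and 2 (the table name Sample1 and the columns s1, s2, m1, m2 are hypothetical; d stands in for a filtered String dimension):
> >
> >     CREATE TABLE IF NOT EXISTS Sample1 (
> >       d String,    -- filtered String column: leave it as a default dictionary dimension
> >       s1 String,   -- unfiltered String columns: exclude from the dictionary
> >       s2 String,
> >       m1 Int,      -- unfiltered numerics: in neither list, so they stay measures
> >       m2 Double    -- double currently compresses best for measures
> >     )
> >     STORED BY 'org.apache.carbondata.format'
> >     TBLPROPERTIES ("DICTIONARY_EXCLUDE"="s1,s2")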
> > -Regards
> > Kumar Vishal
> >
> > On Wed, Nov 16, 2016 at 8:12 AM, An Lan <lanan...@gmail.com> wrote:
> >
> > > Hi Kumar Vishal,
> > >
> > > 1. I found that the number of rows left after the inverted-index filter is not uniform across tasks, and the difference is large. Some tasks may have 3k~4k rows after filtering, while the longer tasks may have 30k~40k. When most of the longer tasks land on the same node, that node's time cost is much higher than the others'. So, is there any way to balance the data so the rows distribute uniformly between tasks? The blocklet size is still 120k rows, with 3k+ blocklets in total and 6 blocklets per task. (See the note after this list.)
> > >
> > > 2. I uploaded some logs last time. Is there an access problem with them? Is there any suggestion or question about them?
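> > > (For reference, the 120k figure is the blocklet row-count knob in carbon.properties; this sketch assumes the 0.x-era property name carbon.blocklet.size with its 120,000-row default, which should be verified against your build:
> > >
> > >     # carbon.properties -- rows per blocklet; fewer rows per blocklet
> > >     # gives the scheduler smaller units to balance across tasks
> > >     carbon.blocklet.size=120000
> > > )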
> > >
> > > 2016-11-14 11:29 GMT+08:00 An Lan <lanan...@gmail.com>:
> > >
> > > > Hi Kumar Vishal,
> > > >
> > > > The driver log and some executor logs are attached. The same query was run five times.
> > > >
> > > > Time consumed by each query: 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> > > >
> > > > First stage of each query: first query -> starts from stage 0, second query -> 4, third query -> 8, fourth query -> 12, fifth query -> 16. See "first stage of queries.png" in the attachments.
> > > >
> > > > Longest task of each query (task-to-log mapping): stage 0, task 249 -> 1.log; stage 4, task 420 -> 2.log; stage 8, task 195 -> 3.log; stage 12, task 350 -> 4.log; stage 16, task 321 -> 5.log. See "longest task.png" in the attachments.
> > > >
> > > > longest tasks.png <https://drive.google.com/file/d/0B1XM6KeI1nB7dGY4cF9jRDk1QTg/view?usp=drive_web>
> > > > first stage of queries.png <https://drive.google.com/file/d/0B1XM6KeI1nB7SVZsNnR2VEw2X0k/view?usp=drive_web>
> > > > dirverlog <https://drive.google.com/file/d/0B1XM6KeI1nB7VllnVkNnenhyTTA/view?usp=drive_web>
> > > > 1.log <https://drive.google.com/file/d/0B1XM6KeI1nB7eUdUWmlia1J4bk0/view?usp=drive_web>
> > > > 2.log <https://drive.google.com/file/d/0B1XM6KeI1nB7ZUlGZHZVQ3phdmM/view?usp=drive_web>
> > > > 3.log <https://drive.google.com/file/d/0B1XM6KeI1nB7UHpBQzREX3N5aEk/view?usp=drive_web>
> > > > 4.log <https://drive.google.com/file/d/0B1XM6KeI1nB7NVctMmYwNldCVEk/view?usp=drive_web>
> > > > 5.log <https://drive.google.com/file/d/0B1XM6KeI1nB7ODhHNE5sSGNfOVE/view?usp=drive_web>
> > > >
> > > > 2016-11-14 11:22 GMT+08:00 An Lan <lanan...@gmail.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > The driver log and some executor logs are attached. The same query was run five times.
> > > > >
> > > > > Time consumed by each query: 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> > > > >
> > > > > First stage of each query: first query -> starts from stage 0, second query -> 4, third query -> 8, fourth query -> 12, fifth query -> 16.
> > > > >
> > > > > Longest task of each query (task-to-log mapping): stage 0, task 249 -> 1.log; stage 4, task 420 -> 2.log; stage 8, task 195 -> 3.log; stage 12, task 350 -> 4.log; stage 16, task 321 -> 5.log.
> > > > >
> > > > > 2016-11-11 20:25 GMT+08:00 Kumar Vishal <kumarvishal1...@gmail.com>:
> > > > >
> > > > > > What is the time difference between the first and second run of the query? The second time it reads from the OS cache, so I think there won't be any IO bottleneck.
> > > > > >
> > > > > > Can you provide the driver log and the executor logs for the tasks that take more time?
> > > > > >
> > > > > > -Regards
> > > > > > Kumar Vishal
> > > > > >
> > > > > > On Fri, Nov 11, 2016 at 3:21 PM, An Lan <lanan...@gmail.com> wrote:
> > > > > >
> > > > > > > 1. I have set --num-executors=100 in all tests. The image says 100 nodes, meaning one executor per node; when there are only 64 nodes, there may be two or three executors on one node. The total executor count is always 100.
> > > > > > >
> > > > > > > 2. I did not find the property enable.blocklet.distribution in 0.1.1; I found it in 0.2.0. I am testing on 0.1.1 and will try it later. If your concern is the locality level, the longest query does not seem to be caused by an IO problem. The statistics for the long task look like this:
> > > > > > >
> > > > > > >     +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> > > > > > >     |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> > > > > > >     +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> > > > > > >     |4199754456147478_134|               1|                  32|            3306|              3|      42104|              13029|
> > > > > > >     +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> > > > > > >
> > > > > > > I have suffered from IO before, but after configuring speculation, most tasks with long IO are re-run.
> > > > > > >
> > > > > > > 3. Distinct values: g: 11, h: 3, f: 7, e: 3, d: 281
> > > > > > >
> > > > > > > 4. In the query, only e, f, g, h, d and A are used in filters; the others are not. So I think the others, used only in aggregation, do not need an index.
> > > > > > >
> > > > > > > 5. I already put e, f, g, h, d leftmost, right after "create table...".
> > > > > > >
> > > > > > > 2016-11-11 15:08 GMT+08:00 Kumar Vishal <kumarvishal1...@gmail.com>:
> > > > > > >
> > > > > > > > Hi An Lan,
> > > > > > > >
> > > > > > > > Please confirm the things below.
> > > > > > > >
> > > > > > > > 1. Is dynamic executor allocation enabled? If it is, can you disable it and try again (this is to check whether dynamic executors have any impact)? To disable it, set --num-executors=100 in spark-defaults.conf. (See the sketch after this list.)
> > > > > > > >
> > > > > > > > 2. Can you please set the property enable.blocklet.distribution=false and execute the query?
> > > > > > > >
> > > > > > > > 3. What is the cardinality of each column?
> > > > > > > >
> > > > > > > > 4. Any reason why you are setting "NO_INVERTED_INDEX"="a"?
> > > > > > > >
> > > > > > > > 5. Can you keep all the columns present in the filter on the left side? Fewer blocks will then be identified during block pruning, which will improve query performance.
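> > > > > > > > A sketch of the settings from points 1 and 2 (placing enable.blocklet.distribution in carbon.properties is an assumption; the Spark keys are standard):
> > > > > > > >
> > > > > > > >     # spark-defaults.conf -- a fixed executor count instead of dynamic allocation
> > > > > > > >     spark.dynamicAllocation.enabled  false
> > > > > > > >     spark.executor.instances         100
> > > > > > > >
> > > > > > > >     # carbon.properties
> > > > > > > >     enable.blocklet.distribution=false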
> > > > > > > >
> > > > > > > > -Regards
> > > > > > > > Kumar Vishal
> > > > > > > >
> > > > > > > > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <lanan...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Kumar Vishal,
> > > > > > > > >
> > > > > > > > > 1. Create table DDL:
> > > > > > > > >
> > > > > > > > >     CREATE TABLE IF NOT EXISTS Table1 (
> > > > > > > > >       h Int, g Int, d String, f Int, e Int,
> > > > > > > > >       a Int, b Int, ...(nearly 300 more columns)
> > > > > > > > >     )
> > > > > > > > >     STORED BY 'org.apache.carbondata.format'
> > > > > > > > >     TBLPROPERTIES(
> > > > > > > > >       "NO_INVERTED_INDEX"="a",
> > > > > > > > >       "NO_INVERTED_INDEX"="b",
> > > > > > > > >       ...(nearly 300 more columns)
> > > > > > > > >       "DICTIONARY_INCLUDE"="a",
> > > > > > > > >       "DICTIONARY_INCLUDE"="b",
> > > > > > > > >       ...(nearly 300 more columns)
> > > > > > > > >     )
> > > > > > > > >
> > > > > > > > > 2./3. There are more than a hundred nodes in the cluster, but the cluster is shared with other applications. When enough nodes are free, we get 100 distinct nodes.
> > > > > > > > >
> > > > > > > > > 4. A statistic of task times during one query, with distinct nodes marked, is below:
> > > > > > > > >
> > > > > > > > > [inline image 1]
> > > > > > > > >
> > > > > > > > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <kumarvishal1...@gmail.com>:
> > > > > > > > >
> > > > > > > > > > Hi Anning Luo,
> > > > > > > > > >
> > > > > > > > > > Can you please provide the details below?
> > > > > > > > > >
> > > > > > > > > > 1. Create table DDL.
> > > > > > > > > > 2. Number of nodes in your cluster setup.
> > > > > > > > > > 3. Number of executors per node.
> > > > > > > > > > 4. Query statistics.
> > > > > > > > > >
> > > > > > > > > > Please find my comments in bold.
> > > > > > > > > >
> > > > > > > > > > Problem:
> > > > > > > > > > 1. GC problem. We suffer 20%~30% GC time for some tasks in the first stage after a lot of parameter tuning. We now use G1 GC on Java 8; GC time doubles with CMS. The main GC time is spent on young-generation GC, and almost half of the young generation is copied to the old generation. It seems many objects live longer than a GC period and their space is not reused (the concurrent GC releases it later). When we use a large Eden (>=1G, for example), a single GC takes seconds. With a small Eden (256M, for example), a single GC takes hundreds of milliseconds but runs more frequently, and the total is still seconds. Is there any way to lessen the GC time? (We don't consider the first and second queries in this case.)
> > > > > > > > > >
> > > > > > > > > > *How many nodes are present in your cluster setup? If nodes are few, please reduce the number of executors per node.*
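> > > > > > > > > > (The Eden experiments above map onto the executor JVM options; an illustrative sketch only -- the pause target and sizes are placeholders, and pinning the young generation with -Xmn disables G1's adaptive sizing, which is exactly the trade-off being discussed:
> > > > > > > > > >
> > > > > > > > > >     spark.executor.extraJavaOptions  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xmn256m -verbose:gc -XX:+PrintGCDetails
> > > > > > > > > > )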
> > > > > > > > > >
> > > > > > > > > > 2. Performance tuning problem. The row count after filtering is not uniform; some nodes are heavier and spend more time than the others. Task times range from 4s to 16s. Is there any method to even this out?
> > > > > > > > > >
> > > > > > > > > > 3. Too much time for the first and second query. I know the dictionary and some indexes need to be loaded the first time, but even after trying the query below to preheat them, the first queries still take a long time. How can I preheat correctly?
> > > > > > > > > >
> > > > > > > > > >     select Aarray, a, b, c… from Table1 where Aarray is not null and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
> > > > > > > > > >
> > > > > > > > > > *Currently we are working on first-time query improvement. For now you can run select count(*) or count(column), so all the blocks get loaded, and then you can run the actual query.*
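> > > > > > > > > > (A minimal warm-up along those lines, run once per table after startup:
> > > > > > > > > >
> > > > > > > > > >     -- touches every block so dictionaries and indexes get cached
> > > > > > > > > >     SELECT COUNT(*) FROM Table1;
> > > > > > > > > > )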
> > > > > > > > > >
> > > > > > > > > > 4. Any other suggestions to lessen the query time?
> > > > > > > > > >
> > > > > > > > > > Some suggestions: the log from class QueryStatisticsRecorder gives me a good means of finding the bottleneck, but it is not enough. There are still some metrics I think would be very useful:
> > > > > > > > > >
> > > > > > > > > > 1. Filter ratio, i.e. not only result_size but also the original size, so we know how much data was filtered out.
> > > > > > > > > >
> > > > > > > > > > 2. IO time. scan_blocks_time is not enough. If it is high, we know something is wrong, but not what caused the problem; the real IO time for the data is not provided. As there may be several files for one partition, knowing whether the slowness comes from the datanode or from the executor itself would give us intuition for locating the problem.
> > > > > > > > > >
> > > > > > > > > > 3. The TableBlockInfo for a task. I log it myself when debugging; it tells me how many blocklets are local. The Spark web monitor only gives a locality level, while perhaps only one blocklet is actually local.
> > > > > > > > > >
> > > > > > > > > > -Regards
> > > > > > > > > > Kumar Vishal
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 10, 2016 at 8:55 PM, An Lan <lanan...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > We are using carbondata to build our tables and run queries through CarbonContext. We have some performance problems while tuning the system.
> > > > > > > > > > >
> > > > > > > > > > > *Background*:
> > > > > > > > > > >
> > > > > > > > > > > *cluster*: 100 executors, 5 tasks/executor, 10G memory/executor
> > > > > > > > > > >
> > > > > > > > > > > *data*: 60+GB (per replica) in carbon data format, 600+MB/file * 100 files, 300+ columns, 300+ million rows
> > > > > > > > > > >
> > > > > > > > > > > *sql example:*
> > > > > > > > > > >
> > > > > > > > > > >     select A,
> > > > > > > > > > >            sum(a),
> > > > > > > > > > >            sum(b),
> > > > > > > > > > >            sum(c),
> > > > > > > > > > >            …(extra 100 aggregations like sum(column))
> > > > > > > > > > >     from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
> > > > > > > > > > >     where A is not null and d > "ab:c-10" and d < "h:0f3s" and e != 10 and f = 22 and g = 33 and h = 44
> > > > > > > > > > >     GROUP BY A
> > > > > > > > > > >
> > > > > > > > > > > *target query time*: <10s
> > > > > > > > > > >
> > > > > > > > > > > *current query time*: 15s ~ 25s
> > > > > > > > > > >
> > > > > > > > > > > *scene:* OLAP system, <100 queries per day. Concurrency is not high, and the CPU is idle most of the time, so this service will run alongside other programs. The service runs for a long time, so we cannot occupy very large memory for every executor.
> > > > > > > > > > >
> > > > > > > > > > > *refine*: I have built an index and dictionary on d, e, f, g, h, built dictionaries on all the other aggregation columns (i.e. a, b, c, … 100+ columns), and made sure there is one segment for the total data. I have enabled speculation (quantile=0.5, interval=250, multiplier=1.2); a sketch of those settings follows.
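> > > > > > > > > > > (Those speculation values expressed as the standard Spark properties -- a sketch, assuming they were set in spark-defaults.conf and that the interval is in milliseconds, as in Spark 1.x:
> > > > > > > > > > >
> > > > > > > > > > >     spark.speculation             true
> > > > > > > > > > >     spark.speculation.quantile    0.5
> > > > > > > > > > >     spark.speculation.interval    250
> > > > > > > > > > >     spark.speculation.multiplier  1.2
> > > > > > > > > > > )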
> > > > > > > > > > >
> > > > > > > > > > > Time is mainly spent on the first stage, before shuffling. As 95% of the data is filtered out, the shuffle itself takes little time. In the first stage most tasks complete in less than 10s, but there are still nearly 50 tasks longer than 10s, and the longest task in one query may take 12~16s.
> > > > > > > > > > >
> > > > > > > > > > > *Problem:*
> > > > > > > > > > >
> > > > > > > > > > > 1. GC problem. We suffer 20%~30% GC time for some tasks in the first stage after a lot of parameter tuning. We now use G1 GC on Java 8; GC time doubles with CMS. The main GC time is spent on young-generation GC, and almost half of the young generation is copied to the old generation. It seems many objects live longer than a GC period and their space is not reused (the concurrent GC releases it later). When we use a large Eden (>=1G, for example), a single GC takes seconds. With a small Eden (256M, for example), a single GC takes hundreds of milliseconds but runs more frequently, and the total is still seconds. Is there any way to lessen the GC time? (We don't consider the first and second queries in this case.)
> > > > > > > > > > >
> > > > > > > > > > > 2. Performance tuning problem. The row count after filtering is not uniform; some nodes are heavier and spend more time than the others. Task times range from 4s to 16s. Is there any method to even this out?
> > > > > > > > > > >
> > > > > > > > > > > 3. Too much time for the first and second query. I know the dictionary and some indexes need to be loaded the first time, but even after trying the query below to preheat them, the first queries still take a long time. How can I preheat correctly?
> > > > > > > > > > >
> > > > > > > > > > >     select Aarray, a, b, c… from Table1 where Aarray is not null and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
> > > > > > > > > > >
> > > > > > > > > > > 4. Any other suggestions to lessen the query time?
> > > > > > > > > > >
> > > > > > > > > > > Some suggestions: the log from class QueryStatisticsRecorder gives me a good means of finding the bottleneck, but it is not enough. There are still some metrics I think would be very useful:
> > > > > > > > > > >
> > > > > > > > > > > 1. Filter ratio, i.e. not only result_size but also the original size, so we know how much data was filtered out.
> > > > > > > > > > >
> > > > > > > > > > > 2. IO time. scan_blocks_time is not enough. If it is high, we know something is wrong, but not what caused the problem; the real IO time for the data is not provided. As there may be several files for one partition, knowing whether the slowness comes from the datanode or from the executor itself would give us intuition for locating the problem.
> > > > > > > > > > >
> > > > > > > > > > > 3. The TableBlockInfo for a task. I log it myself when debugging; it tells me how many blocklets are local. The Spark web monitor only gives a locality level, while perhaps only one blocklet is actually local.
> > > > > > > > > > >
> > > > > > > > > > > ---------------------
> > > > > > > > > > >
> > > > > > > > > > > Anning Luo
> > > > > > > > > > >
> > > > > > > > > > > *HULU*
> > > > > > > > > > >
> > > > > > > > > > > Email: anning....@hulu.com
> > > > > > > > > > >
> > > > > > > > > > > lanan...@gmail.com