[ 
https://issues.apache.org/jira/browse/KYLIN-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245332#comment-17245332
 ] 

ASF GitHub Bot commented on KYLIN-4818:
---------------------------------------

hit-lacus edited a comment on pull request #1485:
URL: https://github.com/apache/kylin/pull/1485#issuecomment-740020477


   ## CuboidStatisticsJob Profile Flame Graph
   
   
   #### Tools
   Refer to https://www.linkedin.com/pulse/profiling-spark-applications-one-click-michael-spector.
   
   #### Prepare env
   - Hadoop env : HDP 2.4
   - Cube : KylinSales (10000 lines)
   - Commit : 2e13c8857700fd4d1c4e4daede6600562c62d494
   
   #### Kylin Conf
   
   ```properties
   
   kylin.metadata.url=KYLIN_4818_1@jdbc,url=jdbc:mysql://10.1.3.90:3306/NightlyBuild,username=root,password=R00t@kylin,maxActive=10,maxIdle=10
   kylin.env.zookeeper-connect-string=cdh-master:2181
   kylin.env.zookeeper-base-path=/kylin/regression_testing/KYLIN-4818-1
   kylin.env.hdfs-working-dir=/kylin/regression_testing/KYLIN-4818-1
   kylin.source.hive.database-for-flat-table=regression_testing
   kylin.query.cache-enabled=false
   kylin.job.scheduler.default=100
   kylin.server.self-discovery-enabled=true
   kylin.spark-conf.auto.prior=false
   
   #kylin.cube.cubeplanner.enabled=false
   
   kylin.engine.spark-conf.spark.executor.memory=6g
   kylin.engine.spark-conf.spark.executor.memoryOverhead=1g
   kylin.engine.spark-conf.spark.executor.instances=1
   kylin.engine.spark-conf.spark.executor.cores=1
   kylin.engine.spark-cmd=/usr/local/bin/spark-submit-flamegraph
   kylin.cube.cubeplanner.enabled=true
   ```
   
   
   #### Task Statistics Tab of Spark UI
   
   <img width="1417" alt="image" src="https://user-images.githubusercontent.com/14030549/101375345-3ce3d900-38ea-11eb-9a47-1f6bd16963ce.png">
   
   #### Executor Log
   ```sh
   LogType:stdout
   Log Upload Time:Mon Dec 07 16:00:58 +0000 2020
   LogLength:2920
   Log Contents:
   log4j: Trying to find [spark-executor-log4j.properties] using context 
classloader sun.misc.Launcher$AppClassLoader@18b4aac2.
   log4j: Using URL 
[file:/hadoop/yarn/local/usercache/root/appcache/application_1606276600681_1970/container_e09_1606276600681_1970_01_000002/spark-executor-log4j.properties]
 for automatic log4j configuration.
   log4j: Reading configuration from URL 
file:/hadoop/yarn/local/usercache/root/appcache/application_1606276600681_1970/container_e09_1606276600681_1970_01_000002/spark-executor-log4j.properties
   log4j: Parsing for [root] with value=[INFO,stderr].
   log4j: Level token is [INFO].
   log4j: Category root set to INFO
   log4j: Parsing appender named "stderr".
   log4j: Parsing layout options for "stderr".
   log4j: Setting property [conversionPattern] to [%d{ISO8601} %-5p [%t] %c{2} 
: %m%n].
   log4j: End of parsing for "stderr".
   log4j: Setting property [target] to [System.err].
   log4j: Parsed "stderr" options.
   log4j: Finished configuring.
   CuboidStatisticsJob-Init1-1607355948764
   CuboidStatisticsJob-Init2-1607355948998
   CuboidStatisticsJob-statisticsWithinPartition1-1607355949009
   [10002313,10000349,0,2012-12-14,88750,Consumer Electronics,Vehicle 
Electronics & GPS,Radar & Laser Detectors,1,2,FR,US,France,United 
States,Others,0,ANALYST,Beijing]
   [10004376,10000927,1,2012-08-28,175750,Home & Garden,Bedding,Blankets & 
Throws,0,5,IT,FR,Italy,France,Others,0,ANALYST,Beijing]
   [10006710,10000005,2,2012-02-16,148324,Phones,Mobile 
Accessories,CaseCoverSkins,0,1,JP,CN,Japan,China,ABIN,15,ADMIN,Shanghai]
   [10003717,10000209,3,2013-10-19,37831,Collectibles,Advertising,Merchandise & 
Memorabilia,4,3,GB,FR,United Kingdom,France,FP-non GTC,0,ANALYST,Beijing]
   [10006076,10000154,4,2012-10-22,140746,eBay Motors,Parts & 
Accessories,Vintage Car & Truck 
Parts,0,4,JP,FR,Japan,France,Others,100,ADMIN,Shanghai]
       Stats
      i   :5001
   meter1 :159
   meter2 :279412
   CuboidStatisticsJob-statisticsWithinPartition2-1607356229905
   CuboidStatisticsJob-Init1-1607356230853
   CuboidStatisticsJob-Init2-1607356231101
   CuboidStatisticsJob-statisticsWithinPartition1-1607356231101
   [10009393,10000949,5009,2012-09-06,51582,ClothinShoes & Accessories,Kids' 
ClothinShoes & Accs,Girls' Clothing (Sizes 4 & Up),2,4,US,DE,United 
States,Germany,FP-GTC,0,ADMIN,Shanghai]
   [10002759,10000199,5010,2012-01-18,20865,ClothinShoes & Accessories,Men's 
Clothing,Athletic Apparel,3,3,CN,FR,China,France,FP-GTC,0,ADMIN,Shanghai]
   [10004825,10000098,5011,2013-04-25,20485,Home & 
Garden,Furniture,Other,2,3,JP,JP,Japan,Japan,ABIN,0,ADMIN,Shanghai]
   [10005962,10000244,5012,2013-12-01,145970,Toys & Hobbies,Models & 
Kits,Automotive,5,4,JP,DE,Japan,Germany,FP-non GTC,0,ANALYST,Beijing]
   [10004074,10000541,5013,2013-09-04,24541,Sports MeCards & Fan Shop,Fan 
Apparel & Souvenirs,College-NCAA,2,2,FR,US,France,United 
States,Auction,0,ADMIN,Shanghai]
       Stats
      i   :4987
   meter1 :93
   meter2 :292977
   CuboidStatisticsJob-statisticsWithinPartition2-1607356524809
   End of LogType:stdout
   ```
   
   
   ### Flame graph 
   
   <img width="1196" alt="image" src="https://user-images.githubusercontent.com/14030549/101375552-7f0d1a80-38ea-11eb-9c4b-29c04531899a.png">
   
   <img width="1188" alt="image" src="https://user-images.githubusercontent.com/14030549/101375669-a368f700-38ea-11eb-9f4d-ae6b5f57fece.png">
   
   
   ### Summary
   From the Spark UI, there are two tasks for CuboidStatisticsJob. The first one has `5001` input records and takes about 4.7 minutes, which means each row costs about **56.39** (4.7 * 60000 / 5001) milliseconds.
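   
   The per-row estimate can be cross-checked against the executor log. The sketch below is only an illustration, not part of Kylin: the class `PerRowCost` and its methods are hypothetical, and it assumes that the numeric suffix of each `CuboidStatisticsJob-...` marker line is an epoch timestamp in milliseconds and that the `i` value under `Stats` is the number of rows processed by that task.
   
   ```java
   import java.util.Locale;

   public class PerRowCost {
       /** Parses the trailing number of a marker line such as "CuboidStatisticsJob-Init1-1607355948764". */
       static long markerMillis(String marker) {
           String suffix = marker.substring(marker.lastIndexOf('-') + 1).trim();
           return Long.parseLong(suffix);
       }

       /** Average wall-clock milliseconds spent per row between two timestamps. */
       static double millisPerRow(long startMillis, long endMillis, long rows) {
           if (rows <= 0) {
               throw new IllegalArgumentException("rows must be positive");
           }
           return (endMillis - startMillis) / (double) rows;
       }

       public static void main(String[] args) {
           // Start marker, end marker and row count ("i") copied from the executor log.
           // Assumption: the marker suffix is epoch milliseconds.
           String[][] tasks = {
               {"CuboidStatisticsJob-statisticsWithinPartition1-1607355949009",
                "CuboidStatisticsJob-statisticsWithinPartition2-1607356229905", "5001"},
               {"CuboidStatisticsJob-statisticsWithinPartition1-1607356231101",
                "CuboidStatisticsJob-statisticsWithinPartition2-1607356524809", "4987"},
           };
           for (int t = 0; t < tasks.length; t++) {
               long start = markerMillis(tasks[t][0]);
               long end = markerMillis(tasks[t][1]);
               long rows = Long.parseLong(tasks[t][2]);
               System.out.printf(Locale.ROOT, "task %d: %d ms for %d rows, %.2f ms/row%n",
                       t + 1, end - start, rows, millisPerRow(start, end, rows));
           }
       }
   }
   ```
   
   Under those assumptions, the two tasks come out at roughly 56 and 59 milliseconds per row, which is close to the estimate derived from the Spark UI.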
   
   From the executor log, `meter2` is much larger than `meter1` in both tasks (279412 versus 159 in the first, 292977 versus 93 in the second).
   
   The flame graphs above indicate that `Long#toString` takes too much of the time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Calculate cuboid statistics in Kylin 4
> --------------------------------------
>
>                 Key: KYLIN-4818
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4818
>             Project: Kylin
>          Issue Type: Sub-task
>          Components: Spark Engine
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v4.0.0-beta
>
>
> Referring to SparkFactDistinct.java in Kylin 3, I will try to use Spark to 
> calculate (estimate) the row count and size of each cuboid candidate. The row 
> count and size of a cuboid are the input for cube planner phase one and phase two.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
