[jira] [Commented] (KYLIN-4315) Use metadata numRows in beeline client for quick row counting

ASF GitHub Bot (Jira) Sun, 14 Jun 2020 02:32:33 -0700


    [ 
https://issues.apache.org/jira/browse/KYLIN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135096#comment-17135096
 ]


ASF GitHub Bot commented on KYLIN-4315:
---------------------------------------

RupengWang commented on pull request #1024:
URL: https://github.com/apache/kylin/pull/1024#issuecomment-643741849


   ### Check list
   - [x] Step 1. [**OPTIONAL**] Understand the background, root cause, and 
design at a basic level (one hour EST). The issue type is 
       - [ ] bug fix
       - [ ] new feature
       - [x] enhancement
   - [x] Step 2. Test cases are designed and documented (30 minutes EST).
   - [x] Step 3. Prepare specific env, for example:
       - mock data (maybe some specific data type)
       - test env (maybe install a RDBMS instance).
   - [x] Step 4. Verify and make sure test cases passed.
   - [x] Step 5. Paste the important manual steps and screenshots here (20 
minutes EST). 
       - If you find it difficult to pick the most important evidence, please 
attach a diagnosis package. 
   - [x] Step 6. Do more check in user perspective (20 minutes EST)
       - [x] Does the doc need to be updated? If so, has it been updated? Ask 
the release manager for help if so.
       - [x] Is it a breaking change that the Kylin community should be 
notified of? Ask the release manager for help if so.
   - [x] Step 7.  Summarize this test (20 minutes EST).
   
   ### Total estimate
   - 4 hours for a small issue, as an optimistic estimate. 
   
   ### Note
   If you find an unexpected and **unrelated** error or mistake, please DO 
report it if it is truly a mistake; we may research and fix it in the 
future.
   
   If you find that the background information or root cause analysis is 
incomplete or ambiguous, please try to contact the author or do some quick 
research and record what you find. It is a good chance to learn something 
interesting.
   
   If you find it is not easy to design test cases, please notify the release 
manager.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Use metadata numRows in beeline client for quick row counting
> -------------------------------------------------------------
>
>                 Key: KYLIN-4315
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4315
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Congling Xia
>            Assignee: Congling Xia
>            Priority: Major
>             Fix For: v3.1.0
>
>
> Hi, I find that in `BeelineHiveClient`, the method `getHiveTableRows` uses 
> "select count(*) from <tb_name>" to count table rows. The method is 
> invoked in the flat intermediate table redistribution step of cube building.
> This statistic can instead be loaded from the metastore, which costs much 
> less time than scanning all rows in the Hive table. Since intermediate tables 
> are created and populated by Kylin, statistics will be automatically 
> calculated and stored in the metastore when 
> `[hive.stats.autogather|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.autogather]`
>  is enabled (which is the default setting for Hive). 
> See the Hive wiki for more detail about the `numRows` stats: 
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE]
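
The idea in the issue description can be illustrated with a minimal sketch. It is not the code from the pull request: the class `NumRowsParser` and the method `parseNumRows` are hypothetical names, and the sketch assumes the rows come from a `DESCRIBE FORMATTED <tb_name>` statement whose result has already been read into string arrays, with the `numRows` key in one column and its value in the next. A missing, unparsable, or negative value is treated as "unknown" so that a caller can fall back to `select count(*)`.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper, not part of Kylin: extracts the numRows table
// parameter from rows assumed to come from DESCRIBE FORMATTED output.
final class NumRowsParser {
    private NumRowsParser() {}

    // Each String[] is one result row; cells may be null or space-padded.
    static OptionalLong parseNumRows(List<String[]> describeRows) {
        for (String[] row : describeRows) {
            for (int i = 0; i + 1 < row.length; i++) {
                String key = row[i];
                String value = row[i + 1];
                if (key == null || value == null || !"numRows".equals(key.trim())) {
                    continue;
                }
                try {
                    long n = Long.parseLong(value.trim());
                    // Assumption: a negative value means the statistic is unknown.
                    return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        // No numRows entry found: the caller should count rows another way.
        return OptionalLong.empty();
    }
}
```

A caller would use the returned value when it is present and run the original "select count(*) from <tb_name>" query only when it is empty, so the slower full scan remains as a fallback when statistics are missing.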



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
