cdmikechen opened a new pull request #1122: [HUDI-29]: Support hudi COW table 
to use *ANALYZE TABLE table_name COMPUTE STATISTICS* to get table current rows
URL: https://github.com/apache/incubator-hudi/pull/1122
 
 
   JIRA: https://issues.apache.org/jira/projects/HUDI/issues/HUDI-29
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   When `ANALYZE TABLE table_name COMPUTE STATISTICS` is used to get the row 
count of a Hudi table, Hive collects every parquet file under the table path, 
not only the latest version of each file, so the resulting statistics are wrong. 
This change lets the Hudi input format identify which files are the latest Hudi 
files, so that Hive computes correct statistics.
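
The idea of keeping only the latest file per file group can be illustrated with a small standalone sketch. This is not the code in this pull request (the real change lives in `HoodieParquetInputFormat`). The class name, the helper methods, and the sample file names are hypothetical, and the sketch assumes base files are named `<fileId>_<writeToken>_<commitTime>.parquet` with fixed-width commit timestamps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Hudi code base.
class LatestBaseFileSketch {

    // Assumed layout: <fileId>_<writeToken>_<commitTime>.parquet
    static String fileId(String name) {
        return name.substring(0, name.indexOf('_'));
    }

    static String commitTime(String name) {
        return name.substring(name.lastIndexOf('_') + 1, name.lastIndexOf('.'));
    }

    // Keep one file per fileId: the one with the greatest commit time.
    // Fixed-width timestamps compare correctly as plain strings.
    static List<String> latestPerFileGroup(List<String> names) {
        Map<String, String> latest = new HashMap<>();
        for (String name : names) {
            String current = latest.get(fileId(name));
            if (current == null || commitTime(name).compareTo(commitTime(current)) > 0) {
                latest.put(fileId(name), name);
            }
        }
        List<String> result = new ArrayList<>(latest.values());
        Collections.sort(result); // deterministic order for printing
        return result;
    }

    public static void main(String[] args) {
        // Three versions of the same (made-up) file group.
        List<String> files = Arrays.asList(
                "f1_0-1-1_20190801100644.parquet",
                "f1_0-2-2_20190807152831.parquet",
                "f1_0-3-3_20191104165039.parquet");
        System.out.println(latestPerFileGroup(files));
    }
}
```

If every version were counted, a stats job would see each record once per version. Counting only the newest file per group is what should make the computed row count match the table's actual row count.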
   
   ## Brief change log
   ```shell
    hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java            | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
    hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/NoneParquetRecordReaderWrapper.java | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    2 files changed, 124 insertions(+), 1 deletion(-)
    create mode 100644 hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/NoneParquetRecordReaderWrapper.java
   ```
   
   ## Verify this pull request
   Tested with `org.apache.hudi.hadoop.TestHoodieInputFormat` and built with 
`mvn clean package -DskipTests -DskipITs`.
   I also verified the change on a Hudi COW table with 750 rows that had been 
updated several times.
   ```shell
   hudi->connect --path /hive/warehouse/lims.db/lims_method
   19/12/23 10:09:10 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /hive/warehouse/lims.db/lims_method
   19/12/23 10:09:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   19/12/23 10:09:11 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://bdcluster1:9000/], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1269021288_12, ugi=hdfs (auth:SIMPLE)]]]
   19/12/23 10:09:11 INFO table.HoodieTableConfig: Loading dataset properties from /hive/warehouse/lims.db/lims_method/.hoodie/hoodie.properties
   19/12/23 10:09:11 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) from /hive/warehouse/lims.db/lims_method
   Metadata for table lims_method loaded
   hudi:lims_method->commits show
   19/12/23 10:09:22 INFO timeline.HoodieActiveTimeline: Loaded instants [[20190801100644__clean__COMPLETED], [20190801100644__commit__COMPLETED], [20190807152831__clean__COMPLETED], [20190807152831__commit__COMPLETED], [20190807153023__clean__COMPLETED], [20190807153023__commit__COMPLETED], [20190808160401__clean__COMPLETED], [20190808160401__commit__COMPLETED], [20190924090925__clean__COMPLETED], [20190924090925__commit__COMPLETED], [20190924092639__clean__COMPLETED], [20190924092639__commit__COMPLETED], [20191104150324__clean__COMPLETED], [20191104150324__commit__COMPLETED], [20191104150629__clean__COMPLETED], [20191104150629__commit__COMPLETED], [20191104165039__clean__COMPLETED], [20191104165039__commit__COMPLETED]]
   ╔════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
   ║ CommitTime     │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
   ╠════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
   ║ 20191104165039 │ 457.4 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 1                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20191104150629 │ 457.4 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 1                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20191104150324 │ 457.4 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 1                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20190924092639 │ 457.3 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 2                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20190924090925 │ 457.3 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 1                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20190808160401 │ 457.2 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 1                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20190807153023 │ 457.1 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 1                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20190807152831 │ 457.1 KB            │ 0                 │ 1                   │ 1                        │ 750                   │ 1                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20190801100644 │ 457.2 KB            │ 1                 │ 0                   │ 1                        │ 750                   │ 0                            │ 0            ║
   ╚════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
   ```
   I ran the `ANALYZE TABLE` command from Hive beeline (I tested on both Tez 
and MR earlier; this run is on Tez) and then checked the result with 
`select count(1)`:
   ```
   0: jdbc:hive2://localhost:10000> ANALYZE TABLE lims.lims_method COMPUTE STATISTICS;
   No rows affected (4.569 seconds)
   0: jdbc:hive2://localhost:10000> select count(1) from lims.lims_method;
   +------+
   | _c0  |
   +------+
   | 750  |
   +------+
   1 row selected (0.632 seconds)
   ```
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
