vinothchandar commented on issue #1328: Hudi upsert hangs
URL: https://github.com/apache/incubator-hudi/issues/1328#issuecomment-585991857
 
 
   There must be something else going on. I just used my own benchmark jobs to 
generate a pattern where the records are fully overwritten in a second (and a 
third) batch, and it finishes fine.
   
   ```
   hudi:hoodie_benchmark->connect --path file:///tmp/hudi-benchmark/output/org.apache.hudi
   35394 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Loading HoodieTableMetaClient from file:///tmp/hudi-benchmark/output/org.apache.hudi
   35415 [Spring Shell] INFO  org.apache.hudi.common.util.FSUtils  - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@6851d345]
   35416 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from file:/tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/hoodie.properties
   35416 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of type COPY_ON_WRITE(version=1) from file:///tmp/hudi-benchmark/output/org.apache.hudi
   Metadata for table hoodie_benchmark loaded
   hudi:hoodie_benchmark->commits show 
   36774 [Spring Shell] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded instants [[20200213134159__clean__COMPLETED], [20200213134159__commit__COMPLETED], [20200213134410__clean__COMPLETED], [20200213134410__commit__COMPLETED], [20200213134548__clean__COMPLETED], [20200213134548__commit__COMPLETED]]
   
   ╔════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
   ║ CommitTime     │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
   ╠════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
   ║ 20200213134548 │ 384.8 MB            │ 0                 │ 34                  │ 3                        │ 4080024               │ 1211376                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213134410 │ 379.9 MB            │ 0                 │ 34                  │ 3                        │ 4040016               │ 1199234                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213134159 │ 374.8 MB            │ 34                │ 0                   │ 3                        │ 4000008               │ 0                            │ 0            ║
   ╚════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
   
   hudi:hoodie_benchmark->
   ```
   
   The times below are in ms:
   
   ```
    grep -n -e totalCreateTime -e totalUpsertTime /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/*.commit
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134159.commit:697:  "totalCreateTime" : 195060,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134159.commit:698:  "totalUpsertTime" : 0,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134410.commit:697:  "totalCreateTime" : 0,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134410.commit:698:  "totalUpsertTime" : 193693,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134548.commit:697:  "totalCreateTime" : 0,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134548.commit:698:  "totalUpsertTime" : 182277,
   ```
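
If it helps to compare these numbers across commits, here is a minimal Python sketch that turns the `grep -n` output above into per-commit timings in seconds. It is not part of Hudi: the function name `timings_by_commit` and the `SAMPLE` string are made up for illustration, and the values are simply whatever the commit metadata reports, converted from ms.

```python
import re
from collections import defaultdict

# One grep hit looks like:
#   <path>/<commit>.commit:<line>:  "totalCreateTime" : 195060,
# \s* also matches a line break, so hard-wrapped output parses as well.
HIT = re.compile(
    r'(\d{14})\.commit:\d+:\s*"(total(?:Create|Upsert)Time)"\s*:\s*(\d+)'
)


def timings_by_commit(grep_output):
    """Return {commit_time: {metric_name: seconds}} parsed from grep output."""
    result = defaultdict(dict)
    for commit, metric, millis in HIT.findall(grep_output):
        result[commit][metric] = int(millis) / 1000.0  # ms -> s
    return dict(result)


# Hypothetical sample: the grep hits above with the directory prefix shortened.
SAMPLE = """
.hoodie/20200213134159.commit:697:  "totalCreateTime" : 195060,
.hoodie/20200213134159.commit:698:  "totalUpsertTime" : 0,
.hoodie/20200213134410.commit:697:  "totalCreateTime" : 0,
.hoodie/20200213134410.commit:698:  "totalUpsertTime" : 193693,
.hoodie/20200213134548.commit:697:  "totalCreateTime" : 0,
.hoodie/20200213134548.commit:698:  "totalUpsertTime" : 182277,
"""

if __name__ == "__main__":
    for commit, metrics in sorted(timings_by_commit(SAMPLE).items()):
        print(commit, metrics)
```

Piping the real grep output into `timings_by_commit` gives one entry per commit, which makes it easier to see whether the upsert batches take noticeably longer than the initial insert.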
   
   
   Can we drill into your dataset? Are you generating tons of files due to 
granular partitioning? Can you share the Spark UI and the Hudi CLI output, 
like the above?
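
One rough way to check the "tons of files" question on a table that lives on a local or mounted filesystem is to count data files per partition directory. The sketch below is only an illustration under assumptions: `files_per_partition` is a hypothetical helper, it assumes data files end in `.parquet`, and it counts every file version it finds rather than distinct file groups. It skips the `.hoodie` metadata directory.

```python
import os
from collections import Counter


def files_per_partition(base_path, suffix=".parquet"):
    """Count files ending in `suffix` under each directory below base_path."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(base_path):
        # Prune the .hoodie metadata directory so it is never descended into.
        dirnames[:] = [d for d in dirnames if d != ".hoodie"]
        n = sum(1 for name in filenames if name.endswith(suffix))
        if n:
            counts[os.path.relpath(dirpath, base_path)] = n
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_files.py /path/to/table
    for partition, n in files_per_partition(sys.argv[1]).most_common():
        print(f"{n:8d}  {partition}")
```

A very large number of partitions, each holding only a handful of small files, would point toward overly granular partitioning.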

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
