bwu2 edited a comment on issue #1328: Hudi upsert hangs
URL: https://github.com/apache/incubator-hudi/issues/1328#issuecomment-586071719
 
 
   Ok, thanks for this. 
   
   I have run the jobs again: first insert 4m records, then upsert 3m of them, 
then upsert 4m, then upsert 3m. The two jobs upserting 3m records run fine and 
quickly, but the one upserting 4m takes >200 times as long. There is no 
partitioning and only one (small) output file. 
   
   My results (from a synthetic dataset) are:
   ```bash
   hudi:json_data->commits show --limit 4
   
   ╔════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
   ║ CommitTime     │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
   ╠════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
   ║ 20200214013937 │ 25.5 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 3000000                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213224532 │ 25.5 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 4000000                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213224325 │ 25.6 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 3000000                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213224218 │ 25.5 MB             │ 1                 │ 0                   │ 1                        │ 4000000               │ 0                            │ 0            ║
   ╚════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
   ```
   
   and the times:
   ```bash
   grep -n -e totalCreateTime -e totalUpsertTime  *.commit
   20200213224218.commit:36:  "totalCreateTime" : 30012,
   20200213224218.commit:37:  "totalUpsertTime" : 0,
   20200213224325.commit:36:  "totalCreateTime" : 0,
   20200213224325.commit:37:  "totalUpsertTime" : 46879,
   20200213224532.commit:36:  "totalCreateTime" : 0,
   20200213224532.commit:37:  "totalUpsertTime" : 10347280,
   20200214013937.commit:36:  "totalCreateTime" : 0,
   20200214013937.commit:37:  "totalUpsertTime" : 44598,
   ```
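
As a rough check on the ">200 times" figure, here is a minimal Python sketch that parses the grep output above and compares the slowest upsert against the others. The helper names (`parse_times`, `upsert_slowdown`) are made up for illustration and are not part of Hudi; the values are treated as unitless, since the unit of `totalUpsertTime` is not stated here.

```python
import re

# grep output copied verbatim from the comment above.
GREP_OUTPUT = """\
20200213224218.commit:36:  "totalCreateTime" : 30012,
20200213224218.commit:37:  "totalUpsertTime" : 0,
20200213224325.commit:36:  "totalCreateTime" : 0,
20200213224325.commit:37:  "totalUpsertTime" : 46879,
20200213224532.commit:36:  "totalCreateTime" : 0,
20200213224532.commit:37:  "totalUpsertTime" : 10347280,
20200214013937.commit:36:  "totalCreateTime" : 0,
20200214013937.commit:37:  "totalUpsertTime" : 44598,
"""

# Matches lines of the form: <commit>.commit:<lineno>:  "<metric>" : <value>,
LINE_RE = re.compile(r'^(\d+)\.commit:\d+:\s+"(\w+)"\s*:\s*(\d+),?$')


def parse_times(text):
    """Return {commit_time: {metric: value}} parsed from `grep -n` output."""
    times = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            commit, metric, value = match.groups()
            times.setdefault(commit, {})[metric] = int(value)
    return times


def upsert_slowdown(times):
    """Return the slowest upsert commit and its ratio to every other upsert."""
    upserts = {
        commit: metrics["totalUpsertTime"]
        for commit, metrics in times.items()
        if metrics.get("totalUpsertTime", 0) > 0  # skip the initial insert
    }
    slowest = max(upserts, key=upserts.get)
    ratios = {
        commit: upserts[slowest] / value
        for commit, value in upserts.items()
        if commit != slowest
    }
    return slowest, ratios


times = parse_times(GREP_OUTPUT)
slowest, ratios = upsert_slowdown(times)
print(slowest, {commit: round(ratio, 1) for commit, ratio in ratios.items()})
```

With the numbers above, the 4m upsert (commit 20200213224532) comes out roughly 220 to 230 times slower than either 3m upsert.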
