maheshguptags opened a new issue, #10456:
URL: https://github.com/apache/hudi/issues/10456

   I am trying to add a second partition level to my table, but with two 
partition levels the Hudi Flink job takes about 10x as long as it does with a 
single partition level.
   
   I ingested 1.8M records into a table with one partition level, and it took 
around 12-15 minutes to ingest all the data. With the same configuration and 
the same data payload, I then added a second partition key, and the job took 
around 1 hour 45 minutes to complete.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   Below is the configuration I am using for the table. Add these properties 
to the table creation statement.
   
   ```
   PARTITIONED BY (`client_id`,`hashed_server_id`)
   WITH ('connector' = 'hudi','path' = '${table_location}',
   'table.type' = 'COPY_ON_WRITE',
   'hoodie.datasource.write.recordkey.field' = 'a,b',
   'payload.class'='x.y.PartialUpdate',
   'precombine.field'='ts',
   'hoodie.clean.async'='true',
   'hoodie.cleaner.policy' = 'KEEP_LATEST_COMMITS',
   'hoodie.clean.automatic' = 'true',
   'hoodie.clean.max.commits'='5',
   'hoodie.clean.trigger.strategy'='NUM_COMMITS',
   'hoodie.cleaner.parallelism'='100',
   'hoodie.cleaner.commits.retained'='4',
   'hoodie.index.type'= 'BUCKET',
   'hoodie.index.bucket.engine' = 'SIMPLE',
   'hoodie.bucket.index.num.buckets'='16',
   'hoodie.bucket.index.hash.field'='a',
   'hoodie.parquet.small.file.limit'='104857600',
   'hoodie.parquet.compression.codec'='snappy')
   ``` 
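
A rough back-of-the-envelope sketch of one thing that changes with the second partition level, offered as a possible factor and not a confirmed cause. It assumes `hoodie.bucket.index.num.buckets` applies per partition, so the number of bucket file groups grows with the number of distinct partition paths. The cardinalities of `client_id` and `hashed_server_id` below are hypothetical placeholders, not values from my data.

```python
# Sketch: how many bucket file groups the table may hold with one vs. two
# partition levels, assuming the bucket count applies to each partition path.

BUCKETS_PER_PARTITION = 16   # 'hoodie.bucket.index.num.buckets' from the config
TOTAL_RECORDS = 1_800_000    # record count from the test run

# Hypothetical cardinalities, chosen only for illustration.
DISTINCT_CLIENT_IDS = 50
SERVER_IDS_PER_CLIENT = 20


def file_groups(num_partitions: int, buckets_per_partition: int) -> int:
    """Upper bound on bucket file groups across all partition paths."""
    return num_partitions * buckets_per_partition


one_level = file_groups(DISTINCT_CLIENT_IDS, BUCKETS_PER_PARTITION)
two_level = file_groups(
    DISTINCT_CLIENT_IDS * SERVER_IDS_PER_CLIENT, BUCKETS_PER_PARTITION
)

print(f"one level : {one_level} file groups, "
      f"~{TOTAL_RECORDS / one_level:.0f} records each")
print(f"two levels: {two_level} file groups, "
      f"~{TOTAL_RECORDS / two_level:.0f} records each")
```

Under these assumed cardinalities the same 1.8M records are spread over 20x as many, much smaller files, which on S3 would mean many more small writes per commit. Whether that accounts for the slowdown here would need to be checked against the real partition cardinalities.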
   
   **Expected behavior**
   Since this only adds a partition level to the storage layout, it should not 
affect performance much (I could understand an extra 5-7 minutes, since complex 
key generation is a bit slower than simple key generation). 
   
   **Environment Description**
   * Flink  1.17.1 
   * Hudi version : 14
   
   * Spark version : NA
   
   * Hive version : NA
   
   * Hadoop version : 3.4.0
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) :Yes
   
   
   **Additional context**
   
   My table is an upsert table. I have tested the functionality and it works 
fine, and I cannot change the table type.
   
   I also discussed this with @ad1happy2go, who likewise suggested that it 
should not have much impact, since it is just another partition level.
   
   CC : @ad1happy2go @codope @danny0405 @yo
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
