hudi-bot opened a new issue, #16297:
URL: https://github.com/apache/hudi/issues/16297

   Some compaction strategies use incorrect replace operations when sorting all partition paths.
   {code:java}
   return allPartitionPaths.stream().map(partition -> partition.replace("/", "-"))
       .sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) {code}
   If the Hive partition before the replace operations is dllr_date=2023-10-10, it is converted to dllr_date=2023/10/10 afterwards, which is an incorrect partition. The affected classes are:
    # org.apache.hudi.table.action.compact.strategy.DayBasedCompactionStrategy
    # org.apache.hudi.table.action.compact.strategy.BoundedPartitionAwareCompactionStrategy
    # org.apache.hudi.table.action.compact.strategy.UnBoundedPartitionAwareCompactionStrategy
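
   The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not Hudi code: the class name PartitionReplaceDemo and the method sortWithReplace are hypothetical, and only the replace/sort/replace chain is taken from the snippet quoted in this issue.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; it only mirrors the chain quoted in the issue.
class PartitionReplaceDemo {

    // Replace "/" with "-", sort in reverse order, then replace "-" with "/".
    static List<String> sortWithReplace(List<String> partitionPaths) {
        return partitionPaths.stream()
            .map(partition -> partition.replace("/", "-"))
            .sorted(Comparator.reverseOrder())
            .map(partitionPath -> partitionPath.replace("-", "/"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Slash-separated paths survive the round trip.
        System.out.println(sortWithReplace(Arrays.asList("2023/10/09", "2023/10/10")));
        // prints [2023/10/10, 2023/10/09]

        // Hyphenated Hive-style paths do not: the original hyphens become slashes.
        System.out.println(sortWithReplace(Arrays.asList("dllr_date=2023-10-09", "dllr_date=2023-10-10")));
        // prints [dllr_date=2023/10/10, dllr_date=2023/10/09]
    }
}
```

   The second call returns paths that were never in the input, which is the incorrect partition reported here.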
   
   !image-2023-11-08-16-02-39-291.png!
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-7051
   - Type: Bug
   - Attachment(s):
     - 08/Nov/23 08:01;vmaster;image-2023-11-08-16-01-46-166.png;https://issues.apache.org/jira/secure/attachment/13064245/image-2023-11-08-16-01-46-166.png
     - 08/Nov/23 08:02;vmaster;image-2023-11-08-16-02-39-291.png;https://issues.apache.org/jira/secure/attachment/13064244/image-2023-11-08-16-02-39-291.png
   
   
   ---
   
   
   ## Comments
   
   06/Dec/23 03:39;shivnarayan;Hey [~vmaster],
   sorry, I am a bit confused.
   
   As per master, filterPartitionPaths in DayBasedCompactionStrategy is as below:
   
   {code:java}
   @Override
   public List<String> filterPartitionPaths(HoodieWriteConfig writeConfig, List<String> allPartitionPaths) {
     return allPartitionPaths.stream().sorted(comparator)
         .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(),
             writeConfig.getTargetPartitionsPerDayBasedCompaction()));
   } {code}
    
   
    
   
   I see the replace operations only in BoundedPartitionAwareCompactionStrategy.filterPartitionPaths.
   
   But can you help me understand what the issue is there? I understand that "dllr_date=2023/10/10" may not be an actual partition that is physically present, but that is an interim state used for comparison, and later we switch it back.
   
    
   
   In other words, if the original partition is hyphenated:
   
   dllr_date=2023-10-10 gets converted to "dllr_date=2023/10/10", then comparisons are performed to sort the paths, and then it is converted back to dllr_date=2023-10-10. So I am not sure where the bug is here. Can you shed some light on this, please?
   
    ;;;
   
   ---
   
   02/Jan/24 01:53;vmaster;[~shivnarayan] Thanks for your reply. As you say, DayBasedCompactionStrategy has been fixed by HUDI-6975, but the problem still exists in the following classes:
    # org.apache.hudi.table.action.compact.strategy.BoundedPartitionAwareCompactionStrategy
    # org.apache.hudi.table.action.compact.strategy.UnBoundedPartitionAwareCompactionStrategy
   
   If we have a Hive partition like dllr_date=2023-10-10, the replace operations produce an incorrect result: the final path is 'dllr_date=2023/10/10'.
   
    
   {code:java}
   List<String> allPartitionPaths =
       partitionPaths.stream().map(partition -> partition.replace("/", "-")).sorted(Comparator.reverseOrder())
           .map(partitionPath -> partitionPath.replace("-", "/")).collect(Collectors.toList()); {code}
   In other words, there are two replace operations, but the first one has no effect because the path contains no "/". Only the second replace takes effect, so at that point nothing is converted back.
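
   One possible way to avoid the lossy round trip is to use the replaced string only as a sort key and leave the paths themselves untouched. This is a sketch under that assumption, not the fix adopted in Hudi; the class name PartitionSortSketch and the method sortNewestFirst are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; not taken from the Hudi code base.
class PartitionSortSketch {

    // Sort in reverse order of the "/"-to-"-" key, but return the original strings.
    static List<String> sortNewestFirst(List<String> partitionPaths) {
        return partitionPaths.stream()
            .sorted(Comparator.comparing(
                (String partition) -> partition.replace("/", "-"),
                Comparator.<String>reverseOrder()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortNewestFirst(
            Arrays.asList("dllr_date=2023-10-09", "dllr_date=2023-10-10")));
        // prints [dllr_date=2023-10-10, dllr_date=2023-10-09]
    }
}
```

   Because the key is computed inside the comparator, hyphenated and slash-separated paths both come back exactly as they went in.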
   
    
   
    ;;;

