hudi-bot opened a new issue, #16297:
URL: https://github.com/apache/hudi/issues/16297
There are some incorrect replace operations used when sorting all partition paths:
{code:java}
return allPartitionPaths.stream()
    .map(partition -> partition.replace("/", "-"))
    .sorted(Comparator.reverseOrder())
    .map(partitionPath -> partitionPath.replace("-", "/"))
{code}
For a Hive-style partition, the path is dllr_date=2023-10-10 before the replace calls and dllr_date=2023/10/10 after them, which is an incorrect partition. This affects the following classes:
# org.apache.hudi.table.action.compact.strategy.DayBasedCompactionStrategy
# org.apache.hudi.table.action.compact.strategy.BoundedPartitionAwareCompactionStrategy
# org.apache.hudi.table.action.compact.strategy.UnBoundedPartitionAwareCompactionStrategy
!image-2023-11-08-16-02-39-291.png!
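The replace-sort-replace pipeline quoted above can be reproduced in isolation. The standalone sketch below applies the same two `replace` calls to hypothetical sample paths; the class name `ReplaceSortRepro` and the sample values are illustrative assumptions, not Hudi code. It shows that slash-separated paths round-trip unchanged, while hyphenated Hive-style paths come back with slashes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical reproduction of the pipeline quoted in the issue (not Hudi code).
class ReplaceSortRepro {

  static List<String> sortLikeIssue(List<String> allPartitionPaths) {
    return allPartitionPaths.stream()
        // Only affects paths that contain "/", e.g. "2023/10/09" -> "2023-10-09".
        .map(partition -> partition.replace("/", "-"))
        .sorted(Comparator.reverseOrder())
        // Turns EVERY "-" into "/", including hyphens that were in the original path.
        .map(partitionPath -> partitionPath.replace("-", "/"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> slashes = Arrays.asList("2023/10/09", "2023/10/10");
    List<String> hive = Arrays.asList("dllr_date=2023-10-09", "dllr_date=2023-10-10");
    System.out.println(sortLikeIssue(slashes)); // prints [2023/10/10, 2023/10/09]
    System.out.println(sortLikeIssue(hive));    // prints [dllr_date=2023/10/10, dllr_date=2023/10/09]
  }
}
```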
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-7051
- Type: Bug
- Attachment(s):
  - 08/Nov/23 08:01; vmaster; image-2023-11-08-16-01-46-166.png; https://issues.apache.org/jira/secure/attachment/13064245/image-2023-11-08-16-01-46-166.png
  - 08/Nov/23 08:02; vmaster; image-2023-11-08-16-02-39-291.png; https://issues.apache.org/jira/secure/attachment/13064244/image-2023-11-08-16-02-39-291.png
---
## Comments
06/Dec/23 03:39;shivnarayan;Hey [~vmaster]:
Sorry, I am a bit confused.
As of master, filterPartitionPaths in DayBasedCompactionStrategy is as follows:
{code:java}
@Override
public List<String> filterPartitionPaths(HoodieWriteConfig writeConfig, List<String> allPartitionPaths) {
  return allPartitionPaths.stream().sorted(comparator)
      .collect(Collectors.toList())
      .subList(0, Math.min(allPartitionPaths.size(),
          writeConfig.getTargetPartitionsPerDayBasedCompaction()));
}
{code}
I only see the replace operations in BoundedPartitionAwareCompactionStrategy.filterPartitionPaths.
But can you help me understand what the issue is there? I understand that "dllr_date=2023/10/10" may not be a partition that physically exists, but that is an interim state used for comparison, and later we switch it back.
In other words, if the original partition is hyphenated, dllr_date=2023-10-10 gets converted to "dllr_date=2023/10/10", comparisons are performed to sort the paths, and the result is converted back to dllr_date=2023-10-10. So I am not sure where the bug is here. Can you shed some light, please?
---
02/Jan/24 01:53;vmaster;[~shivnarayan] Thanks for your reply. As you say, DayBasedCompactionStrategy has been fixed by HUDI-6975, but the problem still exists in the following classes:
# org.apache.hudi.table.action.compact.strategy.BoundedPartitionAwareCompactionStrategy
# org.apache.hudi.table.action.compact.strategy.UnBoundedPartitionAwareCompactionStrategy
If we have a Hive-style partition such as dllr_date=2023-10-10, the replace operations produce a wrong result: the final value is 'dllr_date=2023/10/10'.
{code:java}
List<String> allPartitionPaths = partitionPaths.stream()
    .map(partition -> partition.replace("/", "-"))
    .sorted(Comparator.reverseOrder())
    .map(partitionPath -> partitionPath.replace("-", "/"))
    .collect(Collectors.toList());
{code}
In other words, there are two replace operations, but for a hyphenated path the first one has no effect (there is no "/" to replace). Only the second replace takes effect, turning every "-" into "/", and at that point nothing converts the path back.
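One way to avoid the problem, sketched here under the assumption that the only goal is to order partition paths in reverse regardless of separator, is to compute the normalized form inside the comparator and never write it back. The class `PartitionSortSketch` and method `sortNewestFirst` are hypothetical names for illustration; this is not the fix adopted in Hudi.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch (not Hudi code): sort by a normalized key, return original strings.
class PartitionSortSketch {

  static List<String> sortNewestFirst(List<String> partitionPaths) {
    // Compare paths with "/" mapped to "-", the same key the original pipeline sorted on.
    Comparator<String> byNormalizedPath =
        Comparator.comparing((String path) -> path.replace("/", "-"));
    return partitionPaths.stream()
        .sorted(byNormalizedPath.reversed()) // reverse order, originals untouched
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> hive = Arrays.asList("dllr_date=2023-10-09", "dllr_date=2023-10-10");
    List<String> slashes = Arrays.asList("2023/10/09", "2023/10/10");
    System.out.println(sortNewestFirst(hive));    // prints [dllr_date=2023-10-10, dllr_date=2023-10-09]
    System.out.println(sortNewestFirst(slashes)); // prints [2023/10/10, 2023/10/09]
  }
}
```

Because the comparator only reads a derived key, each path keeps its original separators, so the hyphenated Hive-style partition name stays valid.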