[ 
https://issues.apache.org/jira/browse/HUDI-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408885#comment-17408885
 ] 

sivabalan narayanan commented on HUDI-2005:
-------------------------------------------

1. ListingBasedRollbackHelper.

 
{code:java}
// collect all log files that is supposed to be deleted with this rollback
Map<FileStatus, Long> writtenLogFileSizeMap =
    FSUtils.getAllLogFiles(metaClient.getFs(),
        FSUtils.getPartitionPath(config.getBasePath(), rollbackRequest.getPartitionPath()),
        fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), latestBaseInstant)
        .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
{code}
 

 
{code:java}
// This step is intentionally done after writer is closed. Guarantees that
// getFileStatus would reflect correct stats and FileNotFoundException is not thrown in
// cloud-storage : HUDI-168
Map<FileStatus, Long> filesToNumBlocksRollback = Collections.singletonMap(
    metaClient.getFs().getFileStatus(Objects.requireNonNull(writer).getLogFile().getPath()),
    1L
);
{code}
 

 

2. SparkMarkerBasedRollbackStrategy

 
{code:java}
protected Map<FileStatus, Long> getWrittenLogFileSizeMap(String partitionPathStr, String baseCommitTime, String fileId) throws IOException {
  // collect all log files that is supposed to be deleted with this rollback
  return FSUtils.getAllLogFiles(table.getMetaClient().getFs(),
      FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId,
      HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
      .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
}
{code}
 

 

3. HoodieLogFileReader fetches the log file position using:

 
{code:java}
if (this.reverseReader) {
  this.reverseLogFilePosition = this.lastReverseLogFilePosition =
      fs.getFileStatus(logFile.getPath()).getLen();
}
{code}
As of now, HoodieLogFileReader only has access to a FileSystem. Not sure if we
can leak Metadata into this layer.
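
One possible direction, sketched here only as an illustration and not as what Hudi does: let the caller hand the reader a length it already knows, so the reader falls back to a file-system lookup only when no length was supplied. The class LogFileLengthResolver and its resolve method are hypothetical names and not part of Hudi; the LongSupplier stands in for a call such as fs.getFileStatus(logFile.getPath()).getLen().

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Hudi. It resolves a log file's length,
// preferring a value the caller already has (for example from an earlier
// listing) and invoking the file-system lookup only when none was given.
final class LogFileLengthResolver {
    private LogFileLengthResolver() {}

    static long resolve(OptionalLong knownLength, LongSupplier fileSystemLookup) {
        // The supplier is evaluated lazily, so no file-system call is made
        // when the length is already known.
        return knownLength.isPresent()
            ? knownLength.getAsLong()
            : fileSystemLookup.getAsLong();
    }
}
```

With something like this, the reader would keep its FileSystem handle for the fallback path, and whatever sits above it could supply the length without the reader needing to know where that length came from.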

 

 

> Audit and remove references of fs.listStatus() and fs.getFileStatus() or 
> fs.exists()
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-2005
>                 URL: https://issues.apache.org/jira/browse/HUDI-2005
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Nishith Agarwal
>            Assignee: sivabalan narayanan
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
