yihua commented on a change in pull request #4743:
URL: https://github.com/apache/hudi/pull/4743#discussion_r808539950
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -147,18 +148,21 @@ public WriteOperationType getOperationType() {
* been touched multiple times in the given commits, the return value will keep the one
* from the latest commit.
*
+ *
+ * @param hadoopConf
* @param basePath The base path
* @return the file full path to file status mapping
*/
- public Map<String, FileStatus> getFullPathToFileStatus(String basePath) {
+ public Map<String, FileStatus> getFullPathToFileStatus(Configuration hadoopConf, String basePath) {
Map<String, FileStatus> fullPathToFileStatus = new HashMap<>();
for (List<HoodieWriteStat> stats : getPartitionToWriteStats().values()) {
// Iterate through all the written files.
for (HoodieWriteStat stat : stats) {
String relativeFilePath = stat.getPath();
Path fullPath = relativeFilePath != null ? FSUtils.getPartitionPath(basePath, relativeFilePath) : null;
if (fullPath != null) {
- FileStatus fileStatus = new FileStatus(stat.getFileSizeInBytes(), false, 0, 0,
+ long blockSize = FSUtils.getFs(fullPath.toString(), hadoopConf).getDefaultBlockSize(fullPath);
Review comment:
@alexeykudinkin I think I got confused about the `getDefaultBlockSize()` call: it is based on the file system (see below), not on the file status, and it only fetches the value from the config. This is fine.
Even if the block size is 0, should Hive still honor the actual block size of the file? At least that's my understanding of how the Trino Hive connector behaves.
```
/**
* Return the number of bytes that large input files should be optimally
* be split into to minimize i/o time.
* @deprecated use {@link #getDefaultBlockSize(Path)} instead
*/
@Deprecated
public long getDefaultBlockSize() {
// default to 32MB: large enough to minimize the impact of seeks
return getConf().getLong("fs.local.block.size", 32 * 1024 * 1024);
}
```
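To make the "only fetches from the config" point concrete, here is a minimal self-contained sketch. It is not Hadoop code: `BlockSizeSketch` and the `Map`-backed config are hypothetical stand-ins, and the parsing is a simplification of what `Configuration.getLong` does. It only shows that the lookup is an in-memory read with a 32MB fallback and touches no file status.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the config-backed lookup; not Hadoop's FileSystem.
public class BlockSizeSketch {
    // Same fallback as in the snippet above: 32MB.
    static final long DEFAULT_BLOCK_SIZE = 32L * 1024 * 1024;

    // Simplified analogue of a config getLong: parse the value if present, else use the default.
    static long getLong(Map<String, String> conf, String name, long defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }

    // Pure in-memory lookup: no file system or file status access happens here.
    static long defaultBlockSize(Map<String, String> conf) {
        return getLong(conf, "fs.local.block.size", DEFAULT_BLOCK_SIZE);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key is unset, so the 32MB fallback is returned.
        System.out.println(defaultBlockSize(conf));
        // Key is set, so the configured value wins.
        conf.put("fs.local.block.size", "134217728");
        System.out.println(defaultBlockSize(conf));
    }
}
```

If the real call behaves like this sketch, the per-file cost in the loop is a config read rather than a file status lookup.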