[
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811982#comment-16811982
]
ASF GitHub Bot commented on DRILL-7063:
---------------------------------------
dvjyothsna commented on pull request #1723: DRILL-7063: Seperate metadata cache
file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r272854637
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
##########
@@ -149,20 +157,25 @@ public static ParquetTableMetadata_v3 getParquetTableMetadata(Map<FileStatus, Fi
* Get the parquet metadata for the table by reading the metadata file
*
* @param fs current file system
- * @param path The path to the metadata file, located in the directory that contains the parquet files
+ * @param paths The path to the metadata file, located in the directory that contains the parquet files
* @param metaContext metadata context
* @param readerConfig parquet reader configuration
* @return parquet table metadata. Null if metadata cache is missing, unsupported or corrupted
*/
public static @Nullable ParquetTableMetadataBase readBlockMeta(FileSystem fs,
- Path path,
+ List<Path> paths,
MetadataContext metaContext,
ParquetReaderConfig readerConfig) {
- if (ignoreReadingMetadata(metaContext, path)) {
- return null;
- }
Metadata metadata = new Metadata(readerConfig);
- metadata.readBlockMeta(path, false, metaContext, fs);
+ if (paths.isEmpty()) {
+ metaContext.setMetadataCacheCorrupted(true);
+ }
+ for (Path path: paths) {
+ if (ignoreReadingMetadata(metaContext, path)) {
+ return null;
Review comment:
Let's take the scenario where the summary file is corrupted but the file
metadata file is intact. If we pull ignoreReadingMetadata() out of the loop,
the check runs only once, before the summary file is read. Reading the
corrupted summary file then sets the metadata cache corrupted flag to true,
but the file metadata file will still be read unless we check that flag again
before reading it. Regarding performance: if paths[1] is corrupt, we only find
that out inside readBlockMeta(), so pulling out ignoreReadingMetadata() doesn't
help in that case either. Also, paths[0] always holds the summary, and since
the summary is not very big, it shouldn't affect performance much.
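
The scenario above can be illustrated with a minimal, self-contained sketch.
This is not Drill's code: Context, readFile, checkOnce and checkPerPath are
hypothetical stand-ins, and a path containing "corrupt" is assumed to simulate
a corrupted file that sets the corrupted flag when it is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetadataReadSketch {

  /** Hypothetical stand-in for the metadata context; tracks only the corrupted flag. */
  static class Context {
    boolean cacheCorrupted;
  }

  /** Hypothetical reader: records the read and flags corruption for "corrupt" paths. */
  static void readFile(String path, Context ctx, List<String> readLog) {
    readLog.add(path);
    if (path.contains("corrupt")) {
      ctx.cacheCorrupted = true;
    }
  }

  /** Flag checked once, before the loop: files after a corrupted summary are still read. */
  static List<String> checkOnce(List<String> paths) {
    Context ctx = new Context();
    List<String> readLog = new ArrayList<>();
    if (ctx.cacheCorrupted) {
      return readLog;
    }
    for (String path : paths) {
      readFile(path, ctx, readLog);
    }
    return readLog;
  }

  /** Flag checked before every read: reading stops once the summary is found corrupted. */
  static List<String> checkPerPath(List<String> paths) {
    Context ctx = new Context();
    List<String> readLog = new ArrayList<>();
    for (String path : paths) {
      if (ctx.cacheCorrupted) {
        break;
      }
      readFile(path, ctx, readLog);
    }
    return readLog;
  }

  public static void main(String[] args) {
    // Assumed ordering from the discussion: index 0 is the summary, index 1 the file metadata.
    List<String> paths = Arrays.asList("summary.corrupt", "file_metadata");
    System.out.println(checkOnce(paths));    // prints [summary.corrupt, file_metadata]
    System.out.println(checkPerPath(paths)); // prints [summary.corrupt]
  }
}
```

With the single check, the intact file metadata file is read even though the
summary was already found to be corrupted; with the per-path check it is
skipped.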
----------------------------------------------------------------
> Create separate summary file for schema, totalRowCount, totalNullCount
> (includes maintenance)
> ---------------------------------------------------------------------------------------------
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Metadata
> Reporter: Venkata Jyothsna Donapati
> Assignee: Venkata Jyothsna Donapati
> Priority: Major
> Fix For: 1.16.0
>
> Original Estimate: 252h
> Remaining Estimate: 252h
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)