[
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811635#comment-16811635
]
ASF GitHub Bot commented on DRILL-7063:
---------------------------------------
amansinha100 commented on pull request #1723: DRILL-7063: Seperate metadata
cache file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r272801314
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
##########
@@ -149,20 +157,25 @@ public static ParquetTableMetadata_v3 getParquetTableMetadata(Map<FileStatus, Fi
* Get the parquet metadata for the table by reading the metadata file
*
* @param fs current file system
- * @param path The path to the metadata file, located in the directory that contains the parquet files
+ * @param paths The path to the metadata file, located in the directory that contains the parquet files
* @param metaContext metadata context
* @param readerConfig parquet reader configuration
* @return parquet table metadata. Null if metadata cache is missing, unsupported or corrupted
*/
public static @Nullable ParquetTableMetadataBase readBlockMeta(FileSystem fs,
- Path path,
+ List<Path> paths,
MetadataContext metaContext,
ParquetReaderConfig readerConfig) {
- if (ignoreReadingMetadata(metaContext, path)) {
- return null;
- }
Metadata metadata = new Metadata(readerConfig);
- metadata.readBlockMeta(path, false, metaContext, fs);
+ if (paths.isEmpty()) {
+ metaContext.setMetadataCacheCorrupted(true);
+ }
+ for (Path path: paths) {
+ if (ignoreReadingMetadata(metaContext, path)) {
+ return null;
Review comment:
I see that the `ignoreReadingMetadata()` method does not actually examine the
`path`; it only checks whether the metadata cache is corrupted. Logically, I
don't see why one would ignore the Summary file but not the File metadata (or
vice versa). There is also a performance concern: suppose (hypothetically)
paths[0] is not ignored, so we execute the next statement, `readBlockMeta()`,
and read the entire File metadata, only to find that paths[1] is ignored. That
read would be wasted effort. What do you think?
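One way to address this would be to run the check once, before the loop, so that no file is read if the cache is going to be ignored anyway. The following is only a minimal, self-contained sketch of that idea and not the actual Drill code: `Context` is a hypothetical stand-in for `MetadataContext`, paths are plain strings, the file names in `main` are placeholders, and returning null for an empty path list is an assumption based on the Javadoc above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetadataReadSketch {

  /** Hypothetical stand-in for MetadataContext; only the corruption flag is modelled. */
  static class Context {
    private boolean cacheCorrupted;

    boolean isMetadataCacheCorrupted() {
      return cacheCorrupted;
    }

    void setMetadataCacheCorrupted(boolean corrupted) {
      this.cacheCorrupted = corrupted;
    }
  }

  /** Records every path that was "read", so wasted reads are observable. */
  static final List<String> readLog = new ArrayList<>();

  /** Path-independent check, as described in the review comment. */
  static boolean ignoreReadingMetadata(Context ctx) {
    return ctx.isMetadataCacheCorrupted();
  }

  /** Returns the paths that were read, or null if the cache is to be ignored. */
  static List<String> readBlockMeta(List<String> paths, Context ctx) {
    if (paths.isEmpty()) {
      ctx.setMetadataCacheCorrupted(true);
    }
    // Decide once, before any file is touched, so no read is wasted.
    if (ignoreReadingMetadata(ctx)) {
      return null;
    }
    List<String> read = new ArrayList<>();
    for (String path : paths) {
      readLog.add(path); // placeholder for the real per-file read
      read.add(path);
    }
    return read;
  }

  public static void main(String[] args) {
    Context healthy = new Context();
    System.out.println(readBlockMeta(Arrays.asList("summary", "files"), healthy));

    Context corrupted = new Context();
    corrupted.setMetadataCacheCorrupted(true);
    System.out.println(readBlockMeta(Arrays.asList("summary", "files"), corrupted));
  }
}
```

With the check hoisted, a corrupted cache returns before either file is read, and the Summary and File metadata are always treated the same way.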
----------------------------------------------------------------
> Create separate summary file for schema, totalRowCount, totalNullCount
> (includes maintenance)
> ---------------------------------------------------------------------------------------------
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Metadata
> Reporter: Venkata Jyothsna Donapati
> Assignee: Venkata Jyothsna Donapati
> Priority: Major
> Fix For: 1.16.0
>
> Original Estimate: 252h
> Remaining Estimate: 252h
>