vvysotskyi commented on a change in pull request #2026: DRILL-7330: Implement metadata usage for all format plugins
URL: https://github.com/apache/drill/pull/2026#discussion_r392681623
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##########
 @@ -634,6 +642,62 @@ public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
     return nonInterestingColumnsMetadata;
   }
 
+  /**
+   * Returns {@link TableMetadataProviderBuilder} instance based on specified
+   * {@link MetadataProviderManager} source.
+   *
+   * @param source metadata provider manager
+   * @return {@link TableMetadataProviderBuilder} instance
+   */
+  protected abstract TableMetadataProviderBuilder tableMetadataProviderBuilder(MetadataProviderManager source);
+
+  /**
+   * Returns {@link TableMetadataProviderBuilder} instance which may provide metadata
+   * without using Drill Metastore.
+   *
+   * @param source metadata provider manager
+   * @return {@link TableMetadataProviderBuilder} instance
+   */
+  protected abstract TableMetadataProviderBuilder defaultTableMetadataProviderBuilder(MetadataProviderManager source);
+
+  /**
+   * Compares the last modified time of files obtained from specified selection with
+   * the Metastore last modified time to determine whether Metastore metadata
+   * is not outdated. If metadata is outdated, {@link MetadataException} will be thrown.
+   *
+   * @param selection the source of files to check
+   * @throws MetadataException if metadata is outdated
+   */
+  protected void checkMetadataConsistency(FileSelection selection, Configuration fsConf) throws IOException {
+    if (metadataProvider.checkMetadataVersion()) {
+      DrillFileSystem fileSystem =
+          ImpersonationUtil.createFileSystem(ImpersonationUtil.resolveUserName(getUserName()), fsConf);
+
+      List<FileStatus> fileStatuses = FileMetadataInfoCollector.getFileStatuses(selection, fileSystem);
+
+      long lastModifiedTime = metadataProvider.getTableMetadata().getLastModifiedTime();
+
+      Set<Path> removedFiles = new HashSet<>(metadataProvider.getFilesMetadataMap().keySet());
+      Set<Path> newFiles = new HashSet<>();
+
+      boolean isChanged = false;
+
+      for (FileStatus fileStatus : fileStatuses) {
+        if (!removedFiles.remove(Path.getPathWithoutSchemeAndAuthority(fileStatus.getPath()))) {
+          newFiles.add(fileStatus.getPath());
+        }
+        if (fileStatus.getModificationTime() > lastModifiedTime) {
+          isChanged = true;
+          break;
+        }
+      }
 
 Review comment:
   I agree that this check may be costly, but if we skip it, we can return
incorrect results. Integrating with an external system that could do this
check for us is a good idea, but I don't know of any such system. Currently,
we either use up-to-date metadata for queries or do not use metadata at all.
   
   Regarding auto-refresh, there is a separate Jira,
https://issues.apache.org/jira/browse/DRILL-7430, which is a placeholder for
further Metastore improvements, so we can discuss it there.
   
   Regarding the case where data arrives continuously, I don't think it is a
good fit for the Metastore, since refreshing the metadata, even with our
incremental update, is too costly to do that often.
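
   For reference, here is a minimal, self-contained sketch of the kind of staleness check being discussed. It is not Drill's implementation: the class and method names are hypothetical, plain strings and a map stand in for Hadoop `Path` and `FileStatus`, and since the quoted diff is cut off after the loop, treating any new or removed file as "outdated" is an assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the check in the quoted diff; not Drill's API.
class MetadataFreshnessSketch {

  /**
   * Returns true if the stored metadata no longer matches the files on disk.
   *
   * @param storedFiles        paths recorded in the metadata store
   * @param storedLastModified last modified time recorded for the table
   * @param currentFiles       current path -> modification time (e.g. from a listing)
   */
  static boolean isOutdated(Set<String> storedFiles,
                            long storedLastModified,
                            Map<String, Long> currentFiles) {
    // Assume every stored file was removed, then cross off the ones still present.
    Set<String> removed = new HashSet<>(storedFiles);
    for (Map.Entry<String, Long> file : currentFiles.entrySet()) {
      if (!removed.remove(file.getKey())) {
        return true; // file exists on disk but is unknown to the metadata
      }
      if (file.getValue() > storedLastModified) {
        return true; // file was modified after the metadata was written
      }
    }
    // Anything left was recorded in the metadata but is gone from disk.
    return !removed.isEmpty();
  }

  public static void main(String[] args) {
    Set<String> stored = Set.of("/t/a.parquet", "/t/b.parquet");
    Map<String, Long> unchanged = Map.of("/t/a.parquet", 100L, "/t/b.parquet", 120L);
    Map<String, Long> withNewFile =
        Map.of("/t/a.parquet", 100L, "/t/b.parquet", 120L, "/t/c.parquet", 110L);
    System.out.println(isOutdated(stored, 150L, unchanged));   // false
    System.out.println(isOutdated(stored, 150L, withNewFile)); // true
  }
}
```

   The cost concern is visible even in this sketch: every query has to list all selected files before it can trust the stored metadata, so the check scales with the number of files rather than with the amount of data actually read.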

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
