[GitHub] drill pull request #795: DRILL-5089: Get only partial schemas of relevant st...

paul-rogers Sat, 25 Mar 2017 22:46:50 -0700

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/795#discussion_r108051857
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java ---
    @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
         }
       }
     
    +
    +  public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider,
    +                                            final String storage) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    +    final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build();
    +    final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false);
    +    Set<String> storageSet = Sets.newHashSet();
    +    storageSet.add(storage);
    +    addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
    +    schemaTreesToClose.add(rootSchema);
    +    return rootSchema;
    +  }
    +
    +  public SchemaPlus addPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider,
    +                                            Set<String> storages, SchemaPlus rootSchema) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    +    final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build();
    +    addNewStoragesToRootSchema(schemaConfig, rootSchema, storages);
    +    schemaTreesToClose.add(rootSchema);
    +    return rootSchema;
    +  }
    +
    +  private void expandSecondLevelSchema(SchemaPlus parent) {
    --- End diff --
    
    Maybe explain this a bit? Why are we expanding second-level schemas for
    *all* top-level schemas? Can't we do the expansion on the fly as we resolve?
    That is, if a query has a path "a.b.c.d", can't we just resolve a, then
    resolve b within a, and so on until we get to d? Otherwise, we are still
    open to a performance hit if, say, a is a directory of a million files or a
    database with 10K tables.
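
A minimal sketch of the on-the-fly resolution suggested above, for illustration only. `LazySchemaNode`, its `childLoader` callback, and the `loads()` counter are hypothetical names invented for this example; they are not Drill or Calcite APIs, and the real `SchemaPlus` lookup may behave differently. The sketch only shows the idea of resolving one path segment at a time, so that siblings that are never named in the query are never loaded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical schema node that loads a child only when a path asks for it. */
final class LazySchemaNode {
  private final String name;
  // Hypothetical loader: maps a child name to its node, or null if no such child.
  private final Function<String, LazySchemaNode> childLoader;
  // Children that have already been resolved, keyed by name.
  private final Map<String, LazySchemaNode> resolved = new HashMap<>();
  // Number of times the loader was invoked; only here to make the laziness visible.
  private int loads = 0;

  LazySchemaNode(String name, Function<String, LazySchemaNode> childLoader) {
    this.name = name;
    this.childLoader = childLoader;
  }

  String name() {
    return name;
  }

  int loads() {
    return loads;
  }

  /** Returns the named child, invoking the loader only on the first successful or failed miss. */
  LazySchemaNode child(String childName) {
    LazySchemaNode cached = resolved.get(childName);
    if (cached != null) {
      return cached;
    }
    loads++;
    LazySchemaNode loaded = childLoader.apply(childName);
    if (loaded != null) {
      resolved.put(childName, loaded);
    }
    return loaded;
  }

  /** Resolves a dotted path such as "a.b.c.d" one segment at a time; null if any segment is missing. */
  static LazySchemaNode resolve(LazySchemaNode root, String path) {
    LazySchemaNode current = root;
    for (String part : path.split("\\.")) {
      current = current.child(part);
      if (current == null) {
        return null;
      }
    }
    return current;
  }
}
```

With this shape, resolving "a.b.c.d" touches exactly one child per level, so the cost depends on the depth of the path rather than on how many files or tables sit under a.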



