[ 
https://issues.apache.org/jira/browse/DRILL-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563285#comment-17563285
 ] 

ASF GitHub Bot commented on DRILL-8182:
---------------------------------------

jnturton commented on code in PR #2583:
URL: https://github.com/apache/drill/pull/2583#discussion_r914952472


##########
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockTableDef.java:
##########
@@ -84,6 +85,19 @@ public String toString() {
     }
   }
 
+  /**
+   * An unfortunate hack that adds required DrillTableSelection behaviour to
+   * the entries list while keeping its serialised form a JSON array to remain
+   * compatible with existing, serialised logical plan JSON files as may be
+   * found in the Drill unit test code, for example.
+   */
+  public static class MockTableSelection extends ArrayList<MockScanEntry> implements DrillTableSelection {

Review Comment:
   @vvysotskyi this piece has turned out a bit ugly, although it is only in 
this mock plugin so not very important I suppose. Another possibility here was 
to use a wrapper class, rather than extending ArrayList, in order to implement 
DrillTableSelection, but then Jackson serialises the wrapper object and existing 
logical plan JSON test data is no longer readable. However, I don't think 
there's a lot of such data (perhaps only two files) if it would be preferable to 
modify it. Examples
   
   ```
   ➜  drill git:(8182-excel-data-mixing) ✗ grep -C3 mock **/*.json | grep -C3 selection
   common/src/test/resources/storage_engine_plan.json-      "op" : "scan",
   common/src/test/resources/storage_engine_plan.json-      "@id" : 1,
   common/src/test/resources/storage_engine_plan.json:      "storageengine" : "mock-engine",
   common/src/test/resources/storage_engine_plan.json-      "selection" : null,
   common/src/test/resources/storage_engine_plan.json-      "ref" : null
   common/src/test/resources/storage_engine_plan.json-    }, {
   --
   --
   --
   exec/java-exec/src/test/resources/functions/conv/conversionTestWithLogicalPlan.json-    "op" : "scan",
   exec/java-exec/src/test/resources/functions/conv/conversionTestWithLogicalPlan.json-    "@id" : 1,
   exec/java-exec/src/test/resources/functions/conv/conversionTestWithLogicalPlan.json:    "storageengine" : "mock",
   exec/java-exec/src/test/resources/functions/conv/conversionTestWithLogicalPlan.json-    "selection" : [ {
   exec/java-exec/src/test/resources/functions/conv/conversionTestWithLogicalPlan.json-      "records" : 10,
   exec/java-exec/src/test/resources/functions/conv/conversionTestWithLogicalPlan.json-      "types" : [ {
   --
   --
   exec/java-exec/src/test/resources/scan_screen_logical.json-    "op" : "scan",
   exec/java-exec/src/test/resources/scan_screen_logical.json-    "memo" : "initial_scan",
   exec/java-exec/src/test/resources/scan_screen_logical.json:    "storageengine" : "mock",
   exec/java-exec/src/test/resources/scan_screen_logical.json-    "selection" : [ {
   exec/java-exec/src/test/resources/scan_screen_logical.json-      "records" : 100,
   exec/java-exec/src/test/resources/scan_screen_logical.json-      "types" : [ {
   --
   ```
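
   For illustration only, the two shapes discussed in this comment could be 
sketched roughly as below. `Selection` is a hypothetical stand-in for 
DrillTableSelection (its real methods are not shown in this thread), and 
`String` entries stand in for MockScanEntry. The sketch uses no Jackson calls; 
it only shows that the list subclass is still a `List`, which is presumably why 
it keeps serialising as a JSON array, while the wrapper is not.

   ```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for DrillTableSelection (assumption, not Drill's API).
interface Selection {
  String digest();
}

// Shape 1: the list itself implements the interface, so it remains a List.
class ListSelection extends ArrayList<String> implements Selection {
  ListSelection(List<String> entries) {
    super(entries);
  }

  @Override
  public String digest() {
    return "entries=" + this;
  }
}

// Shape 2: a wrapper holds the list. A bean-style serialiser would likely
// write this as an object with an "entries" property, not as a bare array.
class WrapperSelection implements Selection {
  private final List<String> entries;

  WrapperSelection(List<String> entries) {
    this.entries = new ArrayList<>(entries);
  }

  public List<String> getEntries() {
    return entries;
  }

  @Override
  public String digest() {
    return "entries=" + entries;
  }
}

class SelectionDemo {
  public static void main(String[] args) {
    Selection asList = new ListSelection(Arrays.asList("a", "b"));
    Selection wrapped = new WrapperSelection(Arrays.asList("a", "b"));
    // Only the subclass is usable wherever a List is expected.
    System.out.println(asList instanceof List);
    System.out.println(wrapped instanceof List);
    // Both shapes can provide the same selection behaviour.
    System.out.println(asList.digest().equals(wrapped.digest()));
  }
}
   ```

   With the wrapper shape, the two test files listed above that contain a 
`"selection" : [ {` array would need to be rewritten to match the new 
serialised form.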





> File scan nodes not differentiated by format config
> ---------------------------------------------------
>
>                 Key: DRILL-8182
>                 URL: https://issues.apache.org/jira/browse/DRILL-8182
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>    Affects Versions: 1.20.0
>            Reporter: James Turton
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.20.2
>
>         Attachments: Products_Customers_Orders.xlsx
>
>
> Two file scans that differ only by format config overridden with table 
> functions may be genuinely different in terms of the data they return. The 
> format config options may affect the behaviour of the format parser (date 
> strings, delimiters, etc.), possibly directing the format plugin to entirely 
> different data within the file. Such scans should not be considered the same 
> by the query planner. This is illustrated by the following example based on 
> the Excel format plugin.
> When a query includes multiple SELECTs against a workbook by using TABLE 
> functions to access different sheets, and those sheets contain a column with 
> the same name, then values for that column come from a single sheet for both 
> SELECTs.  To reproduce, run the following query against the attachment and 
> note that the `Name` values returned from the Products sheet are `Name` 
> values from the Customers sheet.
>  
> {code:java}
> with
> prod as (
>     select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Products'))
> )
> , cust as (
>     select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Customers'))
> )
> select * from cust join prod on cust.Id = prod.Id; {code}
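
The failure mode described in the issue can be sketched in isolation as 
follows. This is a simplified illustration, not Drill's planner code: 
`FileScan`, `pathOnlyDigest` and `digest` are hypothetical names, and the 
assumption is only that the planner compares scans through some digest-like 
identity.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified scan descriptor (assumption, not a Drill class).
final class FileScan {
  final String path;
  final Map<String, String> formatOptions;

  FileScan(String path, Map<String, String> formatOptions) {
    this.path = path;
    // Sorted copy so the digest does not depend on insertion order.
    this.formatOptions = new TreeMap<>(formatOptions);
  }

  // Identity based on the file path alone: both sheets look like one scan.
  String pathOnlyDigest() {
    return path;
  }

  // Identity that also covers the format options: the sheets stay distinct.
  String digest() {
    return path + formatOptions;
  }
}

class ScanDigestDemo {
  public static void main(String[] args) {
    String file = "/Products_Customers_Orders.xlsx";
    FileScan prod = new FileScan(file, Collections.singletonMap("sheetName", "Products"));
    FileScan cust = new FileScan(file, Collections.singletonMap("sheetName", "Customers"));

    // Whether the two scans are indistinguishable under each identity.
    System.out.println(prod.pathOnlyDigest().equals(cust.pathOnlyDigest()));
    System.out.println(prod.digest().equals(cust.digest()));
  }
}
```

Under the path-only identity the two scans compare equal, which matches the 
symptom in the query above where both SELECTs return `Name` values from a 
single sheet.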



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
