[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

ASF GitHub Bot (JIRA) Sat, 06 Jul 2019 19:24:25 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879786#comment-16879786
 ]


ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508966464
 
 
   @arina-ielchiieva, very sorry for the trouble that PR has caused. Thank you 
for the SF1 data. It allowed me to find the offending change and fix the issue.
   
   This is a case of "no good deed goes unpunished." I tried to work around the 
bad code explained in DRILL-7308 by setting precision only if non-zero. For 
reasons I did not track down, this change caused some part of the TPC-H queries 
to fail.
   
   The failure was in the "tuple/column metadata" classes. I had not realized 
that these classes are now used outside of just the new scan and schema work. 
(I did not track down the usage.)
   
   This exercise shows that Drill requires the following rules regarding 
precision:
   
   * Precision must be set for all types that need it (including VarChar and 
Decimal.)
   * Precision must be set for these types even if the value is zero.
   * Code that wants to know if the precision is non-zero should check the 
precision value itself. Code should not use an "is set" check as a substitute 
for "value != 0". (This is the flaw in the REST code described in DRILL-7308.)
   
   Filed DRILL-7318 to suggest we clean up and standardize our type-to-string 
implementations so that they build type strings based on the above rules.
   
   Reverted the precision-related change in {{PrimitiveColumnMetadata}}. The 
TPC-H union03 query now passes locally. I presume the others will also.
   
   The reversion caused a test to fail. It seems that the `EXPLAIN PLAN` for 
the new schema provisioning stuff uses the `toString()` method to get the type 
string shown in the plan for a provided schema. Not sure this is the best idea 
because `toString()` is for debugging and uses internal type names. It also 
shows the precision and scale for all types, resulting in the output `INT(0, 
0):OPTIONAL`. (The type names and cardinality notation should be fixed as part 
of DRILL-7318.)
   
   The reverted change had removed the unwanted precision and scale. By 
reverting the change, the unwanted precision and scale returned. So, for now, I 
modified `MaterializedField.toString()` to not emit the precision if the value 
is zero (unless the type is Decimal.) `EXPLAIN PLAN` now produces 
`INT:OPTIONAL` (without the precision and scale.)
   
   Left the changes as a new commit so you can review just the new changes.
   
   Please try this PR again against the functional tests. Let's hope this time 
things work this time. If so, I'll squash commits. We talked about creating a 
merge branch with the three open PRs. Since you need to run the functional 
tests again to verify the fix, do you want to then just commit this PR? I can 
create a merge branch for the other two. Otherwise, I'm happy to create a merge 
branch with all three.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Disable "fast schema" batch for new scan framework
> --------------------------------------------------
>
>                 Key: DRILL-7306
>                 URL: https://issues.apache.org/jira/browse/DRILL-7306
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.17.0
>
>
>  The EVF framework is set up to return a "fast schema" empty batch with only 
> schema as its first batch because, when the code was written, it seemed 
> that's how we wanted operators to work. However, DRILL-7305 notes that many 
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option; it can be re-enabled 
> if ever it is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

Reply via email to