[
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879786#comment-16879786
]
ASF GitHub Bot commented on DRILL-7306:
---------------------------------------
paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508966464
@arina-ielchiieva, very sorry for the trouble that PR has caused. Thank you
for the SF1 data. It allowed me to find the offending change and fix the issue.
This is a case of "no good deed goes unpunished." I tried to work around the
bad code explained in DRILL-7308 by setting precision only if non-zero. For
reasons I did not track down, this change caused some part of the TPC-H queries
to fail.
The failure was in the "tuple/column metadata" classes. I had not realized
that these classes are now used outside of just the new scan and schema work.
(I did not track down the usage.)
This exercise shows that Drill requires the following rules regarding
precision:
* Precision must be set for all types that need it (including VarChar and
Decimal.)
* Precision must be set for these types even if the value is zero.
* Code that wants to know if the precision is non-zero should check the
precision value itself. Code should not use an "is set" check as a substitute
for "value != 0". (This is the flaw in the REST code described in DRILL-7308.)
Filed DRILL-7318 to suggest we clean up and standardize our type-to-string
implementations so that they build type strings based on the above rules.
Reverted the precision-related change in {{PrimitiveColumnMetadata}}. The
TPC-H union03 query now passes locally. I presume the others will also.
The reversion caused a test to fail. It seems that the `EXPLAIN PLAN` for
the new schema provisioning stuff uses the `toString()` method to get the type
string shown in the plan for a provided schema. Not sure this is the best idea
because `toString()` is for debugging and uses internal type names. It also
shows the precision and scale for all types, resulting in the output `INT(0,
0):OPTIONAL`. (The type names and cardinality notation should be fixed as part
of DRILL-7318.)
The reverted change had removed the unwanted precision and scale. By
reverting the change, the unwanted precision and scale returned. So, for now, I
modified `MaterializedField.toString()` to not emit the precision if the value
is zero (unless the type is Decimal.) `EXPLAIN PLAN` now produces
`INT:OPTIONAL` (without the precision and scale.)
Left the changes as a new commit so you can review just the new changes.
Please try this PR again against the functional tests. Let's hope this time
things work this time. If so, I'll squash commits. We talked about creating a
merge branch with the three open PRs. Since you need to run the functional
tests again to verify the fix, do you want to then just commit this PR? I can
create a merge branch for the other two. Otherwise, I'm happy to create a merge
branch with all three.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Disable "fast schema" batch for new scan framework
> --------------------------------------------------
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> The EVF framework is set up to return a "fast schema" empty batch with only
> schema as its first batch because, when the code was written, it seemed
> that's how we wanted operators to work. However, DRILL-7305 notes that many
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast
> schema" batch, this ticket asks to disable the feature in the new scan
> framework. The feature is disabled with a config option; it can be re-enabled
> if ever it is needed.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)