[
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878899#comment-16878899
]
ASF GitHub Bot commented on DRILL-7306:
---------------------------------------
paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884
Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As
it turns out, I don't think that is the correct set of files used by the test.
If I manually count the matches for the "union03" query, I get three rows out
of a total of 1500 rows in the customer table. The expected results shown in
your earlier post show customer IDs beyond 1500, suggesting that the failed
query ran against a larger file than the one in the directory you suggested.
Unfortunately, I can't check the contents of the customer.parquet file
because I can't get Parquet tools to work after several hours of fighting one
thing after another. I seem to recall we discussed bundling that tool with
Drill. Would sure be handy.
Looking closer, it seems that the files in the test framework are for scale
factor (SF) 0.1, but the tests use files for SF 1. So I suspect I'm testing
against files 1/10 the size of those used in the tests that failed. I'm
guessing the test framework generates the SF 1 files during its setup phase
(which seems to require MFS to run).
Further, I'm completely mystified as to how my changes could impact Parquet,
since the only changed source files are for the "new" scan, which Parquet does
not use. Oddly, none of the text-file queries fail, even though that is the
area I *did* change.
So, net status is that I'm stuck: can't reproduce the issue, can't inspect
the data files, can't get access to the SF1 files, can't run the functional
tests.
Just to make sure I'm tracking down the correct issue: does the master
branch pass these same tests, using the same data files (that is, the same
cluster, without rebuilding the functional tests)?
Were the Parquet files used in the tests rebuilt recently? Might there be a
problem with the data itself?
I can't tell what the framework is doing. Does it run a CSV query
against the "golden" file to compare results? That said, the error seems to say
that the Parquet query returned zero rows, rather than that the Parquet results
didn't match the "golden" CSV expected results.
Any suggestions for how to proceed?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Disable "fast schema" batch for new scan framework
> --------------------------------------------------
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> The EVF framework is set up to return a "fast schema" empty batch with only
> schema as its first batch because, when the code was written, it seemed
> that's how we wanted operators to work. However, DRILL-7305 notes that many
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast
> schema" batch, this ticket asks to disable the feature in the new scan
> framework. The feature is disabled with a config option; it can be re-enabled
> if ever it is needed.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)