[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

ASF GitHub Bot (JIRA) Thu, 04 Jul 2019 17:34:44 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878917#comment-16878917
 ]


ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884
 
 
   Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As 
it turns out, these failures are quite a mystery.
   
   First, I don't think the files you mentioned are those used by the tests 
that failed. The set stored on GitHub is for scale factor (SF) 0.1 which has 
1500 customers in the customer table with ids from 0 to 1499. The tests seem to 
use SF1 which, perhaps, is generated by the test framework during its setup. If 
we look at the union03 query, the expected results include customer IDs in the 
six-digit range.
   
   That said, I did recreate the union03 query locally, using the SF0.1 files 
and got 3 result rows. To verify, I wrote a test that scanned the entire table 
(just a `SELECT * FROM ...`), and "manually" applied the where clause. Three 
rows matched. So it looks like, at least locally, that particular query works 
OK against the SF0.1 data set.
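
   A minimal sketch of that kind of cross-check, written in Python for brevity 
rather than as the actual Drill test. The column name `c_custkey`, the 
predicate, and the `query_rows` stand-in are all assumptions for illustration; 
the real union03 condition is not reproduced here.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]


def manual_filter(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Apply the WHERE condition by hand to rows from a full-table scan."""
    return [row for row in rows if predicate(row)]


def same_keys(query_rows: Iterable[Row], expected_rows: Iterable[Row], key: str) -> bool:
    """Compare two result sets by key, ignoring row order."""
    return sorted(r[key] for r in query_rows) == sorted(r[key] for r in expected_rows)


# Stand-in for `SELECT * FROM ...` on the SF0.1 customer table: ids 0 to 1499.
all_rows = [{"c_custkey": i} for i in range(1500)]

# Hypothetical predicate (assumption); substitute the query's real condition.
wanted = {10, 20, 30}
expected = manual_filter(all_rows, lambda row: row["c_custkey"] in wanted)

# Stand-in for the rows returned by the filtered query under test.
query_rows = [{"c_custkey": 30}, {"c_custkey": 10}, {"c_custkey": 20}]

print(len(expected), same_keys(query_rows, expected, "c_custkey"))
```

   Comparing sorted keys rather than raw lists keeps the check independent of 
the order in which the engine returns rows.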
   
   Unfortunately, I can't check the contents of the `customer.parquet` file 
because I can't get Parquet tools to work after several hours of fighting one 
thing after another. I seem to recall we discussed bundling that tool with 
Drill. Doing so would be very handy. Building by hand requires far more steps 
than are documented on the Parquet and Hortonworks web sites: 1) install gcc, 2) 
download and compile thrift, 3) build Parquet-tools, 4) figure out the set of 
dependent jars that must be on the class path, 5)... not sure, here is where I 
gave up in frustration...
   
   Taking a step back, I'm actually completely mystified at how my changes 
could impact Parquet (only). The only source files this PR changed are for the 
"new" scan, which Parquet does not use. Oddly, none of the text file queries 
fail, even though that is the one area I *did* change.
   
   Were the parquet files used in the tests rebuilt recently? Might there be a 
problem with the data itself?
   
   Just to make sure I'm tracking down the correct issue: does the master 
branch pass these same tests, using the same data files (that is, using the 
same cluster without rebuilding the functional tests)? Perhaps try testing the 
log regex or mock PRs. They are rebased on the same master version as this PR, 
but they include a distinct set of changes. If those PRs pass, then the 
problem is somewhere in this PR. If those PRs have failures, then perhaps we 
want to double-check the test framework data.
   
   While that is done, I will continue to try to find a way to track down the 
issue (without access to the test framework or the SF1 data...)
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Disable "fast schema" batch for new scan framework
> --------------------------------------------------
>
>                 Key: DRILL-7306
>                 URL: https://issues.apache.org/jira/browse/DRILL-7306
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.17.0
>
>
>  The EVF framework is set up to return a "fast schema" empty batch with only 
> schema as its first batch because, when the code was written, it seemed 
> that's how we wanted operators to work. However, DRILL-7305 notes that many 
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option; it can be re-enabled 
> if ever it is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
