paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884
 
 
   Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As
it turns out, I don't think that is the set of files the test actually uses.
If I manually count the matches for the "union03" query, I get three rows out
of a total of 1500 rows in the customer table. The expected results shown in
your earlier post include customer IDs beyond 1500, which suggests that the
failed query ran against a larger file than the one in the directory you
suggested.
   
   Unfortunately, I can't check the contents of the customer.parquet file:
after several hours of fighting one problem after another, I still can't get
Parquet tools to work. I seem to recall we discussed bundling that tool with
Drill. It would sure be handy.
   
   Also, I'm completely mystified as to how my changes could affect Parquet,
since the only source files I changed are for the "new" scan framework, which
Parquet does not use.
   
   So, the net status is that I'm stuck: I can't reproduce the issue, can't
inspect the data files, and can't run the functional tests.
   
   Just to make sure I'm tracking down the correct issue: does the master
branch pass these same tests using the same data files (that is, on the same
cluster, without rebuilding the functional tests)?
   
   Were the Parquet files used in the tests rebuilt recently? Might there be a
problem with the data itself?
   
   I can't tell what the framework is doing. Does it run a CSV query against
the "golden" file to compare results? That said, the error seems to say that
the Parquet query returned zero rows, not that the Parquet results failed to
match the expected results in the "golden" CSV file.
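   To be concrete about the two failure modes I mean, here is a rough Python
sketch. It is hypothetical: `diagnose` is not part of the functional test
framework, and I'm assuming the harness compares rows without regard to order
(but with duplicates counted), which I haven't confirmed.

```python
from collections import Counter


def diagnose(actual_rows, expected_rows):
    """Classify a result comparison (hypothetical helper, not the real harness).

    Rows must be hashable (e.g. tuples). They are compared as multisets:
    order is ignored, but duplicate rows are counted.
    """
    # Failure mode 1: the query produced nothing at all.
    if not actual_rows and expected_rows:
        return "query returned zero rows"
    actual, expected = Counter(actual_rows), Counter(expected_rows)
    if actual == expected:
        return "match"
    # Failure mode 2: rows came back, but not the expected ones.
    # Counter subtraction keeps only positive counts.
    missing = expected - actual   # expected but not returned
    extra = actual - expected     # returned but not expected
    return (f"mismatch: {sum(missing.values())} missing, "
            f"{sum(extra.values())} unexpected")
```

If the error really corresponds to the first branch, the query itself found no
data, which points at the files (or the scan) rather than at the comparison.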
   
   Any suggestions for how to proceed?
