paul-rogers edited a comment on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884

Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As it turns out, I don't think that is the correct set of files used by the test. If I manually count the matches for the "union03" query, I get three rows out of a total of 1500 rows in the customer table. The expected results shown in your earlier post include customer IDs beyond 1500, suggesting that the failed query ran against a larger file than the one in the directory you suggested.

Unfortunately, I can't check the contents of the customer.parquet file because I can't get Parquet tools to work, even after several hours of fighting one thing after another. I seem to recall we discussed bundling that tool with Drill. It would sure be handy.

Looking closer, it seems that the files in the test framework are for scale factor (SF) 0.1, but the tests use files for SF1. So, I suspect I'm testing against files 1/10 the size of those used in the tests that failed. I'm guessing the test framework generates the SF1 files during its setup phase (which seems to require MFS to run).

Further, I'm completely mystified at how my changes could impact Parquet, since the only changed source files are for the "new" scan, which Parquet does not use. Oddly, none of the text file queries fail, even though that is the area I *did* change.

So, net status is that I'm stuck: I can't reproduce the issue, can't inspect the data files, can't get access to the SF1 files, and can't run the functional tests.

Just to make sure I'm tracking down the correct issue:

Does the master branch pass these same tests, using the same data files (that is, on the same cluster, without rebuilding the functional tests)?

Were the Parquet files used in the tests rebuilt recently? Might there be a problem with the data itself?

I can't tell what the framework is doing. Does it try to do a CSV query against the "golden" file to compare results? The error, though, seems to say that the Parquet query returned zero rows, rather than that the Parquet results didn't match the "golden" CSV expected results.

Any suggestions for how to proceed?
