Are you running the tests on 32-core machines? A different number of cores affects how much memory is available for the sort.
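The core-count effect mentioned above can be sketched roughly as follows: the query-level budget is divided across parallel sort instances, and the number of instances scales with cores. This is an illustrative sketch only; the width factor, the even division, and all names are assumptions, not Drill's exact planner arithmetic.

```java
// Rough sketch: the per-sort memory budget shrinks as core count grows,
// because the query-level budget is divided across more parallel sort
// instances. widthFactor and the even division are assumptions for
// illustration, not Drill's actual planner logic.
public class SortMemorySketch {

    public static long perSortBudget(long maxQueryMemoryPerNode, int cores,
                                     double widthFactor, int sortsInPlan) {
        // parallel width scales with cores (e.g. a fraction of them)
        int width = Math.max(1, (int) (cores * widthFactor));
        return maxQueryMemoryPerNode / ((long) width * sortsInPlan);
    }

    public static void main(String[] args) {
        long twoGb = 2L * 1024 * 1024 * 1024;
        long on8cores = perSortBudget(twoGb, 8, 0.7, 1);
        long on32cores = perSortBudget(twoGb, 32, 0.7, 1);
        // the 32-core run leaves each sort a much smaller slice
        System.out.println(on8cores + " vs " + on32cores);
    }
}
```

Under these assumptions, the same query with the same `planner.memory.max_query_memory_per_node` gets a far smaller per-sort slice on a 32-core box than on an 8-core one, which would explain passing in one environment and OOMing in the other.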
On Wed, Dec 30, 2015 at 1:02 PM, Abdel Hakim Deneche <[email protected]> wrote:

> The following tests are failing:
>
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q163_DRILL-2046.q
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q177_DRILL-2046.q
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q174.q
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/
>> /Functional/window_functions/multiple_partitions/q35.sql
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q160_DRILL-1985.q
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q162_DRILL-1985.q
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q165.q
>> /Functional/window_functions/multiple_partitions/q37.sql
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q171.q
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q168_DRILL-2046.q
>> /Functional/window_functions/multiple_partitions/q36.sql
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/q159_DRILL-2046.q
>> /Functional/window_functions/multiple_partitions/q30.sql
>> /Functional/data-shapes/wide-columns/5000/1000rows/parquet/large/q157_DRILL-1985.q
>> /Functional/window_functions/multiple_partitions/q22.sql
>
> With one of the following errors:
>
>> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory
>> while executing the query.
>> Caused by: org.apache.drill.exec.exception.OutOfMemoryException:
>> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate
>> sv2, and not enough batchGroups to spill
>>   at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:356)
>>   ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>
> or
>
>> java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Failed to
>> pre-allocate memory for SV. Existing recordCount*4 = 0, incoming batch
>> recordCount*4 = 3340
>> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException:
>> Failed to pre-allocate memory for SV. Existing recordCount*4 = 0, incoming
>> batch recordCount*4 = 3340
>>   at org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder.add(SortRecordBatchBuilder.java:116)
>>   ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>>   at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:451)
>>   ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>
> On Wed, Dec 30, 2015 at 12:42 PM, Jacques Nadeau <[email protected]> wrote:
>
>> I'll let Steven answer your question directly.
>>
>> FYI, we are running a regression suite that was forked from the MapR repo
>> a month or so ago, because we had to fix a bunch of things to make it work
>> with Apache Hadoop. (There was a thread about this back then and we haven't
>> yet figured out how to merge both suites.) It is possible that he had a
>> successful run but the failures are happening on items that you've recently
>> added to your suite.
>>
>> It is also possible (likely?) that the configuration settings for our
>> regression clusters are not the same.
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Wed, Dec 30, 2015 at 12:37 PM, Abdel Hakim Deneche <[email protected]> wrote:
>>
>>> Steven,
>>>
>>> Were you able to successfully run the regression tests on the transfer
>>> patch? I just tried and saw several queries running out of memory!
>>>
>>> On Wed, Dec 30, 2015 at 11:46 AM, Abdel Hakim Deneche <[email protected]> wrote:
>>>
>>>> Created DRILL-4236 <https://issues.apache.org/jira/browse/DRILL-4236> to
>>>> keep track of this improvement.
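The numbers in the second error decode directly: the sort's batch builder reserves one 4-byte selection-vector (SV4) entry per incoming record, so `recordCount*4 = 3340` corresponds to a batch of 835 records whose index space could not be reserved. The class and names below are illustrative, not the Drill source.

```java
// The "Failed to pre-allocate memory for SV" figures are simple arithmetic:
// the sort's record-batch builder reserves 4 bytes of selection-vector
// (SV4) space per record. Names here are illustrative, not Drill's code.
public class SvReservationSketch {

    static final int SV4_ENTRY_BYTES = 4; // one 4-byte index per record

    public static long bytesNeeded(int recordCount) {
        return (long) recordCount * SV4_ENTRY_BYTES;
    }

    public static void main(String[] args) {
        // matches "incoming batch recordCount*4 = 3340" in the stack trace
        System.out.println(bytesNeeded(835)); // prints 3340
    }
}
```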
>>>> On Wed, Dec 30, 2015 at 11:01 AM, Jacques Nadeau <[email protected]> wrote:
>>>>
>>>>> Since the accounting changed (it is more accurate now), the termination
>>>>> condition for the sort operator will be different than before. In fact,
>>>>> it will likely be reached sooner, since our accounting is much larger
>>>>> than previously (we now correctly consider the entire allocation rather
>>>>> than simply the used allocation).
>>>>>
>>>>> Hakim,
>>>>> Steven and I were discussing the need to update the ExternalSort
>>>>> operator to use the new allocator functionality to better manage its
>>>>> memory envelope. Would you be interested in working on this, since you
>>>>> seem to be working with that code the most? Basically, it used to be
>>>>> that there was no way the sort operator could correctly detect a memory
>>>>> condition, so it jumped through a bunch of hoops to try to figure out
>>>>> the termination condition. With the transfer accounting in place, this
>>>>> code can be greatly simplified to just use the current operator memory
>>>>> allocation.
>>>>>
>>>>> On Wed, Dec 30, 2015 at 10:48 AM, rahul challapalli <[email protected]> wrote:
>>>>>
>>>>>> I installed the latest master and ran this query, so
>>>>>> planner.memory.max_query_memory_per_node should have been the default
>>>>>> value. I switched back to the 1.4.0 branch and this query completed
>>>>>> successfully.
>>>>>>
>>>>>> On Wed, Dec 30, 2015 at 10:37 AM, Abdel Hakim Deneche <[email protected]> wrote:
>>>>>>
>>>>>>> Rahul,
>>>>>>>
>>>>>>> How much memory was assigned to the sort operator
>>>>>>> (planner.memory.max_query_memory_per_node)?
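The simplification described in that email, dropping the old heuristics in favor of the allocator's accurate transfer accounting, could reduce the spill decision to a direct comparison. This is a hypothetical sketch of the idea, not the actual ExternalSortBatch change.

```java
// Hypothetical sketch of the simplified termination condition: with
// transfer accounting, the sort can trust the allocator's figure for
// everything it currently holds and spill when the next batch would
// push it past the operator's limit. Not the actual Drill implementation.
public class SpillDecisionSketch {

    public static boolean shouldSpill(long allocatedBytes,
                                      long incomingBatchBytes,
                                      long operatorLimitBytes) {
        // spill before accepting a batch that would exceed the limit
        return allocatedBytes + incomingBatchBytes > operatorLimitBytes;
    }

    public static void main(String[] args) {
        System.out.println(shouldSpill(900, 200, 1000)); // true: must spill first
        System.out.println(shouldSpill(100, 200, 1000)); // false: batch fits
    }
}
```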
>>>>>>> On Wed, Dec 30, 2015 at 9:54 AM, rahul challapalli <[email protected]> wrote:
>>>>>>>
>>>>>>>> I am seeing an OOM error while executing a simple CTAS query. I
>>>>>>>> raised DRILL-4324 for this. The query mentioned in the JIRA used to
>>>>>>>> complete successfully without any issue prior to 1.5. Any idea what
>>>>>>>> could have caused the regression?
>>>>>>>>
>>>>>>>> - Rahul

--

Abdelhakim Deneche

Software Engineer

<http://www.mapr.com/>

Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
