If you are running the query as in DRILL-1162, one of the join columns is defined as binary in the parquet file:
> inner join `lineitem2.parquet` f on a.l_comment = f.l_comment

On Fri, Sep 25, 2015 at 9:24 AM, Chris Westin <[email protected]> wrote:

> That's interesting, because it is the reallocation of a VarCharVector that
> is the one that is failing. I'll look at the file history, but do you
> recall offhand what changed there?
>
> On Fri, Sep 25, 2015 at 9:20 AM, Aman Sinha <[email protected]> wrote:
>
> > Right, the HashJoin and HashTable code hasn't changed significantly in
> > terms of memory allocation in the last several releases. You might want to
> > look at the change history for underlying vector allocations...I recall
> > that variable length vector allocations went through some changes. However
> > DRILL-1162 does not seem to be using varchar columns (I think..).
> >
> > On Fri, Sep 25, 2015 at 6:51 AM, Jacques Nadeau <[email protected]> wrote:
> >
> > > I don't think anyone has done much there in quite some time. I'd guess
> > > something external has changed that affects it. The last substantive change
> > > around that code (I think) was the introduction of the multiplexing work
> > > that Venki and Yuliya did early this year.
> > >
> > > On Sep 25, 2015 6:32 AM, "Chris Westin" <[email protected]> wrote:
> > >
> > > > I've been looking into DRILL-1162, and found that a query that used to run
> > > > within certain constraints (DRILL_MAX_DIRECT_MEMORY=32G) no longer does
> > > > even though it looks like there should be plenty of memory. I took the
> > > > query in that report, and removed the last ten (redundant) join elements,
> > > > and it now fails with 32G direct memory, even though it previously ran
> > > > (although it produced the wrong results). When I check the query profile,
> > > > it only consumed around ~9G -- so there should be plenty of space left
> > > > before it fails. I started looking at it in the debugger, and the
> > > > allocation failure occurs during an attempt to resize the output vector.
> > > > The allocator being used believes there's no memory left, even though its
> > > > parent has more than enough to satisfy the request.
> > > >
> > > > I've also found another ticket with a HashJoin that fails in a similar way
> > > > even though there is plenty of memory.
> > > >
> > > > Has the execution of HashJoin or its use of its result vector changed in
> > > > some way recently?
