Re: [I] the mergejoin memery leak (drill)

via GitHub Mon, 22 Jan 2024 21:06:53 -0800


shfshihuafeng commented on issue #2871:
URL: https://github.com/apache/drill/issues/2871#issuecomment-1905300921


   > @weijunlu, you may have found a bug. The behavior you described is not 
expected.
   > 
   > Just to verify I understand: you ran your query once. Did the query run 
successfully the first time? Or, did your first query fail? If the query 
_worked_ the first time, we have one situation. If the query _failed_ the first 
time, we have another situation.
   > 
   > Then, you ran the _same_ query a second time? This time you got an OOM 
error?
   > 
   > I think we may have up to three distinct issues.
   > 
   > First, I missed one important piece of information when I first read this 
report. You reported an exception: `Memory was leaked by query. Memory leaked: 
(249856)`. This message indicates an actual bug. As I explained, Drill won't 
release the memory back to the OS. But, the query should have released the 
memory back to Netty. The error message says that it did not. This would be a 
bug. Even if the query fails, Drill is required to return memory to Netty. Such 
errors are, unfortunately, hard to track down. Our unit tests have many such 
checks, but your query appears to have uncovered a case that the unit tests do 
not check.
   > 
   > The stack trace suggests that the memory leak was detected as Drill was 
trying to shut down a failed query. So, I suspect your query run may not have 
actually worked. If it did work, you should see in the UI a very detailed 
report on all the operators in the query. Did you get that detailed report?
   > 
   > Second, I wonder if Drill has sufficient memory for the query you are 
trying to run. The TPCH queries tend to be memory-hungry. This query has six 
concurrent sorts. Then, it has five merge joins, plus more sorts. While the 
query runs, all the needed data will be in memory. (It is in memory because 
Drill uses memory to buffer data to keep things fast.)
   > 
   > I do not recall the size of a SF1 TPCH data set. What is the total size of 
the files in your data set?
   > 
   > Back when we used to run TPCH tests regularly, we would use a cluster of 5 
or 10 Drillbits, each with a generous amount of memory, so that Drill can hold 
the data in memory. I suspect that 2 GB of direct memory, on a single Drillbit, 
is probably not enough for this kind of query.
   > 
   > So, I suspect you need more direct memory. The UI says that you are using 
only 34% of a 4GB heap, but ~100% of 2GB direct memory. Try changing this 
allocation. Try allocating 3GB to heap. This will raise heap usage to 45%. 
Then, give as much memory as your machine has to direct memory. If you are 
running SqlLine on the same machine, please run it on another machine instead. 
If you have other services on your machine (a database, an IDE, etc.), I would 
recommend moving Drill to a dedicated machine. You may even consider using a 
cluster of machines to provide sufficient resources.
   > 
   > Once you get a query that succeeds, you can look at the detailed query 
profile to find out how much memory it required. Then, you can reduce the 
direct memory given to Drill to a lower amount, if the query allows.
   > 
   > Once you find a memory level that allows the query to run, you will 
probably avoid the memory leak error. The query will succeed and memory should 
be freed back to Netty as normal. TPCH SF1 queries used to pass. They should 
still work, unless something changed in the last couple of years in one of the 
operators.
   > 
   > Just to be clear, Drill works well on a laptop with the standard amount of 
memory _if you run simple queries on small datasets_. But, if you do classic 
TPCH "big data" queries, you need a cluster with enough capacity — that's why 
we use Hadoop.
   > 
   > The third issue is a minor point: the UI screenshot is a bit odd. It says 
you are at 94% of direct memory. I suspect you are actually at 100%, and the UI 
is dividing by the wrong number (1000 vs. 1024 or some such).
   
   I think this is the leak point. When the query stops, its temporary memory should be cleaned up.
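
To illustrate the expectation that temporary memory is released even when a query fails, here is a minimal, self-contained sketch. It is not Drill code: `LeakCheckSketch`, `TrackingAllocator`, `Buffer`, and `runFailingQuery` are hypothetical names made up for this example, and the failure is simulated. The only detail taken from the report is the leaked byte count, 249856. The toy allocator counts outstanding bytes and, on close, complains in the same spirit as the reported `Memory was leaked by query` message.

```java
/** Toy model of allocator leak accounting; hypothetical, not Drill's allocator API. */
final class LeakCheckSketch {

    /** Tracks how many bytes have been handed out and not yet returned. */
    static final class TrackingAllocator implements AutoCloseable {
        private long outstanding = 0;

        /** A buffer that gives its bytes back to the allocator when closed. */
        final class Buffer implements AutoCloseable {
            private final long size;
            private boolean released = false;

            Buffer(long size) {
                this.size = size;
            }

            @Override
            public void close() {
                // Releasing twice must not double-count.
                if (!released) {
                    released = true;
                    outstanding -= size;
                }
            }
        }

        Buffer allocate(long size) {
            outstanding += size;
            return new Buffer(size);
        }

        long outstandingBytes() {
            return outstanding;
        }

        @Override
        public void close() {
            // Same idea as the check that produced the reported error.
            if (outstanding != 0) {
                throw new IllegalStateException(
                    "Memory was leaked by query. Memory leaked: (" + outstanding + ")");
            }
        }
    }

    /**
     * Simulates an operator that allocates temporary memory and then fails.
     * Returns the bytes still outstanding after the failure is handled.
     */
    static long runFailingQuery(boolean releaseOnFailure) {
        TrackingAllocator allocator = new TrackingAllocator();
        TrackingAllocator.Buffer temp = allocator.allocate(249_856);
        try {
            throw new RuntimeException("simulated failure while the query runs");
        } catch (RuntimeException e) {
            // The failure path is where cleanup is easiest to forget.
            if (releaseOnFailure) {
                temp.close();
            }
        }
        return allocator.outstandingBytes();
    }

    public static void main(String[] args) {
        System.out.println("leaked without cleanup: " + runFailingQuery(false));
        System.out.println("leaked with cleanup:    " + runFailingQuery(true));
    }
}
```

In this sketch the failure path that skips `temp.close()` leaves 249856 bytes outstanding, while the path that releases the buffer leaves 0. Whether the real leak in the merge join follows this pattern is an assumption that would need to be confirmed against the operator code.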


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
