[ https://issues.apache.org/jira/browse/DRILL-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106216#comment-15106216 ]

jean-claude commented on DRILL-4278:
------------------------------------

Thanks Jacques, however I'm afraid this does not eliminate the issue.

I have done the pull

$ git pull https://github.com/jacques-n/drill DRILL-4278

Then

mvn clean install -DskipTests

I then re-ran my tests using the tarball built in distribution/target/

The problem still remains.

Note I don't see a memory leak if I remove the LIMIT from my query or if I make the LIMIT larger than the entire data set.

I'm sure the fix you made is valid; however, I don't think it is related to the LIMIT clause, correct?

I've tried many variants: different file formats, different numbers of files, in HDFS and not. The only thing that seems to affect this leak is the use of the LIMIT clause.

Another observation that might be of interest: the leak is more pronounced when the records (rows) are of a substantial size. See my JSON example above, which has a rather large string. If the rows are rather small, then it leaks very slowly.


> Memory leak when using LIMIT
> ----------------------------
>
>                 Key: DRILL-4278
>                 URL: https://issues.apache.org/jira/browse/DRILL-4278
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: OS X
> 0: jdbc:drill:zk=local> select * from sys.version;
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | version  | commit_id                                 | commit_message                                      | commit_time                | build_email                | build_time                 |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | 1.4.0    | 32b871b24c7b69f59a1d2e70f444eed6e599e825  | [maven-release-plugin] prepare release drill-1.4.0  | 08.12.2015 @ 00:24:59 PST  | venki.koruka...@gmail.com  | 08.12.2015 @ 01:14:39 PST  |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> 0: jdbc:drill:zk=local> select * from sys.options where status <> 'DEFAULT';
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | name                        | kind  | type    | status   | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | planner.slice_target        | LONG  | SYSTEM  | CHANGED  | 10       | null        | null      | null       |
> | planner.width.max_per_node  | LONG  | SYSTEM  | CHANGED  | 5        | null        | null      | null       |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> 2 rows selected (0.16 seconds)
>            Reporter: jean-claude
>
> copy the parquet files in the samples directory so that you have 12 or so
> $ ls -lha /apache-drill-1.4.0/sample-data/nationsMF/
> nationsMF1.parquet
> nationsMF2.parquet
> nationsMF3.parquet
> create a file with a few thousand lines like these
> select * from dfs.`/Users/jccote/apache-drill-1.4.0/sample-data/nationsMF` limit 500;
> start drill
> $ /apache-drill-1.4.0/bin/drill-embeded
> reduce the slice target size to force drill to use multiple fragments/threads
> jdbc:drill:zk=local> system set planner.slice_target=10;
> now run the list of queries from the file you created above
> jdbc:drill:zk=local> !run /Users/jccote/test-memory-leak-using-limit.sql
> the java heap space keeps going up until the old space is at 100% and
> eventually you get an OutOfMemoryException in drill
> $ jstat -gccause 86850 5s
>   S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT    LGCC                   GCC
>   0.00   0.00 100.00 100.00  98.56  96.71   2279   26.682   240  458.139  484.821 GCLocker Initiated GC  Ergonomics
>   0.00   0.00 100.00  99.99  98.56  96.71   2279   26.682   242  461.347  488.028 Allocation Failure     Ergonomics
>   0.00   0.00 100.00  99.99  98.56  96.71   2279   26.682   245  466.630  493.311 Allocation Failure     Ergonomics
>   0.00   0.00 100.00  99.99  98.56  96.71   2279   26.682   247  470.020  496.702 Allocation Failure     Ergonomics
> If you do the same test but do not use the LIMIT then the memory usage does
> not go up.
> If you add a where clause so that no results are returned, then the memory
> usage does not go up.
> Something with the RPC layer?
> Also it seems sensitive to the number of fragments/threads. If you limit it
> to one fragment/thread the memory usage goes up much slower.
> I have used parquet files and CSV files. In either case the behaviour is the
> same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
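
For anyone trying to reproduce this, the "create a file with a few thousand lines" step in the quoted report could be scripted roughly as below. This is only a sketch: the helper names (build_queries, write_query_file) and the default of 3000 queries are assumptions for illustration, since the report only says "a few thousand".

```python
def build_queries(data_dir, limit=500, count=3000):
    """Return `count` identical LIMIT queries against `data_dir`.

    `count=3000` is an assumed stand-in for "a few thousand" in the report.
    """
    query = "select * from dfs.`{}` limit {};".format(data_dir, limit)
    return [query] * count


def write_query_file(path, data_dir, limit=500, count=3000):
    """Write the queries to `path`, one per line; return how many were written."""
    queries = build_queries(data_dir, limit, count)
    with open(path, "w") as f:
        for q in queries:
            f.write(q + "\n")
    return len(queries)
```

Calling write_query_file with the output path and the nationsMF directory from the report should produce a file that can then be passed to !run in the Drill shell.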