Riddle me this:

https://gist.github.com/hagmonk/a75621b143501966c22f53ed1e2bc36e

Wherein I synthesize a large table in Postgres, then attempt to lazily load the
table, discarding each row as I receive it. I tried *many* permutations and
experiments, but settled on these two tests to illustrate my point, which is
that I simply can't get lazy consumption to work with clojure.java.jdbc.
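
For concreteness, the setup is along these lines; the table and column names
here are placeholders, the gist has the real code:

    ;; synthesize a large table server-side, then try to stream it back
    (require '[clojure.java.jdbc :as jdbc])

    (def db-spec {:dbtype "postgresql" :dbname "test"})   ; placeholder

    (jdbc/execute! db-spec
      ["create table big_table as
        select g as id, md5(g::text) as payload
        from generate_series(1, 50000000) g"])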

test1, according to all my research and reading of the source code involved,
should consume the query results lazily. It does not, and I can't for the life
of me figure out why. Traffic starts to stream in, and the heap is overwhelmed
almost immediately (I've deliberately capped it at 1 GB).
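
Roughly, test1 has this shape (paraphrased from memory, not the exact gist
code):

    ;; test1: ask clojure.java.jdbc for the rows and fold them away one at a
    ;; time, expecting the seq handed to :result-set-fn to be lazy
    (defn test1 [db]
      (jdbc/query db
                  ["select * from big_table"]
                  {:fetch-size    1000
                   :result-set-fn (fn [rows]
                                    (reduce (fn [n _] (inc n)) 0 rows))}))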

test2 uses a technique I borrowed wholesale from Ghadi Shayban in JDBC-99
<https://dev.clojure.org/jira/browse/JDBC-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#issue-tabs>,
which is to wrap the ResultSet in something that implements IReduceInit. It
consumes a nominal amount of memory. I've verified it's actually doing
something by adding counters and using YourKit to watch roughly 20 MB/s of
traffic streaming into the JVM. It's brilliant: it doesn't even break 200 MB
of total heap usage.
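
As I understand the JDBC-99 approach, the core of test2 is something like this
(sketch only, the gist and the ticket have the real code):

    (import '(java.sql Connection))

    ;; wrap the ResultSet in something reducible, so rows are consumed inside
    ;; reduce and never materialized as a seq
    (defn reducible-rows [^Connection conn sql fetch-size]
      (reify clojure.lang.IReduceInit
        (reduce [_ f init]
          (with-open [stmt (doto (.prepareStatement conn sql)
                             (.setFetchSize (int fetch-size)))
                      rs   (.executeQuery stmt)]
            (loop [acc init]
              (if (.next rs)
                (let [acc (f acc (.getObject rs 1))]  ; touch one column, discard
                  (if (reduced? acc) @acc (recur acc)))
                acc))))))

    ;; usage, assuming autoCommit is already off (see below):
    ;; (reduce (fn [n _] (inc n)) 0
    ;;         (reducible-rows conn "select * from big_table" 1000))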

I used YourKit to track where the memory is being retained in test1. Initially
I made the mistake of not setting the fetchSize, so I saw an ArrayList inside
the driver holding on to every row. The driver documentation
<https://jdbc.postgresql.org/documentation/head/query.html> confirms that
autoCommit must be disabled and the fetchSize set to some non-zero value
before the driver will use a cursor instead of buffering the whole result set.
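
In interop terms, and assuming a raw java.sql.Connection in hand, those
requirements boil down to roughly:

    ;; cursor-based fetching only kicks in when the connection is not in
    ;; autocommit and the statement has a non-zero fetch size (the ResultSet
    ;; must also be TYPE_FORWARD_ONLY, which is the default)
    (.setAutoCommit conn false)
    (doto (.prepareStatement conn "select * from big_table")
      (.setFetchSize 1000))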

After making that change, YourKit confirmed that the GC root holding all the
memory was the stack-local variable "rs". At least I think it did; I'm not an
expert in this domain. I tried disassembling the functions using no.disassemble
and the IntelliJ decompiler, but I'm not really at the point where I understand
what to look for.

So my questions are:

1) what am I doing wrong with clojure.java.jdbc?

Note some things I've already tried:

* using row-fn instead of result-set-fn
* using prepared statements
* explicitly setting auto-commit false on the connection
* declaring my result-set-fn with (^{:once true} fn* […]); see the sketch after
this list (I did not see a change in the disassembly when using this)
* probably other things I am forgetting
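
For reference, the ^:once attempt looked something like this (hypothetical
shape, not the gist code):

    ;; ^{:once true} tells the compiler the fn will be invoked at most once,
    ;; so it may clear closed-over locals after the call (lazy-seq and delay
    ;; use the same trick internally)
    (def discard-rows
      (^{:once true} fn* [rows]
        (reduce (fn [n _] (inc n)) 0 rows)))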

2) in these situations where you suspect that the head of a lazy sequence is
being retained, how do you reason about it? I'm kind of lucky this one blew the
heap so quickly; who knows how much of my production code might be burning
memory unnecessarily, just not quite as fatally. Do you disassemble the
functions and look for some smoking gun? How do you peek under the covers to
see where the problem is?

Luke.
