On Tuesday, May 5, 2015 at 11:18:56 PM UTC-4, Sam Raker wrote:
>
> I've got two really big CSV files that I need to compare. Stay tuned for 
> the semi-inevitable "how do I optimize over this M x N space?" question, 
> but for now I'm still trying to get the data into a reasonable format--I'm 
> planning on converting each line into a map, with keys coming from either 
> the first line of the file, or a separate list I was given. Non-lazy 
> approaches run into memory limitations; lazy approaches run into "Stream 
> closed" exceptions while trying to coordinate `with-open` and `line-seq`. 
> Given that memory is already tight, I'd like to avoid leaving open 
> files/file descriptors/readers/whatever-the-term-in-clojure-is lying 
> around. I've tried writing a macro, I've tried transducers, I've tried 
> passing around the open reader along with the lazy seq, none successfully, 
> albeit none necessarily particularly well. Any suggestions on streaming 
> such big files?
>

Something like this didn't work?

(with-open [rdr1 ...
            rdr2 ...]
  (let [l1 (line-seq rdr1)
        l2 (line-seq rdr2)]
    (->> (map something l1 l2)
         (filter whatever)
         first)))

For instance, to check whether two text files are the same, something would 
be not= and whatever would be identity; the result would be nil if they 
were the same and something truthy otherwise. (Note that map stops at the 
end of the shorter file, so a difference in length alone would not be 
detected.) The call to first short-circuits as soon as the result is known, 
and since nothing holds on to the head of either line-seq, lines already 
consumed can be garbage collected. Because first forces the lazy pipeline 
inside the with-open body, the readers are not closed until as much of both 
line-seqs has been consumed as is needed. reduce, or transducers/reducers 
that actually get reduced inside the with-open, would have the same effect.
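
To make that concrete, here is a minimal sketch of both variants. The 
function names first-difference and count-matching-prefix are hypothetical 
helpers made up for illustration, and the sketch assumes plain line-by-line 
comparison rather than real CSV parsing. In both cases the value that 
leaves with-open is fully realized, so nothing lazy tries to read from a 
closed reader afterwards.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: returns [line-a line-b] for the first pair of
;; lines that differ, or nil if every paired line matches. As with the
;; snippet above, map stops at the shorter file, so a pure length
;; difference is not reported.
(defn first-difference
  [path-a path-b]
  (with-open [rdr-a (io/reader path-a)
              rdr-b (io/reader path-b)]
    (->> (map vector (line-seq rdr-a) (line-seq rdr-b))
         (filter (fn [[a b]] (not= a b)))
         ;; first forces just enough of the lazy seq while the readers
         ;; are still open, then returns a plain vector of two strings.
         first)))

;; Hypothetical helper: the reduce variant. reduce is eager, and reduced
;; stops it early, so this counts how many leading lines are identical.
(defn count-matching-prefix
  [path-a path-b]
  (with-open [rdr-a (io/reader path-a)
              rdr-b (io/reader path-b)]
    (reduce (fn [n [a b]]
              (if (= a b)
                (inc n)
                (reduced n)))
            0
            (map vector (line-seq rdr-a) (line-seq rdr-b)))))
```

Either function only ever keeps the current pair of lines reachable, so 
memory use should stay roughly flat regardless of file size. Returning the 
lazy seq itself from with-open, instead of a realized value, is what 
typically produces the "Stream closed" exception.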

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en