Hard to say; I can't think of a change that would directly affect how
the shared code works.

The "ArchiveReader" type hint on "warc-value" seems to be incorrect,
because it is used as a seq by "doseq". Assuming this is the correct
ArchiveReader
(http://crawler.archive.org/apidocs/org/archive/io/ArchiveReader.html) I
don't see how it could be a seq. So warc-value must be a seq, and is
likely to be a lazy-seq. I wonder if you could be running into a problem
with chunking or something (maybe you have a lazy-seq that needs to be
consumed strictly one after another, but a chunked seq would realize 32
at a time). I think "iterate" and "range" both had changes around then,
so I would look at how that seq is constructed. If this is the issue,
you may want to look at using the clojure.core.protocols/CollReduce
protocol to process whatever the warc-value seq is based on instead of
using a seq.
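
A minimal sketch of the kind of read-ahead problem meant here, not taken
from the warc-cc code. It assumes (this is a guess, not something
confirmed about ArchiveReader) that a record is only readable until the
iterator advances. The names one-shot-source, available-strict and
available-read-ahead are made up for the illustration, and the read-ahead
function only imitates what a chunked seq would do by pulling several
elements before any of them is inspected.

```clojure
;; Hypothetical stand-in for a reader whose records become unreadable once
;; the iterator moves on. Each "record" is an atom holding the number of
;; bytes still available; advancing the iterator drains the previous one.
(defn one-shot-source [n]
  (reify Iterable
    (iterator [_]
      (let [issued (atom 0)
            prev   (atom nil)]
        (reify java.util.Iterator
          (hasNext [_] (< @issued n))
          (next [_]
            ;; invalidate the record handed out by the previous call
            (when-let [p @prev] (reset! p 0))
            (let [r (atom (* 10 (swap! issued inc)))]
              (reset! prev r)
              r))
          (remove [_] (throw (UnsupportedOperationException.))))))))

;; Strict consumption: inspect each record before asking for the next one.
(defn available-strict [^Iterable source]
  (let [it (.iterator source)]
    (loop [acc []]
      (if (.hasNext it)
        (let [r (.next it)]
          (recur (conj acc @r)))
        acc))))

;; Read-ahead consumption: pull every record first, inspect them afterwards.
;; This imitates a chunk being realized before the doseq body runs.
(defn available-read-ahead [^Iterable source]
  (let [it      (.iterator source)
        records (loop [acc []]
                  (if (.hasNext it)
                    (recur (conj acc (.next it)))
                    acc))]
    (mapv deref records)))

(println (available-strict (one-shot-source 3)))     ; [10 20 30]
(println (available-read-ahead (one-shot-source 3))) ; [0 0 30]
```

If the real reader behaves like this stand-in, an explicit iterator loop
like available-strict would sidestep the issue regardless of how seqs are
chunked. Whether reduce on the reader itself gives the same one-at-a-time
behaviour depends on how CollReduce handles it in your Clojure version,
so that would need checking.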

On 03/07/2016 03:25 PM, [email protected] wrote:
> Hey clojurians,
> 
> I am using a java library that reads WARC files (an internet archive
> format) to use with hadoop. 
> 
> I was recently motivated to upgrade this project's clojure from 1.6 to
> 1.8 (to be able to use the recent (wonderful!) cider), and I got quite a
> strange behavior, that I managed to reduce to a simple example (on
> github
> <https://github.com/vadali/warc-cc/blob/upgrade-clojure/src/warc_cc/example.clj>)
> 
> (defn mapper-map [this ^Text key ^ArchiveReader warc-value ^MapContext context]
>   (doseq [^ArchiveRecord r warc-value]
>     (let [header (.getHeader r)
>           mime (.getMimetype header)]
>       (if (plain-text? mime)
>         (println "got " (.available r))))))
> 
> Using any clojure version prior to 1.7.0-alpha6 (meaning, alpha5 and
> below), this code works great, and I get plenty of different "got %d"
> printed to the console with different sizes.
> 
> However, upgrading to 1.7.0-alpha6 and above, I am getting constant "got
> 0" for every record in the file, and nothing (obviously) gets computed.
> 
> I tried to see if I can find the culprit
> using 
> https://github.com/clojure/clojure/compare/clojure-1.7.0-alpha5...clojure-1.7.0-alpha6
> and couldn't find an obvious problem. I thought I might ask the list for
> pointers before I deep dive into this any further. 
> 
> If you wish to help with this problem by checking it on your machine,
> you could clone https://github.com/vadali/warc-cc/tree/upgrade-clojure
> (use upgrade-clojure branch), get the example file into the root dir of
> the cloned project using 
> 
>            s3cmd get
> s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-48/segments/1387345775423/wet/CC-MAIN-20131218054935-00092-ip-10-33-133-15.ec2.internal.warc.wet.gz
> 
> and run using lein test warc-cc.example.
> 
> Thanks!
> 
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected]
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout.


-- 
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?
