Hard to say; I can't think of a change that would directly alter how the shared code works.

The "ArchiveReader" type hint on "warc-value" seems to be incorrect, because "doseq" uses it as a seq. Assuming this is the correct ArchiveReader (http://crawler.archive.org/apidocs/org/archive/io/ArchiveReader.html), I don't see how it could be a seq. So warc-value must be a seq, and is likely to be a lazy seq.

I wonder if you could be running into a problem with chunking or something similar (maybe you have a lazy seq that needs to be consumed strictly one element after another, but a chunked seq would realize 32 at a time). I think "iterate" and "range" both had changes around then, so I would look at how that seq is constructed.

If this is the issue, you may want to look at using the clojure.core.protocols/CollReduce protocol to process whatever the warc-value seq is based on, instead of using a seq.

On 03/07/2016 03:25 PM, [email protected] wrote:
> Hey clojurians,
>
> I am using a java library that reads WARC files (an internet archive
> format) to use with hadoop.
>
> I was recently motivated to upgrade this project's clojure from 1.6 to
> 1.8 (to be able to use the recent (wonderful!) cider), and I got quite a
> strange behavior, that I managed to reduce to a simple example (on
> github
> <https://github.com/vadali/warc-cc/blob/upgrade-clojure/src/warc_cc/example.clj>)
>
> (defn mapper-map [this ^Text key ^ArchiveReader warc-value ^MapContext context]
>   (doseq [^ArchiveRecord r warc-value]
>     (let [header (.getHeader r)
>           mime (.getMimetype header)]
>       (if (plain-text? mime)
>         (println "got " (.available r))))))
>
> Using any clojure version prior to 1.7.0-alpha6 (meaning, alpha5 and
> below), this code works great, and I get plenty of different "got %d"
> printed to the console with different sizes.
>
> However, upgrading to 1.7.0-alpha6 and above, I am getting constant "got
> 0" for every record in the file, and nothing (obviously) gets computed.
>
> I tried to see if I can find the culprit using
> https://github.com/clojure/clojure/compare/clojure-1.7.0-alpha5...clojure-1.7.0-alpha6
> and couldn't find an obvious problem. I thought I might ask the list for
> pointers before I deep dive into this any further.
>
> If you wish to help with this problem by checking it on your machine,
> you could clone https://github.com/vadali/warc-cc/tree/upgrade-clojure
> (use upgrade-clojure branch), get the example file into the root dir of
> the cloned project using
>
> s3cmd get s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-48/segments/1387345775423/wet/CC-MAIN-20131218054935-00092-ip-10-33-133-15.ec2.internal.warc.wet.gz
>
> and run using lein test warc-cc.example.
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected]
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout.

--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?
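
P.S. To make the chunking concern from the reply concrete, here is a minimal sketch. Everything in it is hypothetical: "one-at-a-time" is a made-up stand-in for a reader whose records stop being usable once the next record is fetched, and it assumes the underlying object is a java.lang.Iterable, which I have not verified for ArchiveReader. It also swaps the CollReduce idea for plain java.util.Iterator interop, which seemed the simplest way to show strict one-at-a-time consumption.

```clojure
;; Hypothetical stand-in for a reader: each "record" is an atom holding
;; true while the record is usable; fetching the next record flips the
;; previous one to false.
(defn one-at-a-time [n]
  (let [prev (atom nil)]
    (reify Iterable
      (iterator [_]
        (let [remaining (atom n)]
          (reify java.util.Iterator
            (hasNext [_] (pos? @remaining))
            (next [_]
              ;; invalidate the record handed out previously
              (when-let [p @prev] (reset! p false))
              (swap! remaining dec)
              ;; reset! returns the new value, i.e. the fresh record
              (reset! prev (atom true)))
            (remove [_] (throw (UnsupportedOperationException.)))))))))

;; Consume through a seq, as doseq does. If the seq is chunked, several
;; records are fetched before the body runs, so some may already be stale.
(defn open-count-with-doseq [n]
  (let [open (atom 0)]
    (doseq [r (one-at-a-time n)]
      (when @r (swap! open inc)))
    @open))

;; Consume strictly one element at a time through the Iterator itself.
(defn open-count-with-iterator [n]
  (let [it (.iterator ^Iterable (one-at-a-time n))]
    (loop [open 0]
      (if (.hasNext it)
        (let [r (.next it)]
          (recur (if @r (inc open) open)))
        open))))
```

With the iterator loop every record is inspected before the next one is fetched, so all n records are still usable when examined. The doseq version depends on how the Clojure version in use builds the seq: if it reads ahead in chunks, it can report fewer than n usable records, which would be consistent with the "got 0" symptom.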
