Interesting. Do you have an error report filed anywhere to peruse?

M

On Thu, 4 May 2017 at 5:05 PM, Ketil Malde <ke...@malde.org> wrote:
>
> > I know it may be a long shot, but did you consider using columnar data
> > store like Apache Arrow?
>
> Arrow might be an option, but is there a Haskell interface? (Googling
> gives the obvious hits regarding arrows, and Google doesn't seem to care
> about me adding +apache to the search, it gives me result where
> "+apache" is overstruck.)
>
> > Without knowing more about your application it is a bit difficult to
> > produce more hints.
> >
> > What is your application?
>
> The short story is that I extract a number of 64-bit values from my
> data, and want to maintain frequency counts for each unique value. So
> there'll be on the order of 10^9 (plus/minus an order of magnitude)
> unique values, with counts ranging from one to a few million (and large
> values being rare).
>
> The long explanation is that I'm doing k-mer counts for molecular
> sequences, breaking DNA sequence data into overlapping words of fixed
> size (the parameter k), and counting their occurrences. I encode them
> as Word64, using two bits per nucleotide (the alphabet is A, C, G, and
> T). This is of course a fairly staple thing to do, and there is no lack
> of alternative programs that do it - but I'd like mine to work anyway,
> and it annoys me to have run into this particular bug. Whether it is my
> own fault, in the Judy FFI, the GHC runtime or libraries, the libjudy
> code, GHC compilation issues, or a hardware error.
>
> -k
> --
> If I haven't seen further, it is by standing in the footprints of giants
>
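
For anyone following the thread, the two-bit encoding and counting described above can be sketched roughly as below. This is only an illustration under my own assumptions, not Ketil's code: the names encodeBase, kmers and countKmers are made up for this example, Data.Map.Strict (from containers) stands in for the Judy array, k is assumed to satisfy 1 <= k <= 32, and windows containing anything other than A, C, G or T are simply skipped.

```haskell
import Data.Bits (shiftL, (.|.), (.&.))
import Data.Word (Word64)
import qualified Data.Map.Strict as M

-- Two bits per nucleotide; Nothing for anything outside A, C, G, T.
-- (Hypothetical helper; the actual bit assignment in the original
-- program is not given in the thread.)
encodeBase :: Char -> Maybe Word64
encodeBase 'A' = Just 0
encodeBase 'C' = Just 1
encodeBase 'G' = Just 2
encodeBase 'T' = Just 3
encodeBase _   = Nothing

-- Pack the overlapping k-mers of a sequence into Word64 keys.
-- Assumes 1 <= k <= 32. A non-ACGT character resets the window,
-- so no k-mer spanning it is emitted.
kmers :: Int -> String -> [Word64]
kmers k = go 0 0
  where
    -- Keep only the lowest 2*k bits of the rolling word.
    mask | k >= 32   = maxBound
         | otherwise = (1 `shiftL` (2 * k)) - 1
    go _ _ [] = []
    go n w (c:cs) = case encodeBase c of
      Nothing -> go 0 0 cs
      Just b  ->
        let w' = ((w `shiftL` 2) .|. b) .&. mask
            n' = min k (n + 1)   -- number of valid bases in the window
        in if n' == k then w' : go n' w' cs else go n' w' cs

-- Frequency count per unique k-mer, with a strict Map as a stand-in
-- for the Judy array.
countKmers :: Int -> String -> M.Map Word64 Int
countKmers k = M.fromListWith (+) . map (\w -> (w, 1)) . kmers k
```

With this bit assignment, kmers 2 "ACGT" yields the keys for AC, CG and GT. A Map will of course not scale to 10^9 keys the way a Judy array is meant to; it is here only to make the counting step concrete.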
_______________________________________________
Biohaskell mailing list
Biohaskell@biohaskell.org
http://biohaskell.org/cgi-bin/mailman/listinfo/biohaskell