Hi Alan,

On Friday, March 8, 2013 4:02:18 PM UTC+1, Alan Busby wrote:
>
> Hi Bernard, 
>
> I'd certainly like to add support for binary files, but as I haven't had a 
> need for it myself I haven't had a good place to start.
>
> As Java NIO's mmap() doesn't support ranges over 2GB, I've had to paste 
> together multiple mmap's to cover files that are larger than 2GB. 
> So if a record ended up spanning two mmap()'s, you couldn't return the raw 
> data as a single object without copying it into a new buffer first.
>
> Also, if you provide a fixed record size in bytes for "doing the idx 
> offset maths", why do you need the end idx for the current line as well?
> For example if you say file.bin is full of records each 100B in size, and 
> you ask for the 10th record; don't you already know that the length of the 
> record is 100B?
>
>
Indeed, the correlation between txt/binary and char-delimited (i.e. \n) / 
fixed-length records is very strong. However, in my case I want to first 
handle a \n-delimited (txt) file as binary for performance reasons, so the 
end idx of a line is not implied by its start.
The context is that I have to consider all the lines of data, but might not 
have to do "heavy" processing on all of them, so I want to do as little work 
as possible on each line (i.e. not construct any java.lang.String).
This is in no way Clojure-specific. I have two Java implementations of a 
small Minimum Spanning Tree program:
- one constructs Strings from all the lines: 
https://www.refheap.com/paste/12312
- one uses offsets into a raw ByteBuffer: 
https://www.refheap.com/paste/12313

As most of the lines are not really processed (just sorted according to the 
last field), being able to peek at only the relevant bytes instead of 
constructing full-blown java.lang.Strings is a huge performance boost.
FWIW, as far as performance is concerned, I draw the line not between 
Clojure and Java but between objects (constructed by copying some data 
somewhere on the heap) and arrays of primitive data types, because 
nowadays cache locality trumps everything (once you have gotten rid of 
reflection calls in Clojure, obviously).
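
To make the idea concrete, here is a minimal sketch of the offset-based 
approach. It is not the code from the pastes above. It assumes ASCII lines 
terminated by \n whose last field is a non-negative decimal integer; the 
class and method names (LineOffsets, lineStarts, lastFieldAsInt, 
sortedByLastField) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: sorts lines by their last field without building any String.
final class LineOffsets {

    // Start offset of every line in the buffer (lines are assumed to end with '\n').
    static int[] lineStarts(ByteBuffer buf) {
        int n = buf.limit();
        int[] starts = new int[16];
        int count = 0;
        int pos = 0;
        while (pos < n) {
            if (count == starts.length) {
                starts = Arrays.copyOf(starts, count * 2);
            }
            starts[count++] = pos;
            while (pos < n && buf.get(pos) != '\n') {
                pos++;
            }
            pos++; // step over the '\n'
        }
        return Arrays.copyOf(starts, count);
    }

    // Exclusive end offset of the line starting at 'start'.
    static int lineEnd(ByteBuffer buf, int start) {
        int end = start;
        while (end < buf.limit() && buf.get(end) != '\n') {
            end++;
        }
        return end;
    }

    // Reads the trailing run of digits of [start, end) as an int,
    // walking backwards from the end of the line.
    static int lastFieldAsInt(ByteBuffer buf, int start, int end) {
        int value = 0;
        int scale = 1;
        int i = end - 1;
        while (i >= start && buf.get(i) >= '0' && buf.get(i) <= '9') {
            value += (buf.get(i) - '0') * scale;
            scale *= 10;
            i--;
        }
        return value;
    }

    // Packs (last field, line start) into one long per line and sorts the
    // primitive array, so sorting touches no per-line objects.
    static long[] sortedByLastField(ByteBuffer buf) {
        int[] starts = lineStarts(buf);
        long[] keys = new long[starts.length];
        for (int k = 0; k < starts.length; k++) {
            int end = lineEnd(buf, starts[k]);
            keys[k] = ((long) lastFieldAsInt(buf, starts[k], end) << 32) | starts[k];
        }
        Arrays.sort(keys);
        return keys;
    }
}
```

Each sorted key unpacks as (int) (key >>> 32) for the last field and 
(int) key for the line's start offset, so only the lines that really need 
"heavy" processing ever have to be decoded.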

So ideally, maybe 2 x 2 combinations (String / offset in ByteArray) x (char 
delimited / fixed length) would be needed to cover all the needs.
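
As a rough sketch of how that 2 x 2 grid could be factored (this is only an 
illustration, not a proposal for iota's actual API): one axis decides how 
record bounds are found, the other decides whether the caller gets raw 
offsets or a String. All names below (RecordIndex, FixedLengthIndex, 
DelimitedIndex, Records) are hypothetical, and a single buffer under 2GB 
with ASCII content is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Axis 1: how record bounds are found. 'end' is exclusive.
interface RecordIndex {
    int count();
    int start(int i);
    int end(int i);
}

// Fixed-length records: the end is implied by the start.
final class FixedLengthIndex implements RecordIndex {
    private final int recordSize;
    private final int count;

    FixedLengthIndex(int totalBytes, int recordSize) {
        this.recordSize = recordSize;
        this.count = totalBytes / recordSize;
    }

    public int count() { return count; }
    public int start(int i) { return i * recordSize; }
    public int end(int i) { return start(i) + recordSize; }
}

// Delimited records: both bounds have to be discovered by scanning once.
final class DelimitedIndex implements RecordIndex {
    private final int[] starts;
    private final int[] ends;

    DelimitedIndex(ByteBuffer buf, byte delimiter) {
        int n = buf.limit();
        int records = 0;
        for (int p = 0; p < n; p++) {
            if (buf.get(p) == delimiter) records++;
        }
        if (n > 0 && buf.get(n - 1) != delimiter) records++; // unterminated last record
        starts = new int[records];
        ends = new int[records];
        int r = 0;
        int start = 0;
        for (int p = 0; p < n; p++) {
            if (buf.get(p) == delimiter) {
                starts[r] = start;
                ends[r] = p;
                r++;
                start = p + 1;
            }
        }
        if (start < n) {
            starts[r] = start;
            ends[r] = n;
        }
    }

    public int count() { return starts.length; }
    public int start(int i) { return starts[i]; }
    public int end(int i) { return ends[i]; }
}

// Axis 2: raw offsets come straight from the index; a String is an opt-in copy.
final class Records {
    static String asString(ByteBuffer buf, RecordIndex idx, int i) {
        int start = idx.start(i);
        byte[] bytes = new byte[idx.end(i) - start];
        for (int k = 0; k < bytes.length; k++) {
            bytes[k] = buf.get(start + k);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

In this sketch an empty line is simply a record whose start equals its end, 
so asString yields "" for it.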

Thanks again for sharing your library!

Cheers,

Bernard

PS: Is there a rationale for returning nil instead of empty String "" on 
empty lines with iota/vec?

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en