Other people at my firm who know a lot about binary files couldn't figure out the parts of the file that I am skipping over. Part of the issue is that there are several different files like this (with a .dbs extension) that I have to process, and the structures change depending on the source of these files.
In short, the problem is over my head, and I was hoping to go right to the correct bit and read, which would make things much easier. I guess not... Thanks for your help though. Anyone else?

thanks,
ben

On Tue, Jun 19, 2012 at 10:10 AM, jim holtman <jholt...@gmail.com> wrote:
> I am not sure why reading through 'bit-by-bit' gets you to where you
> want to be. I assume that the file has some structure, even though it
> may be changing daily. You mentioned the various types of data that
> it might contain; are they all in byte-sized chunks? If you really
> have data that begins in the middle of a byte and then extends over
> several bytes, you will have to write some functions that will pull
> out this data and then reconstruct it into an object (e.g., integer,
> numeric, ...) that R understands. Can you provide some more
> definition of what the data actually looks like and how you would find
> the "pattern" of the data? Almost all systems read at the lowest
> level in byte-sized chunks, and if you really have to get down to the
> bit level to reconstruct the data, then you have to write the
> unpack/pack functions. This can all be done once you understand the
> structure of the data. So some examples would be useful if you want
> someone to propose a solution.
>
> On Tue, Jun 19, 2012 at 11:54 AM, Ben quant <ccqu...@gmail.com> wrote:
> > Hello,
> >
> > Has a function been built that will skip to a certain bit in a binary file?
> >
> > As of 2009 the answer was 'no':
> > http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
> > https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html
> >
> > If you feel I don't need to (like in the links above), please provide
> > some help. (Note this is my first time working with binary files.)
> >
> > I'm still working on the script, but here is where I am right now. The
> > for loop is being used because:
> >
> > 1) I have to get down to the correct position, then get the info I want/need.
> >    The stuff I am reading through (x) is not fully understood; it is a mix
> >    of various chars, floats, integers, etc. of various sizes, so I don't
> >    know how many bytes to read in unless I read them bit by bit. (The
> >    information and structure of the information changes daily, so I'm
> >    skipping over it.)
> > 2) If I skip everything in one readBin() call, my 'n' value is often up
> >    to 20 times too big (I get an error) and/or R won't let me "allocate a
> >    vector of size...." etc. So I split it up into chunks (divide by 20
> >    etc.), read each chunk, then trash each part that is readBin()'d. Then
> >    on the last line I get the data that I want (data1).
> >
> > Here is my working code:
> >
> > # I have to read past 'junk' bytes of the to.read file; junk is a huge
> > # integer, so I divide it up and loop through to.read in parts (jb_part).
> > divr = 20
> > mod = junk %% divr
> >
> > jb_part = as.integer(junk/divr)
> > jb_part_mod = jb_part + mod # catch the remainder/modulus
> >
> > to.read = file(paste(dbs_path,"/",dbs_file,sep=""),"rb") # connect to the binary file
> > # loop in chunks to where I want to be
> > for(i in 1:(divr-1)){
> >   x = readBin(to.read,"raw",n=jb_part,size=1)
> >   x = NULL # trash the result b/c I don't want it
> > }
> > # read a little more to include the remainder/modulus bytes left over
> > # by dividing by 20 above
> > x = readBin(to.read,'raw',n=jb_part_mod,size=1)
> > x = NULL # trash it
> >
> > # finally get the data that I want
> > data1 = readBin(to.read,double(),n=some_number,size=size_to_use)
> >
> > This works, but it is SLOW! Any ideas on how to get down to the correct
> > bit a bit quicker (pun intended)? :)
> >
> > Thanks for any help!
> >
> > Ben
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
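For what it's worth, base R file connections support random access. `seek()` moves a connection opened with `"rb"` to an absolute byte offset, so reading and discarding chunks with `readBin()` should not be needed just to skip ahead. What follows is a minimal sketch. It assumes the amount to skip (`junk` in the code above) is a count of bytes from the start of the file. For a field that starts partway through a byte, it reads the bytes covering the field and slices them with `rawToBits()`. This relies on an assumption that bits are numbered least-significant first within each byte, which may not match how the .dbs files pack them. `read_after_skip()`, `read_bits_at()` and `bits_to_int()` are hypothetical helper names, not existing functions.

```r
# Sketch only: read_after_skip(), read_bits_at() and bits_to_int() are
# hypothetical helpers, not functions from base R or any package.

# Byte-aligned case: jump straight to a byte offset instead of reading
# and discarding the bytes in front of it.
read_after_skip <- function(path, skip_bytes, n, size = 8) {
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = skip_bytes, origin = "start")  # absolute byte position
  readBin(con, what = "double", n = n, size = size)
}

# Unaligned case: return n_bits bits starting at an arbitrary bit offset.
# ASSUMPTION: bits are numbered least-significant first within each byte,
# which is the order rawToBits() uses; the real file layout may differ.
read_bits_at <- function(path, bit_offset, n_bits) {
  con <- file(path, "rb")
  on.exit(close(con))
  byte_offset <- bit_offset %/% 8   # whole bytes to skip
  bit_in_byte <- bit_offset %% 8    # bits to drop inside the first byte
  seek(con, where = byte_offset, origin = "start")
  n_bytes <- ceiling((bit_in_byte + n_bits) / 8)  # bytes covering the field
  bits <- rawToBits(readBin(con, what = "raw", n = n_bytes))
  bits[(bit_in_byte + 1):(bit_in_byte + n_bits)]
}

# Rebuild an unsigned integer from bits given least-significant first.
bits_to_int <- function(bits) {
  sum(as.integer(bits) * 2^(seq_along(bits) - 1))
}

# Self-contained demo on a temporary file: bytes 0..9, then two doubles.
tmp <- tempfile()
writeBin(c(as.raw(0:9), writeBin(c(1.5, 2.5), raw())), tmp)
read_after_skip(tmp, skip_bytes = 10, n = 2)                  # 1.5 2.5
bits_to_int(read_bits_at(tmp, bit_offset = 40, n_bits = 8))   # 5
bits_to_int(read_bits_at(tmp, bit_offset = 46, n_bits = 4))   # 8, spans bytes 5 and 6
```

In the original script, the whole loop would reduce to `seek(to.read, where = junk)` followed by the final `readBin()` call, provided `junk` really is a byte count. The `?seek` help page discourages relying on file positioning on Windows, so it is worth checking the result against the loop version there.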