Re: [R] Accelerating binRead

2016-09-18 Thread Michael Sumner
Thanks Henrik, that's it. Fwiw I found this old post too, I am still
surprised this doesn't seem to get used a lot(?). It's a "neat trick" for
row-wise binary, without compiled code.

http://cyclemumner.blogspot.com.au/2010/06/read-las-data-with-r.html?m=1

Also you should look at Paul Murrell's hexView package, and associated R
Journal paper.

Cheers, Mike


Re: [R] Accelerating binRead

2016-09-18 Thread Henrik Bengtsson
I second Mike's proposal - it works, e.g.
https://github.com/HenrikBengtsson/affxparser/blob/5bf1a9162904c56d59c4735a8d7eb427e4f085e4/R/readCcg.R#L535-L583

Here's an outline. Say each row consists of a tuple (iiii = 4-byte
integer, ffff = 4-byte float, ss = 2-byte integer), so that the
byte-by-byte content of your file looks like this:

  iiiiffffss
  iiiiffffss
  iiiiffffss
  ...
  iiiiffffss

Then read this as raw bytes (file_size can also be a very large
number in case it's unknown):

  raw <- readBin(con, what="raw", n=file_size)

Turn into a (4+4+2)-by-K raw matrix:

  raw <- matrix(raw, nrow=4+4+2)

so that your raw bytes have the following layout:

  iii ... i
  iii ... i
  iii ... i
  iii ... i
  fff ... f
  fff ... f
  fff ... f
  fff ... f
  sss ... s
  sss ... s

Then extract the three submatrices of interest:

  iiii <- raw[1:4,]
  ffff <- raw[5:8,]
  ss <- raw[9:10,]

Here you can discard raw, i.e. rm(list="raw").

Since R stores matrices in a column-by-column order internally, your
bytes are already in the proper order.  Finally, re-read these with
appropriate readBin() settings (n must be given, since it defaults to
1), e.g.

  i <- readBin(iiii, what="integer", size=4L, n=ncol(iiii))
  f <- readBin(ffff, what="double", size=4L, n=ncol(ffff))
  s <- readBin(ss, what="integer", size=2L, n=ncol(ss))

Put into a K-by-3 data.frame:

  data <- data.frame(i=i, f=f, s=s)
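
The outline above can be tried end to end with a small self-contained
sketch. The temporary file, the sample values, and the little-endian byte
order are assumptions made for illustration; they are not part of the
original post.

```r
# Write K = 3 records of (4-byte int, 4-byte float, 2-byte int) to a temp file.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
for (k in 1:3) {
  writeBin(as.integer(k), con, size = 4L, endian = "little")
  writeBin(k + 0.5, con, size = 4L, endian = "little")
  writeBin(as.integer(10L * k), con, size = 2L, endian = "little")
}
close(con)

# Read everything as raw bytes, then reshape to one record per column.
bytes <- readBin(path, what = "raw", n = file.size(path))
m <- matrix(bytes, nrow = 4 + 4 + 2)

# Each block of rows holds the bytes of one field for all records.
i <- readBin(m[1:4, ], what = "integer", size = 4L, n = ncol(m), endian = "little")
f <- readBin(m[5:8, ], what = "double", size = 4L, n = ncol(m), endian = "little")
s <- readBin(m[9:10, ], what = "integer", size = 2L, n = ncol(m), endian = "little")

data <- data.frame(i = i, f = f, s = s)
print(data)
```

Only three readBin() calls are needed regardless of K, which is where
the speed-up over a record-by-record loop comes from.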

/Henrik


Re: [R] Accelerating binRead

2016-09-18 Thread Philippe de Rochambeau
I would gladly examine your example, Mike.
Cheers,
Philippe


Re: [R] Accelerating binRead

2016-09-18 Thread Michael Sumner

It's possible to read all rows fast as raw(), then parse in a vectorised
way with matrix indexing to group the bytes appropriately. There is an
example on the mailing list somewhere, but otherwise I can show an example
if that's of interest.
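
A minimal sketch of that idea for the record layout discussed in this
thread (int, int, 8-byte long, short, float, float, little-endian, 26
bytes per record). The field names are borrowed from the PLTreader
function quoted elsewhere in the thread; the helper names parse_records,
rd and rec are hypothetical. Reading the 8-byte long as four unsigned
16-bit words is a choice made here, because readBin() supports
signed=FALSE only for sizes 1 and 2.

```r
parse_records <- function(bytes) {
  rec_size <- 4L + 4L + 8L + 2L + 4L + 4L   # 26 bytes per record
  n <- length(bytes) %/% rec_size            # ignore a trailing partial record
  m <- matrix(bytes[seq_len(n * rec_size)], nrow = rec_size)
  # Decode the given rows of the byte matrix as one field for all records.
  rd <- function(rows, what, size, per = 1L, signed = TRUE) {
    readBin(m[rows, , drop = FALSE], what = what, size = size,
            n = n * per, signed = signed, endian = "little")
  }
  # The 8-byte long is read as four unsigned 16-bit words per record,
  # then recombined as a double (exact below 2^53).
  w <- matrix(rd(9:16, "integer", 2L, per = 4L, signed = FALSE), nrow = 4L)
  data.frame(
    periodIndex = rd(1:4, "integer", 4L),
    eventId     = rd(5:8, "integer", 4L),
    eventDate   = (w[1, ] + w[2, ] * 2^16 + w[3, ] * 2^32 + w[4, ] * 2^48) / 1000,
    repNum      = rd(17:18, "integer", 2L),
    exp         = rd(19:22, "double", 4L),
    loss        = rd(23:26, "double", 4L)
  )
}

# Usage with two synthetic records built in memory (illustrative values).
rec <- function(p, e, lo, hi, r, x, l) c(
  writeBin(as.integer(c(p, e, lo, hi)), raw(), size = 4L, endian = "little"),
  writeBin(as.integer(r), raw(), size = 2L, endian = "little"),
  writeBin(c(x, l), raw(), size = 4L, endian = "little")
)
bytes <- c(rec(1, 10, 5000, 0, 7, 0.5, 2.25), rec(2, 20, 0, 1, 8, 1.5, 4))
res <- parse_records(bytes)
print(res)
```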


Cheers, Mike


Re: [R] Accelerating binRead

2016-09-18 Thread Philippe de Rochambeau
Please find below code that attempts to read ints, longs and floats from a 
binary file (which is a simplification of my original program).
Please disregard the R inefficiencies, such as using rbind, for now.
I’ve also included Java code to generate the binary file.
The output shows that, at one point, anInt becomes undefined. Unfortunately, I
couldn’t find the correct R function to determine whether anInt is undefined or
not, as is.null, is.nan, and is.infinite don’t work.
Any help would be much appreciated.
Many thanks in advance.
Philippe

———
[1] "anInt = 1"
[1] "is.null  FALSE"
[1] "is.nan  FALSE"
[1] "is.infinite  FALSE"
[1] "aLong = 2"
[1] "aFloat = 3.0007209778"
[1] "--"
[1] "anInt = 2"
[1] "is.null  FALSE"
[1] "is.nan  FALSE"
[1] "is.infinite  FALSE"
[1] "aLong = 22"
[1] "aFloat = 13.4644002914429"
[1] "--"
[1] "anInt = 3"
[1] "is.null  FALSE"
[1] "is.nan  FALSE"
[1] "is.infinite  FALSE"
[1] "aLong = 55"
[1] "aFloat = 45.007873535"
[1] "--"
[1] "anInt = "
[1] "is.null  FALSE"
[1] "is.nan  "
[1] "is.infinite  "
[1] "aLong = "
[1] "aFloat = "
[1] "--"
     [,1]      [,2]      [,3]
[1,] 1         2         3.
[2,] 2         22        13.4644
[3,] 3         55        45.
[4,] Integer,0 Integer,0 Numeric,0
> 

---


—

readFile <- function(inputPath) {
  URL <- file(inputPath, "rb")
  PLT <- matrix(nrow=0, ncol=3)
  counte <- 0
  max <- 4
  while (counte < max) {
    anInt <- readBin(con=URL, what=integer(), size=4, n=1, endian="big")
    print(paste("anInt =", anInt))
    #if (! (anInt == 0)) { print(paste("empty int")); break }
    print(paste("is.null ", is.null(anInt)))
    print(paste("is.nan ", is.nan(anInt)))
    print(paste("is.infinite ", is.infinite(anInt)))
    aLong <- readBin(URL, integer(), size=8, n=1, endian="big")
    print(paste("aLong =", aLong))
    aFloat <- readBin(URL, numeric(), size=4, n=1, endian="big")
    print(paste("aFloat =", aFloat))
    print("--")
    PLT <- rbind(PLT, list(anInt, aLong, aFloat))
    counte <- counte + 1
  } # end while
  close(URL)
  PLT
}
fichier <- "/Users/philippe/Desktop/datatests/data0.bin"
PLT2 <- readFile(fichier)
print(PLT2)
—
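
When no data is left, readBin() returns a zero-length vector (here
integer(0)) rather than NULL, NaN or Inf, which is why the fourth row
prints as Integer,0 and the three tests above do not flag it. A length
check detects it. The following is a minimal sketch with an illustrative
temporary file, not the file generated by the Java program:

```r
# Build a tiny big-endian file holding three 4-byte integers (illustrative data).
path <- tempfile(fileext = ".bin")
writeBin(1:3, path, size = 4L, endian = "big")

con <- file(path, "rb")
values <- integer(0)
repeat {
  anInt <- readBin(con, what = integer(), size = 4L, n = 1L, endian = "big")
  if (length(anInt) == 0L) break   # nothing left to read: end of file
  values <- c(values, anInt)
}
close(con)
print(values)
```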

import java.io.*;

public class Main {

    Main() {
        writeData();
    }

    public static void main(String[] args) {
        new Main();
    }

    public void writeData() {

        final String path = "/Users/philippe/Desktop/datatests/data0.bin";

        DataOutputStream dos;
        try {
            dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path)));
            // big endian write! ("high byte first"), see
            // https://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
            dos.writeInt(1);
            dos.writeLong(2L);
            dos.writeFloat(3.F);

            dos.writeInt(2);
            dos.writeLong(22L);
            dos.writeFloat(13.4644F);

            dos.writeInt(3);
            dos.writeLong(55L);
            dos.writeFloat(45.F);

            dos.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

    }

}


—







Re: [R] Accelerating binRead

2016-09-18 Thread Philippe de Rochambeau
The only difference between the below code and my program is that the former 
assumes that the file only contains one row of 10 ints + 10 floats , whereas my 
program doesn’t know in advance how many rows the file contains, unless it 
downloads it first and computes the potential number of rows based on its size.
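
One way around the unknown size is to read raw bytes from the connection
in chunks until nothing more arrives, and only then derive the number of
rows from the byte count. This is a sketch under stated assumptions: the
helper name read_all_raw, the chunk size, and the 10-byte record size
are hypothetical, and a local temporary file stands in for the remote
resource.

```r
read_all_raw <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break            # connection exhausted
    chunks[[length(chunks) + 1L]] <- chunk
  }
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Usage with a local temp file standing in for the remote resource.
path <- tempfile(fileext = ".bin")
writeBin(as.raw(rep(1:10, times = 7)), path)  # 70 bytes = 7 records of 10 bytes
con <- file(path, "rb")
bytes <- read_all_raw(con, chunk_size = 16L)
close(con)

record_size <- 10L
n_records <- length(bytes) %/% record_size
print(n_records)
```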


Re: [R] Accelerating binRead

2016-09-18 Thread Philippe de Rochambeau
Hi Jim,
this is exactly the answer I was looking for. Many thanks. I didn’t know R had
a pack function, as in Perl.
To answer your earlier question, I am trying to update legacy code to read a
binary file of unknown size, over a network, slice it up into rows each
containing an integer, an integer, a long, a short, a float and a float, and
stuff the rows into a matrix.
Best regards,
Philippe


Re: [R] Accelerating binRead

2016-09-17 Thread jim holtman
Here is an example of how to do it:

x <- 1:10  # integer values
xf <- seq(1.0, 2, by = 0.1)  # floating point

setwd("d:/temp")

# create file to write to
output <- file('integer.bin', 'wb')
writeBin(x, output)  # write integer
writeBin(xf, output)  # write reals
close(output)


library(pack)
library(readr)

# read all the data at once
allbin <- read_file_raw('integer.bin')

# decode the data into a list
(result <- unpack("V V V V V V V V V V d d d d d d d d d d", allbin))
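
For comparison, the same file can be decoded with base R only, by
calling readBin() on slices of the raw vector. This is a sketch, not a
replacement for the pack-based code above; the temporary file path is an
assumption. Note that seq(1.0, 2, by = 0.1) yields 11 values, so the
number of doubles is derived from the byte count here instead of being
hard-coded.

```r
x <- 1:10                     # integer values
xf <- seq(1.0, 2, by = 0.1)   # floating point (11 values)

path <- tempfile(fileext = ".bin")   # temp file instead of a fixed directory
output <- file(path, "wb")
writeBin(x, output)    # 4 bytes per integer, native byte order
writeBin(xf, output)   # 8 bytes per double, native byte order
close(output)

# read all the data at once
allbin <- readBin(path, what = "raw", n = file.size(path))

# decode the two blocks from slices of the raw vector
n_int <- length(x)
ints <- readBin(allbin[seq_len(4L * n_int)], what = "integer",
                size = 4L, n = n_int)
reals <- readBin(allbin[-seq_len(4L * n_int)], what = "double",
                 size = 8L, n = (length(allbin) - 4L * n_int) %/% 8L)
print(ints)
print(reals)
```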




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN 
wrote:

> I noticed same issue but didnt care much :)
>
> On Sat, Sep 17, 2016, 18:01 jim holtman  wrote:
>
>> Your example was not reproducible.  Also how do you "break" out of the
>> "while" loop?
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau 
>> wrote:
>>
>> > Hello,
>> > the following function, which stores numeric values extracted from a
>> > binary file, into an R matrix, is very slow, especially when the said
>> > file is several MB in size.
>> > Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
>> > latter case is true, how do you « readBin »  in Rcpp (I’m a total Rcpp
>> > newbie)?
>> > Many thanks.
>> > Best regards,
>> > phiroc
>> >
>> >
>> > -
>> >
>> > # inputPath is something like
>> > # http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
>> >
>> > PLTreader <- function(inputPath){
>> >   URL <- file(inputPath, "rb")
>> >   PLT <- matrix(nrow=0, ncol=6)
>> >   compteurDePrints = 0
>> >   compteurDeLignes <- 0
>> >   maxiPrints = 5
>> >   displayData <- FALSE
>> >   while (TRUE) {
>> >     periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>> >     eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>> >     dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>> >     dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>> >     if (dword1 < 0) {
>> >       dword1 = dword1 + 2^32-1;
>> >     }
>> >     eventDate = (dword2*2^32 + dword1)/1000
>> >     repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
>> >     exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
>> >     loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
>> >     PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
>> >   } # end while
>> >   return(PLT)
>> >   close(URL)
>> > }
>> >
>> > 
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/
>> > posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Accelerating binRead

2016-09-17 Thread jim holtman
I would also suggest that you take a look at the 'pack' package which can
convert the binary input to the value you want.  Part of your performance
problems might be all the short reads that you are doing.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN 
wrote:

> I noticed the same issue but didn't care much :)
>
> On Sat, Sep 17, 2016, 18:01 jim holtman  wrote:
>
>> Your example was not reproducible.  Also how do you "break" out of the
>> "while" loop?

Re: [R] Accelerating binRead

2016-09-17 Thread Bob Rudis
You should probably pick one forum, either here or SO
(http://stackoverflow.com/questions/39547398/faster-reading-of-binary-files-in-r),
rather than cross-posting to all of them.

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN 
wrote:

> I noticed the same issue but didn't care much :)
>
> On Sat, Sep 17, 2016, 18:01 jim holtman  wrote:
>
> > Your example was not reproducible.  Also how do you "break" out of the
> > "while" loop?

Re: [R] Accelerating binRead

2016-09-17 Thread Ismail SEZEN
I noticed the same issue but didn't care much :)

On Sat, Sep 17, 2016, 18:01 jim holtman  wrote:

> Your example was not reproducible.  Also how do you "break" out of the
> "while" loop?

Re: [R] Accelerating binRead

2016-09-17 Thread jim holtman
Your example was not reproducible.  Also how do you "break" out of the
"while" loop?
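
One common way to end such a loop is to test the length of what readBin() returns, since it yields a zero-length vector once the connection is exhausted. A minimal sketch, using a small temporary file as assumed test data (the file and variable names are illustrative only):

```r
tmp <- tempfile()
writeBin(1:3, tmp, size = 4L, endian = "little")   # three 4-byte integers

con <- file(tmp, "rb")
values <- integer(0)
repeat {
  v <- readBin(con, integer(), n = 1L, size = 4L, endian = "little")
  if (length(v) == 0L) break    # nothing left to read: leave the loop
  values <- c(values, v)        # fine for a demo; slow for large files
}
close(con)
unlink(tmp)
```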


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] Accelerating binRead

2016-09-17 Thread Jeff Newmiller
Appending to lists is only very slightly more efficient than incremental 
rbinding. Ideally you can figure out an upper bound for number of records, 
preallocate a data frame of that size, modify each element as you go in-place, 
and shrink the data frame once at the end as needed. If you cannot do that, you 
can append fixed size data frames and follow the same strategy in chunks with a 
single do.call/rbind at the end. 
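
The preallocate, fill in place, shrink once strategy can be sketched as follows. This is an illustration under assumptions, not working code for the original file: a numeric matrix stands in for the data frame, max_rows is an assumed upper bound, and next_record() is a hypothetical placeholder for the per-record readBin() calls.

```r
# Hypothetical record source standing in for the per-record readBin() calls.
next_record <- function(i) c(i, i * 2, i / 10)

max_rows <- 1000L                                  # assumed upper bound
out <- matrix(NA_real_, nrow = max_rows, ncol = 3)

n <- 0L
for (i in 1:250) {                                 # pretend there are 250 records
  n <- n + 1L
  out[n, ] <- next_record(i)                       # fill in place, no regrowth
}

out <- out[seq_len(n), , drop = FALSE]             # shrink once at the end
```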

Note that reproducible examples including example data often yield working 
code, while incomplete examples tend to yield handwaving descriptions like the 
above. 

I will note that any code placed after a call to return() never runs, so the
close(URL) in your function is never reached. I highly recommend avoiding
return() like the plague... use the expression-at-the-end-of-the-function
method of returning.
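
A minimal sketch of that style, assuming a hypothetical helper read_first_int() and a throwaway temporary file: on.exit() registers the cleanup up front, so the connection is closed however the function exits, and the last expression is the return value.

```r
read_first_int <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))     # cleanup registered before any reading happens
  readBin(con, integer(), n = 1L, size = 4L, endian = "little")  # value returned
}

tmp <- tempfile()
writeBin(42L, tmp, size = 4L, endian = "little")
value <- read_first_int(tmp)
unlink(tmp)
```
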
-- 
Sent from my phone. Please excuse my brevity.

On September 17, 2016 7:10:05 AM PDT, Ismail SEZEN  
wrote:
>I suspect that rbind is responsible. Use list and append instead of
>rbind. At the end, combine elements of list by do.call("rbind", list).

Re: [R] Accelerating binRead

2016-09-17 Thread Ismail SEZEN
I suspect that rbind is responsible. Use a list and append to it instead of
calling rbind inside the loop. At the end, combine the elements of the list
with do.call("rbind", list).
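
A minimal sketch of that pattern, with made-up records (c(i, i^2)) standing in for the values read from the file:

```r
rows <- list()
for (i in 1:5) {
  rows[[length(rows) + 1L]] <- c(i, i^2)   # append one record to the list
}
result <- do.call("rbind", rows)           # a single rbind at the end
```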


[R] Accelerating binRead

2016-09-17 Thread Philippe de Rochambeau
Hello,
The following function, which reads numeric values from a binary file into an
R matrix, is very slow, especially when the file is several MB in size.
Should I rewrite it in inline C, or in C/C++ using Rcpp? If the latter, how do
you do the equivalent of readBin in Rcpp (I'm a total Rcpp newbie)?
Many thanks.
Best regards,
phiroc


-

# inputPath is something like http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin


PLTreader <- function(inputPath){
  URL <- file(inputPath, "rb")
  PLT <- matrix(nrow=0, ncol=6)
  compteurDePrints = 0
  compteurDeLignes <- 0
  maxiPrints = 5
  displayData <- FALSE
  while (TRUE) {
    periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
    eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
    dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
    dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
    if (dword1 < 0) {
      dword1 = dword1 + 2^32-1;
    }
    eventDate = (dword2*2^32 + dword1)/1000
    repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
    exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
    loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
    PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
  } # end while
  return(PLT)
  close(URL)
}
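
For reference, the raw-matrix approach suggested elsewhere in this thread could be applied to this record layout roughly as follows. It is a sketch under assumptions, not a tested replacement: it assumes each record is exactly 26 bytes (4 + 4 + 4 + 4 + 2 + 4 + 4) with no padding, decode_plt() is a hypothetical name, and the two dwords are decoded as four unsigned 16-bit words because readBin() only honours signed=FALSE for sizes 1 and 2.

```r
decode_plt <- function(bytes) {
  record_size <- 26L                        # assumed fixed record width
  stopifnot(is.raw(bytes), length(bytes) %% record_size == 0L)
  k <- length(bytes) %/% record_size        # number of records
  m <- matrix(bytes, nrow = record_size)    # column j holds record j

  int32 <- function(rows)
    readBin(as.vector(m[rows, ]), "integer", n = k, size = 4L, endian = "little")
  float32 <- function(rows)
    readBin(as.vector(m[rows, ]), "double", n = k, size = 4L, endian = "little")

  # dword1 and dword2 together: four unsigned 16-bit words per record
  w <- readBin(as.vector(m[9:16, ]), "integer", n = 4L * k, size = 2L,
               signed = FALSE, endian = "little")
  w <- matrix(w, nrow = 4L)
  ms <- w[1, ] + w[2, ] * 2^16 + w[3, ] * 2^32 + w[4, ] * 2^48

  cbind(periodIndex = int32(1:4),
        eventId     = int32(5:8),
        eventDate   = ms / 1000,
        repNum      = readBin(as.vector(m[17:18, ]), "integer", n = k,
                              size = 2L, endian = "little"),
        exp         = float32(19:22),
        loss        = float32(23:26))
}

# Build two synthetic records in memory to exercise the decoder.
rc <- rawConnection(raw(0), "wb")
for (rec in 1:2) {
  writeBin(c(rec, 100L + rec), rc, size = 4L, endian = "little")  # periodIndex, eventId
  writeBin(c(5000L, 1L), rc, size = 4L, endian = "little")        # dword1, dword2
  writeBin(7L, rc, size = 2L, endian = "little")                  # repNum
  writeBin(c(1.5, 0.25), rc, size = 4L, endian = "little")        # exp, loss
}
bytes <- rawConnectionValue(rc)
close(rc)

res <- decode_plt(bytes)
```

With a real file, bytes would come from a single readBin(con, "raw", n = ...) call, so the whole file is read once and every field is decoded in one vectorised step.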

