Re: [R] Regex matching that gives byte offset?

2009-11-02 Thread Johannes Graumann
Hmmm ... that should do it, thanks. But how would one use this on a file 
without reading it into memory completely?

Joh


On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote:
 Do you mean like regexpr() (on the same help page)?
 
 Depending on your locale, you might actually prefer the character
 offset: if you want to match in a MBCS and have byte offsets you will
 need to work a bit harder if useBytes=TRUE is not sufficient for you.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regex matching that gives byte offset?

2009-11-02 Thread Prof Brian Ripley

On Mon, 2 Nov 2009, Johannes Graumann wrote:


Hmmm ... that should do it, thanks. But how would one use this on a file
without reading it into memory completely?


?file, ?readLines, ?readBin

will tell you about connections.
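Here is a minimal sketch of that approach. It assumes plain "\n" line endings: readLines() drops the terminator, so each line is assumed to cost exactly one extra byte. The helper name byte_offsets_stream() and its chunk argument are hypothetical. The function reads the file through a connection a chunk of lines at a time and keeps a running byte count. It uses gregexpr(useBytes = TRUE) so that positions within a line are bytes rather than characters. The returned offsets are 0-based, as grep -b reports them.

```r
# Hypothetical helper (illustration only): stream `path` through a connection
# and return the 0-based byte offset of every match of `pattern`.
# Assumes "\n" line endings, i.e. exactly one terminator byte per line.
byte_offsets_stream <- function(path, pattern, chunk = 10000L) {
  con <- file(path, open = "rb")          # binary mode: no newline translation
  on.exit(close(con))
  offsets <- numeric(0)
  base <- 0                               # bytes consumed before this chunk
  repeat {
    lines <- readLines(con, n = chunk, warn = FALSE)
    if (length(lines) == 0L) break
    nb <- nchar(lines, type = "bytes") + 1              # line bytes plus its "\n"
    starts <- base + c(0, cumsum(nb))[seq_along(lines)] # 0-based line starts
    hits <- gregexpr(pattern, lines, useBytes = TRUE)   # 1-based byte positions
    for (i in seq_along(hits)) {
      pos <- hits[[i]]
      if (pos[1] != -1L) offsets <- c(offsets, starts[i] + pos - 1)
    }
    base <- base + sum(nb)
  }
  offsets
}
```

Only `chunk` lines are held in memory at a time. For a file with "\r\n" line endings, which readLines() also strips, the `+ 1` would have to become `+ 2`.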

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Regex matching that gives byte offset?

2009-11-02 Thread Johannes Graumann
On Monday 02 November 2009 13:41:45 Prof Brian Ripley wrote:
 On Mon, 2 Nov 2009, Johannes Graumann wrote:
  Hmmm ... that should do it, thanks. But how would one use this on a file
  without reading it into memory completely?
 
 ?file, ?readLines, ?readBin
 
 will tell you about connections.
... all of which only let me read line by line, and regexpr() on a single line 
will not give me the absolute offset in the file.
grep -buo on the Unix command line is really fast for this. If I can't find 
a native R equivalent, I'm of a mind to do this via a system call: ugly and 
not portable, but SOOO fast ... is it possible in R?
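Calling grep from R can be done with system2(). The following is a sketch, assuming a GNU-compatible grep on the PATH. The helper name grep_byte_offsets() is hypothetical. It leaves out -u, which only ever had an effect on MS-DOS/Windows builds of GNU grep. It adds -a so that lines containing non-ASCII bytes are never reported as a binary-file match. grep exits with status 1 when nothing matches, and system2() turns that into a warning, so the call is wrapped in suppressWarnings().

```r
# Hypothetical helper (illustration only): shell out to grep and return the
# 0-based byte offset of every match. Assumes a GNU-compatible grep is installed.
grep_byte_offsets <- function(path, pattern) {
  out <- suppressWarnings(  # grep's exit status 1 ("no match") becomes a warning
    system2("grep", c("-a", "-b", "-o", "-e", shQuote(pattern), shQuote(path)),
            stdout = TRUE)
  )
  # With -b -o each output line has the form "OFFSET:MATCH";
  # keep the number before the first colon.
  as.numeric(sub(":.*$", "", out))
}
```

This inherits the portability problem mentioned above: it only works where such a grep exists.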

Joh



[R] Regex matching that gives byte offset?

2009-10-28 Thread Johannes Graumann
Hi,

Is there any way of doing 'grep' or something like it on the content of a 
text file and extracting the byte position of the match in the file? I'm 
facing the need to access rather largish (600MB) XML files and would like 
to be able to index them ...
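Suppose an index of byte offsets already exists. A file connection opened in binary mode can then jump straight to an entry with seek() and read from there with readBin(), without loading the rest of the file. The following is a sketch. read_at() is a hypothetical helper, and offsets are taken to be 0-based, as grep -b reports them. See ?seek for platform caveats, notably on Windows.

```r
# Hypothetical helper (illustration only): return `n` bytes of `path`
# starting at the 0-based byte `offset`, without reading anything before it.
read_at <- function(path, offset, n) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = offset, origin = "start")   # jump straight to the offset
  rawToChar(readBin(con, what = "raw", n = n))  # read n raw bytes as a string
}
```

For example, on a file containing "<a>\nx<a>y<a>\n", read_at(path, 5, 3) returns "<a>".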

Thanks for any help or flogging,

Joh



Re: [R] Regex matching that gives byte offset?

2009-10-28 Thread Prof Brian Ripley

Do you mean like regexpr() (on the same help page)?

Depending on your locale, you might actually prefer the character 
offset: if you want to match in a MBCS and have byte offsets you will 
need to work a bit harder if useBytes=TRUE is not sufficient for you.
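Here is a small illustration of the difference, assuming a UTF-8 string (the variable x is made up for the example). "é" is one character but two bytes, so the character offset and the byte offset of the same match differ by one. char_to_byte() is a hypothetical helper showing one way to convert by hand when the match has to be found character-wise, so useBytes = TRUE alone does not fit.

```r
x <- "caf\u00e9 <tag>"   # "\u00e9" (é) is 1 character but 2 bytes in UTF-8

# Character offset of the match (the default)
char_pos <- regexpr("<tag>", x, fixed = TRUE)
# Byte offset of the same match
byte_pos <- regexpr("<tag>", x, fixed = TRUE, useBytes = TRUE)

# Hypothetical helper: convert a 1-based character offset in `s` into a
# 1-based byte offset by counting the bytes of everything before it.
char_to_byte <- function(s, pos) nchar(substr(s, 1, pos - 1), type = "bytes") + 1L

as.integer(char_pos)                    # 6
as.integer(byte_pos)                    # 7
char_to_byte(x, as.integer(char_pos))   # 7
```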