Re: [R] The Future of R | API to Public Databases

2014-05-21 Thread antagomir
Just a note on this earlier thread: R package collections for public APIs
now seem to be emerging and thriving in various disciplines:

- Bioinformatics: the Bioconductor collection contains many API packages
- rOpenSci: R tools for open science-related APIs 
- rOpenHealth: R tools for open healthcare-related APIs
- rOpenGov: R tools for open government data and computational social
science (disclaimer: I am one of the main developers for this one)

These community projects aim to fill the gap discussed in this thread.
The APIs and needs are many, and are best tackled with community-driven
package collections written by the actual users.

Have a look at those projects - your contributions to any of them are
certainly welcome.
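
For a flavor of what such API packages do under the hood, here is a minimal
sketch of querying a public JSON API directly from R. It assumes the jsonlite
package and the World Bank's v2 REST endpoint; the country, indicator code and
URL layout are illustrative, not taken from any of the packages above.

# Minimal sketch: fetch GDP per capita (current US$) from the World Bank API.
library(jsonlite)

url <- paste0("http://api.worldbank.org/v2/country/DE/indicator/",
              "NY.GDP.PCAP.CD?format=json&date=2005:2010&per_page=100")
raw <- fromJSON(url)   # World Bank returns a list: metadata, then data
gdp <- raw[[2]]        # the data part, simplified to a data frame
head(gdp[, c("date", "value")])

The packages listed above wrap exactly this kind of plumbing behind
friendlier functions.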

best,
Leo Lahti



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-15 Thread Benjamin Weber
Yes, R-devel would be the right mailing list for this discussion.
As some people pointed out, the problem definition is vague. This was
deliberate: to encourage people to share their *different* perceptions of the
problem and to reach some degree of consensus.

My starting point came from my own perspective, so I must be an
egocentric person. I agree with that.
There are a lot of other egocentric people who download R and just
want their results ASAP. That's reality.
The same holds for each and every special-interest group (where
each and every member has a special interest).
Everyone cares only about their own needs. That is the systemic issue we
have to overcome by working together to simplify everyone's individual
situation. Ultimately we should reach a win-win situation for all. That
is my notion.

What I wanted to point out is, more or less, the process of
statistical research:

1. Set up your research objective
2. Find the right data (time intensive)
3. Download the right format
4. Import it, make it compatible, clean it up
5. Work with it
6. Get your results

The more integrative your research objective, the more time you
spend on steps 1 to 3, and in most cases those steps take up most of the
time. Some people will give up due to lack of time or simply due to lack
of accessibility of data.
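
To make steps 3 to 5 concrete, here is a minimal sketch of the
download-import-clean cycle in base R; the URL and column names are
hypothetical placeholders, not a real data source.

# Steps 3-5 in miniature; URL and column names are hypothetical placeholders.
url <- "http://example.org/statistics/gdp.csv"   # step 3: download location
dat <- read.csv(url, stringsAsFactors = FALSE)   # step 4: import
dat <- dat[!is.na(dat$value), ]                  # step 4: clean up
dat$year <- as.integer(dat$year)                 # step 4: make compatible
summary(dat$value)                               # step 5: work with it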

I highly appreciate that a lot of people participated in this
discussion, that the publishers themselves now address the problem (just
take a look at [1]), and that some people are working on it in the R world
(e.g. TSdbi).

Reality is better than I initially perceived it. But it is not as it should be.

Benjamin


[1] http://sdmx.org/wp-content/uploads/2011/10/SDMX-Action-Plan-2011_2015.pdf

On 15 January 2012 13:15, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 On 14/01/2012 18:51, Joshua Wiley wrote:

 I have been following this thread, but there are many aspects of it
 which are unclear to me.  Who are the publishers?  Who are the users?
 What is the problem?  I have a vague sense for some of these, but it
 seems to me like one valuable starting place would be creating a
 document that clarifies everything.  It is easier to tackle a concrete
 problem (e.g., agree on a standard numerical representation of dates
 and times a la ISO 8601) than something diffuse (e.g., information
 overload).


 Let alone something as vague as 'the future of R' (for which the R-devel
 list is the appropriate one).  I believe the original poster is being
 egocentric: as someone said earlier, she has never had need of this concept,
 and I believe that is true of the vast majority of R users.

 The development of R per se is primarily driven by the needs of the core
 developers and those around them.  Other R communities have set up their
 own special-interest groups and sets of packages, and that would seem the
 way forward here.


 Good luck,

 Josh

 On Sat, Jan 14, 2012 at 10:02 AM, Benjamin Weber m...@bwe.im wrote:

 Mike

 We see that the publishers are aware of the problem. They don't think
 that the raw data is usable for the user, and they acknowledge this
 fact with their proprietary formats. Yes, they capitulate to the
 information overload. That's pathetic.

 It is not a question of *which* data format, it is a question about
 the general concept. Where do publisher and user meet? There has to be
 one *defined* point which all parties agree on. I disagree with your
 statement that the publisher should just publish csv or cook his own
 API. That leads to fragmentation and inaccessibility of data. We want
 data to be accessible.

 A more pragmatic approach is needed to revolutionize the way we go
 about raw data.

 Benjamin

 On 14 January 2012 22:17, Mike Marchywka marchy...@hotmail.com wrote:

 LOL, I remember posting about this in the past. The US gov agencies vary
 but most are quite good. The big problem appears to be people who push
 proprietary or commercial standards for which only one effective source
 exists. Some formats, like Excel and PDF, come to mind, and there is a
 disturbing trend towards their adoption in some places where raw data is
 needed by many. The best thing to do is contact the information provider
 and let them know you want raw data, not images or stuff that works in
 limited commercial software packages. Often data sources are valuable and
 the revenue model impacts availability.

 If you are just arguing over different open formats, it is usually easy
 for someone to write some conversion code and publish it; CSV to JSON
 would not be a problem, for example. Data of course are quite variable and
 there is nothing wrong with giving the provider his choice.

 

 Date: Sat, 14 Jan 2012 10:21:23 -0500
 From: ja...@rampaginggeek.com
 To: r-help@r-project.org
 Subject: Re: [R] The Future of R | API to Public Databases

 Web services are only part of the problem. In essence, there are at
 least two facets:
 1. downloading the data using some protocol

Re: [R] The Future of R | API to Public Databases

2012-01-15 Thread Jason Edgecombe
Date: Sat, 14 Jan 2012 10:21:23 -0500
From: ja...@rampaginggeek.com
To: r-help@r-project.org
Subject: Re: [R] The Future of R | API to Public Databases

Web services are only part of the problem. In essence, there are at
least two facets:
1. downloading the data using some protocol
2. mapping the data to a common model

Having #1 makes the import/download easier, but it really becomes
useful
when both are included. I think #2 is the harder problem to address.
Software can usually be written to handle #1 by making a useful
abstraction layer. #2 means that data has consistent names and
meanings,
and this requires people to agree on common definitions and a common
naming convention.

RDF (Resource Description Framework) and its related technologies
(SPARQL, OWL, etc.) are among the many attempts to address this.
While this effort would benefit R, I think it's best if it's part of a
larger effort.

Services such as DBpedia and Freebase are trying to unify many data sets
using RDF.
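
(As an illustration of the RDF approach, here is a minimal sketch of querying
DBpedia's public SPARQL endpoint from R. It assumes the CRAN 'SPARQL' package
and a reachable endpoint; the query itself is only a toy example.)

library(SPARQL)  # CRAN package, assumed installed

endpoint <- "http://dbpedia.org/sparql"
# Toy query: a few country resources and their English labels.
query <- "SELECT ?country ?label WHERE {
            ?country a <http://dbpedia.org/ontology/Country> ;
                     rdfs:label ?label .
            FILTER (lang(?label) = 'en')
          } LIMIT 5"
res <- SPARQL(endpoint, query)  # returns a list; $results is a data frame
res$results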

The task view and package ideas are great ideas. I'm just adding another
perspective.

Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

HI Benjamin:

What would make this easier is if these sites used standardized web
services, so the code would only need to be written once. data.gov is the
worst example; they spun up their own, weak service.

There is a lot of environmental data available through OPeNDAP, and
that is supported in the ncdf4 package. My own group has a service called
ERDDAP that is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R (and matlab) scripts that automate the extract for
certain cases, see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector (EDC) that
provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you to
subset data that is served by OPeNDAP, ERDDAP, certain Sensor Observation
Service (SOS) servers, and have it read directly into R. It is freely
available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized
(OPeNDAP, SOS) or is easy to implement (ERDDAP).
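
(To illustrate, a minimal sketch of reading a remote dataset over OPeNDAP with
the ncdf4 package mentioned above. The URL and variable name are hypothetical
placeholders, and this assumes an ncdf4 build with OPeNDAP/DAP support.)

library(ncdf4)  # nc_open() accepts OPeNDAP URLs when built with DAP support

url <- "http://example.noaa.gov/thredds/dodsC/sst_dataset"  # placeholder
nc  <- nc_open(url)                     # open the remote dataset
sst <- ncvar_get(nc, "sst",             # read a small subset, not the grid
                 start = c(1, 1, 1),
                 count = c(10, 10, 1))
nc_close(nc)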

-Roy


On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:


Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not well suited to using all the public
databases in an efficient manner.
I observed that the most tedious part of working with R is searching and
downloading the data from public databases and putting it into the right
format. I could not find a package on CRAN which offers exactly this
fundamental capability.
Imagine R as the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would move R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing
with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as contributions to a vital
discussion.

Benjamin


**
The contents of this message do not reflect any position of the U.S.
Government or NOAA.
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: roy.mendelss...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

Old age and treachery will overcome youth and skill.
From those who have been given much, much will be expected
the arc of the moral universe is long, but it bends toward justice
-MLK Jr.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595




Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Benjamin Weber
Spencer

I highly appreciate your input. What we need is a standard for
statistics. That may reinvent the way we see data.
The recent crisis is the best proof that we are lost in our own
self-generated information overload. The traditional approach is not
working anymore.

Finding the right members for the initial committee would be the
hardest but most important part.

Another point is that I am only a 21-year-old student, with limited
financial capability with respect to what I can commit to such work.
But I have my motivation, which is the *real* engine for advancing an
idea. I am open to working on it in my spare time. Over time I would
become an expert in my own field; that is implicit in such a decision.
I don't have the background of a statistician, but I know what the
relevance of data is.
It may be a solution that a newcomer brings a fresh perspective. Starting
from scratch is at some point beneficial.
It will be even harder for a person like me to convince the
experienced professionals to overcome their own conventional schemes
and procedures, because my approach would pay no respect to the
established ones. Why the hell should I know better than the
experts? I respect single solutions; they might work in a specific
situation, but they make it impossible to put everything together into
the big picture which is ultimately required.

I am really interested in leading the initiative of such a new
standard. My problem is how to start.

Would a scientific paper which proposes the development of a standard
be a starting point?

Benjamin

On 14 January 2012 08:19, Spencer Graves
spencer.gra...@structuremonitoring.com wrote:
      A traditional way to exit a chaotic situation as you describe is to try
 to establish a standards committee, invite participation from suppliers and
 users of whatever (data in this case), apply for registration with the
 International Standards Organization, and organize meetings, draft and
 circulate a proposed standard, etc.  A statistician who had published maybe
 100 papers and 3 books told me that his work on ISO 9000 (I think) made a
 larger contribution to humanity than anything else he had done.  Work on
 standards is one of the most boring, tedious activities I can imagine -- and
 can potentially be the most impactful thing one does in this life:  If you
 have an ISO standard number for something, people who are starting something
 new may find it and follow it.  People who are working to upgrade something
 may tell their management, "Let's follow this standard."  Customers
 sometimes ask their suppliers to follow it: if you follow the standard,
 you might get more customers.


      I think you could get support for such a standard effort from the
 American Association for the Advancement of Science, the American Economic
 Association, the American Statistical Association, and many other
 organizations, including many on-line science journals that today pressure
 authors of papers to put the data behind their published paper in the public
 domain, downloadable from their web site, etc.


      IMHO.
      Spencer


 On 1/13/2012 3:39 PM, Benjamin Weber wrote:

 The whole issue is related to the mismatch of (1) the publisher of the
 data and (2) the user at the rendezvous point.
 Neither the publisher nor the user knows anything about the
 rendezvous point. Both want to meet but in reality they don't.
 The user wastes time to find the rendezvous point defined by the
 publisher.
 The publisher picks an arbitrary rendezvous point. Given the number of
 publishers, the variety of fields and the flavor of each expert,
 we end up in today's data world. Everyone has to waste precious
 time finding the rendezvous point. Only experts know which
 corner to focus their search on - but even they need time to
 find what they want.
 However, each expert (of each profession) believes that his approach
 is the best one in the world.
 Finally we have a state of total confusion, where only experts can
 handle the information and non-experts cannot even access the data
 without diving fully into the flood of data and its specialities.
 That's my point: Data is not accessible.

 The discussion should follow a strategic approach:
 - Is the classical csv file (in all its varieties) the simplest and best
 way?
 - Isn't it the responsibility of the R community to recommend
 standards for different kinds of data?
 With the existence of this rendezvous point the publisher would know a
 specific point which is favorable from the user's point of view. That
 is missing.
 Only a rendezvous point defined by the community can be a 'known'
 rendezvous point for all stakeholders, globally.

 I do believe that the publisher's greatest interest is data
 accessibility. Where is the toolkit we provide them to enable them to
 serve us the data exactly as we want it? No, we just try to build ever
 more packages that get lost in the noise of information.

 I disagree with a proposed solution to 

Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Jason Edgecombe
Web services are only part of the problem. In essence, there are at 
least two facets:

1. downloading the data using some protocol
2. mapping the data to a common model

Having #1 makes the import/download easier, but it really becomes useful 
when both are included. I think #2 is the harder problem to address. 
Software can usually be written to handle #1 by making a useful 
abstraction layer. #2 means that data has consistent names and meanings, 
and this requires people to agree on common definitions and a common 
naming convention.


RDF (Resource Description Framework) and its related technologies
(SPARQL, OWL, etc.) are among the many attempts to address this.
While this effort would benefit R, I think it's best if it's part of a 
larger effort.


Services such as DBpedia and Freebase are trying to unify many data sets 
using RDF.


The task view and package ideas are great ideas. I'm just adding another
perspective.


Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

HI Benjamin:

What would make this easier is if these sites used standardized web services,
so the code would only need to be written once.  data.gov is the worst
example; they spun up their own, weak service.

There is a lot of environmental data available through OPeNDAP, and that is
supported in the ncdf4 package.  My own group has a service called ERDDAP that
is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R  (and matlab) scripts that automate the extract for certain cases, 
see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector  (EDC) that  
provides a GUI from within R  (and ArcGIS, Matlab and Excel) that allows you to
subset  data that is served by OPeNDAP, ERDDAP, certain Sensor Observation 
Service (SOS) servers,  and have it read directly into R.  It is freely 
available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized  (OPeNDAP, 
SOS) or is easy to implement  (ERDDAP).

-Roy


On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:


Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not well suited to using all the public
databases in an efficient manner.
I observed that the most tedious part of working with R is searching and
downloading the data from public databases and putting it into the right
format. I could not find a package on CRAN which offers exactly this
fundamental capability.
Imagine R as the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would move R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as contributions to a vital discussion.

Benjamin








Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Mike Marchywka

LOL, I remember posting about this in the past. The US gov agencies vary but
most are quite good. The big problem appears to be people who push proprietary
or commercial standards for which only one effective source exists. Some
formats, like Excel and PDF, come to mind, and there is a disturbing trend
towards their adoption in some places where raw data is needed by many. The
best thing to do is contact the information provider and let them know you
want raw data, not images or stuff that works in limited commercial software
packages. Often data sources are valuable and the revenue model impacts
availability.

If you are just arguing over different open formats, it is usually easy for
someone to write some conversion code and publish it; CSV to JSON would not be
a problem, for example. Data of course are quite variable and there is
nothing wrong with giving the provider his choice.


 Date: Sat, 14 Jan 2012 10:21:23 -0500
 From: ja...@rampaginggeek.com
 To: r-help@r-project.org
 Subject: Re: [R] The Future of R | API to Public Databases

 Web services are only part of the problem. In essence, there are at
 least two facets:
 1. downloading the data using some protocol
 2. mapping the data to a common model

 Having #1 makes the import/download easier, but it really becomes useful
 when both are included. I think #2 is the harder problem to address.
 Software can usually be written to handle #1 by making a useful
 abstraction layer. #2 means that data has consistent names and meanings,
 and this requires people to agree on common definitions and a common
 naming convention.

 RDF (Resource Description Framework) and its related technologies
 (SPARQL, OWL, etc.) are among the many attempts to address this.
 While this effort would benefit R, I think it's best if it's part of a
 larger effort.

 Services such as DBpedia and Freebase are trying to unify many data sets
 using RDF.

 The task view and package ideas are great ideas. I'm just adding another
 perspective.

 Jason

 On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
  HI Benjamin:
 
  What would make this easier is if these sites used standardized web
  services, so the code would only need to be written once. data.gov is the
  worst example; they spun up their own, weak service.
 
  There is a lot of environmental data available through OPeNDAP, and that is
  supported in the ncdf4 package. My own group has a service called ERDDAP
  that is entirely RESTful, see:
 
  http://coastwatch.pfel.noaa.gov/erddap
 
  and
 
  http://upwell.pfeg.noaa.gov/erddap
 
  We provide R (and matlab) scripts that automate the extract for certain 
  cases, see:
 
  http://coastwatch.pfeg.noaa.gov/xtracto/
 
  We also have a tool called the Environmental Data Connector (EDC) that 
  provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you
  to subset data that is served by OPeNDAP, ERDDAP, certain Sensor 
  Observation Service (SOS) servers, and have it read directly into R. It is 
  freely available at:
 
  http://www.pfeg.noaa.gov/products/EDC/
 
  We can write such tools because the service is either standardized 
  (OPeNDAP, SOS) or is easy to implement (ERDDAP).
 
  -Roy
 
 
  On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
 
  Dear R Users -
 
  R is a wonderful software package. CRAN provides a variety of tools to
  work on your data. But R is not well suited to using all the public
  databases in an efficient manner.
  I observed that the most tedious part of working with R is searching and
  downloading the data from public databases and putting it into the right
  format. I could not find a package on CRAN which offers exactly this
  fundamental capability.
  Imagine R as the unified interface to access (and analyze) all public
  data in the easiest way possible. That would create a real impact,
  would move R a big leap forward and would enable us to see the world
  with different eyes.
 
  There is a lack of a direct connection to the API of these databases,
  to name a few:
 
  - Eurostat
  - OECD
  - IMF
  - Worldbank
  - UN
  - FAO
  - data.gov
  - ...
 
  The ease of access to the data is the key to information processing with R.
 
  How can we handle the flow of information noise? R has to give an
  answer to that with an extensive API to public databases.
 
  I would love your comments and ideas as contributions to a vital
  discussion.
 
  Benjamin
 
Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Benjamin Weber
Mike

We see that the publishers are aware of the problem. They don't think
that the raw data is usable for the user, and they acknowledge this
fact with their proprietary formats. Yes, they capitulate to the
information overload. That's pathetic.

It is not a question of *which* data format, it is a question about
the general concept. Where do publisher and user meet? There has to be
one *defined* point which all parties agree on. I disagree with your
statement that the publisher should just publish csv or cook his own
API. That leads to fragmentation and inaccessibility of data. We want
data to be accessible.

A more pragmatic approach is needed to revolutionize the way we go
about raw data.

Benjamin

On 14 January 2012 22:17, Mike Marchywka marchy...@hotmail.com wrote:
 LOL, I remember posting about this in the past. The US gov agencies vary but
 most are quite good. The big problem appears to be people who push proprietary
 or commercial standards for which only one effective source exists. Some
 formats, like Excel and PDF, come to mind, and there is a disturbing trend
 towards their adoption in some places where raw data is needed by many. The
 best thing to do is contact the information provider and let them know you
 want raw data, not images or stuff that works in limited commercial software
 packages. Often data sources are valuable and the revenue model impacts
 availability.

 If you are just arguing over different open formats, it is usually easy for
 someone to write some conversion code and publish it; CSV to JSON would not
 be a problem, for example. Data of course are quite variable and there is
 nothing wrong with giving the provider his choice.

 
 Date: Sat, 14 Jan 2012 10:21:23 -0500
 From: ja...@rampaginggeek.com
 To: r-help@r-project.org
 Subject: Re: [R] The Future of R | API to Public Databases

 Web services are only part of the problem. In essence, there are at
 least two facets:
 1. downloading the data using some protocol
 2. mapping the data to a common model

 Having #1 makes the import/download easier, but it really becomes useful
 when both are included. I think #2 is the harder problem to address.
 Software can usually be written to handle #1 by making a useful
 abstraction layer. #2 means that data has consistent names and meanings,
 and this requires people to agree on common definitions and a common
 naming convention.

 RDF (Resource Description Framework) and its related technologies
 (SPARQL, OWL, etc.) are among the many attempts to address this.
 While this effort would benefit R, I think it's best if it's part of a
 larger effort.

 Services such as DBpedia and Freebase are trying to unify many data sets
 using RDF.

 The task view and package ideas are great ideas. I'm just adding another
 perspective.

 Jason

 On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
  HI Benjamin:
 
  What would make this easier is if these sites used standardized web
  services, so the code would only need to be written once. data.gov is the
  worst example; they spun up their own, weak service.
 
  There is a lot of environmental data available through OPeNDAP, and that
  is supported in the ncdf4 package. My own group has a service called
  ERDDAP that is entirely RESTful, see:
 
  http://coastwatch.pfel.noaa.gov/erddap
 
  and
 
  http://upwell.pfeg.noaa.gov/erddap
 
  We provide R (and matlab) scripts that automate the extract for certain 
  cases, see:
 
  http://coastwatch.pfeg.noaa.gov/xtracto/
 
  We also have a tool called the Environmental Data Connector (EDC) that 
  provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you
  to subset data that is served by OPeNDAP, ERDDAP, certain Sensor 
  Observation Service (SOS) servers, and have it read directly into R. It is 
  freely available at:
 
  http://www.pfeg.noaa.gov/products/EDC/
 
  We can write such tools because the service is either standardized 
  (OPeNDAP, SOS) or is easy to implement (ERDDAP).
 
  -Roy
 
 
  On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
 
  Dear R Users -
 
  R is a wonderful software package. CRAN provides a variety of tools to
  work on your data. But R is not well suited to using all the public
  databases in an efficient manner.
  I observed that the most tedious part of working with R is searching and
  downloading the data from public databases and putting it into the right
  format. I could not find a package on CRAN which offers exactly this
  fundamental capability.
  Imagine R as the unified interface to access (and analyze) all public
  data in the easiest way possible. That would create a real impact,
  would move R a big leap forward and would enable us to see the world
  with different eyes.
 
  There is a lack of a direct connection to the API of these databases,
  to name a few:
 
  - Eurostat
  - OECD
  - IMF
  - Worldbank
  - UN
  - FAO
  - data.gov
  - ...
 
  The ease of access to the data is the key to information processing with
  R.

Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Joshua Wiley
I have been following this thread, but there are many aspects of it
which are unclear to me.  Who are the publishers?  Who are the users?
What is the problem?  I have a vague sense for some of these, but it
seems to me like one valuable starting place would be creating a
document that clarifies everything.  It is easier to tackle a concrete
problem (e.g., agree on a standard numerical representation of dates
and times a la ISO 8601) than something diffuse (e.g., information
overload).
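
(Joshua's ISO 8601 example is easy to make concrete; a small illustration in
base R, with no extra packages assumed:)

d <- as.Date("2012-01-14")                    # parse an ISO 8601 date
t <- as.POSIXct("2012-01-14T10:21:23",
                format = "%Y-%m-%dT%H:%M:%S",
                tz = "UTC")                   # parse an ISO 8601 timestamp
format(t, "%Y-%m-%dT%H:%M:%S%z")              # write one back out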

Good luck,

Josh

On Sat, Jan 14, 2012 at 10:02 AM, Benjamin Weber m...@bwe.im wrote:
 Mike

 We see that the publishers are aware of the problem. They don't think
 that the raw data is usable for the user, and they acknowledge this
 fact with their proprietary formats. Yes, they capitulate to the
 information overload. That's pathetic.

 It is not a question of *which* data format, it is a question about
 the general concept. Where do publisher and user meet? There has to be
 one *defined* point which all parties agree on. I disagree with your
 statement that the publisher should just publish csv or cook his own
 API. That leads to fragmentation and inaccessibility of data. We want
 data to be accessible.

 A more pragmatic approach is needed to revolutionize the way we go
 about raw data.

 Benjamin

 On 14 January 2012 22:17, Mike Marchywka marchy...@hotmail.com wrote:
 LOL, I remember posting about this in the past. The US gov agencies vary but
 most are quite good. The big problem appears to be people who push
 proprietary or commercial standards for which only one effective source
 exists. Some formats, like Excel and PDF, come to mind, and there is a
 disturbing trend towards their adoption in some places where raw data is
 needed by many. The best thing to do is contact the information provider and
 let them know you want raw data, not images or stuff that works in limited
 commercial software packages. Often data sources are valuable and the revenue
 model impacts availability.

 If you are just arguing over different open formats, it is usually easy for
 someone to write some conversion code and publish it; CSV to JSON would not
 be a problem, for example. Data of course are quite variable and there is
 nothing wrong with giving the provider his choice.

 
 Date: Sat, 14 Jan 2012 10:21:23 -0500
 From: ja...@rampaginggeek.com
 To: r-help@r-project.org
 Subject: Re: [R] The Future of R | API to Public Databases

 Web services are only part of the problem. In essence, there are at
 least two facets:
 1. downloading the data using some protocol
 2. mapping the data to a common model

 Having #1 makes the import/download easier, but it really becomes useful
 when both are included. I think #2 is the harder problem to address.
 Software can usually be written to handle #1 by making a useful
 abstraction layer. #2 means that data has consistent names and meanings,
 and this requires people to agree on common definitions and a common
 naming convention.

 RDF (Resource Description Framework) and its related technologies
 (SPARQL, OWL, etc.) are among the many attempts to address this.
 While this effort would benefit R, I think it's best if it's part of a
 larger effort.

 Services such as DBpedia and Freebase are trying to unify many data sets
 using RDF.

 The task view and package ideas are great ideas. I'm just adding another
 perspective.

 Jason

 On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
  HI Benjamin:
 
  What would make this easier is if these sites used standardized web
  services, so the code would only need to be written once. data.gov is the
  worst example; they spun up their own, weak service.
 
  There is a lot of environmental data available through OPeNDAP, and that
  is supported in the ncdf4 package. My own group has a service called
  ERDDAP that is entirely RESTful, see:
 
  http://coastwatch.pfel.noaa.gov/erddap
 
  and
 
  http://upwell.pfeg.noaa.gov/erddap
 
  We provide R (and matlab) scripts that automate the extract for certain 
  cases, see:
 
  http://coastwatch.pfeg.noaa.gov/xtracto/
 
  We also have a tool called the Environmental Data Connector (EDC) that 
  provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you
  to subset data that is served by OPeNDAP, ERDDAP, certain Sensor 
  Observation Service (SOS) servers, and have it read directly into R. It 
  is freely available at:
 
  http://www.pfeg.noaa.gov/products/EDC/
 
  We can write such tools because the service is either standardized 
  (OPeNDAP, SOS) or is easy to implement (ERDDAP).
 
  -Roy
 
 
  On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
 
  Dear R Users -
 
  R is a wonderful software package. CRAN provides a variety of tools to
  work on your data. But R is not well suited to using all the public
  databases in an efficient manner.
  I observed that the most tedious part of working with R is searching and
  downloading the data from public databases and putting it into the right
  format.

Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Paul Gilbert
The situation for this kind of interface is much more advanced (for 
economic time series data) than has been suggested in other postings. 
Several of the organizations you mention support SDMX and I believe 
there is a working R interface to SDMX which has not yet been made 
public. A more complete list of organizations that I think already have 
working server side support for SDMX is: the OECD, Eurostat, the ECB, 
the IMF, the UN, the BIS, the Federal Reserve Board, the World Bank, the 
Italian Statistics agency, and to a small extent the Bank of Canada.
 I have a working API to several time series databases (TS* packages on 
CRAN), and a partially working interface to SDMX, but have postponed 
further development of that in the hope that the already working code 
will be made available. Please see http://tsdbi.r-forge.r-project.org/ 
for more details. I would, of course, be happy to have other developers 
involved in this project. If you think you can contribute then see 
r-forge.r-project.org for details on how to join projects.
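
(For readers who have not seen the TS* packages, a minimal sketch of the
intended workflow, assuming TSdbi plus one of its backend driver packages;
the driver, connection arguments and series ID below are hypothetical
placeholders, so consult the TSdbi documentation for real ones.)

library(TSdbi)  # core interface; a driver package such as TSSQLite is
                # also required for the backend used here

con <- TSconnect("SQLite", dbname = "tsdata.db")  # hypothetical backend
x   <- TSget(serIDs = "GDP.Q.CA", con = con)      # fetch one series
plot(x)                                           # TSget returns a ts object
dbDisconnect(con)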


Paul

On 12-01-14 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Sat, 14 Jan 2012 02:44:07 +0530
From: Benjamin Weber m...@bwe.im
To: r-help@r-project.org
Subject: [R] The Future of R | API to Public Databases
Message-ID:
cany9q8k+zyvrkjjgbjp+jtnyaw15gqkocivyvpgwgyqa9dl...@mail.gmail.com
Content-Type: text/plain; charset=UTF-8

Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not well suited to using all the public
databases in an efficient manner.
I observed that the most tedious part of working with R is searching and
downloading the data from public databases and putting it into the right
format. I could not find a package on CRAN which offers exactly this
fundamental capability.
Imagine R as the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would move R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as contributions to a vital discussion.

Benjamin




Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Jason Edgecombe
From: ja...@rampaginggeek.com
To: r-help@r-project.org
Subject: Re: [R] The Future of R | API to Public Databases

Web services are only part of the problem. In essence, there are at
least two facets:
1. downloading the data using some protocol
2. mapping the data to a common model

Having #1 makes the import/download easier, but it really becomes useful
when both are included. I think #2 is the harder problem to address.
Software can usually be written to handle #1 by making a useful
abstraction layer. #2 means that data has consistent names and meanings,
and this requires people to agree on common definitions and a common
naming convention.

RDF (Resource Description Framework) and its related technologies
(SPARQL, OWL, etc.) are among the many attempts to address this.
While this effort would benefit R, I think it's best if it's part of a
larger effort.

Services such as DBpedia and Freebase are trying to unify many data sets
using RDF.

The task view and package ideas are great ideas. I'm just adding another
perspective.

Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

HI Benjamin:

What would make this easier is if these sites used standardized web services,
so the code would only need to be written once. data.gov is the worst example;
they spun up their own, weak service.

There is a lot of environmental data available through OPeNDAP, and that is
supported in the ncdf4 package. My own group has a service called ERDDAP that
is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R (and matlab) scripts that automate the extract for certain cases, 
see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector (EDC) that provides 
a GUI from within R (and ArcGIS, Matlab and Excel) that allows you to subset data
that is served by OPeNDAP, ERDDAP, certain Sensor Observation Service (SOS) 
servers, and have it read directly into R. It is freely available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized (OPeNDAP, 
SOS) or is easy to implement (ERDDAP).

-Roy


On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:


Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not well suited to using all the public
databases in an efficient manner.
I observed that the most tedious part of working with R is searching and
downloading the data from public databases and putting it into the right
format. I could not find a package on CRAN which offers exactly this
fundamental capability.
Imagine R as the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would move R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as contributions to a vital discussion.

Benjamin







Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Prof Brian Ripley

On 14/01/2012 18:51, Joshua Wiley wrote:

I have been following this thread, but there are many aspects of it
which are unclear to me.  Who are the publishers?  Who are the users?
What is the problem?  I have a vague sense for some of these, but it
seems to me like one valuable starting place would be creating a
document that clarifies everything.  It is easier to tackle a concrete
problem (e.g., agree on a standard numerical representation of dates
and times a la ISO 8601) than something diffuse (e.g., information
overload).


Let alone something as vague as 'the future of R' (for which the R-devel 
list is the appropriate one).  I believe the original poster is being 
egocentric: as someone said earlier, she has never had need of this 
concept, and I believe that is true of the vast majority of R users.


The development of R per se is primarily driven by the needs of the core 
developers and those around them.  Other R communities have set up
their own special-interest groups and sets of packages, and that would 
seem the way forward here.



Good luck,

Josh

On Sat, Jan 14, 2012 at 10:02 AM, Benjamin Weber m...@bwe.im wrote:

Mike

We see that the publishers are aware of the problem. They don't think
that the raw data is usable for the user, and they acknowledge this
fact with their proprietary formats. Yes, they capitulate to the
information overload. That's pathetic.

It is not a question of *which* data format, it is a question about
the general concept. Where do publisher and user meet? There has to be
one *defined* point which all parties agree on. I disagree with your
statement that the publisher should just publish csv or cook his own
API. That leads to fragmentation and inaccessibility of data. We want
data to be accessible.

A more pragmatic approach is needed to revolutionize the way we go
about raw data.

Benjamin

On 14 January 2012 22:17, Mike Marchywka marchy...@hotmail.com wrote:
LOL, I remember posting about this in the past. The US gov agencies vary but
most are quite good. The big problem appears to be people who push proprietary
or commercial standards for which only one effective source exists. Some
formats, like Excel and PDF, come to mind, and there is a disturbing trend
towards their adoption in some places where raw data is needed by many. The
best thing to do is contact the information provider and let them know you
want raw data, not images or stuff that works in limited commercial software
packages. Often data sources are valuable and the revenue model impacts
availability.

If you are just arguing over different open formats, it is usually easy for
someone to write some conversion code and publish it; CSV to JSON would not
be a problem, for example. Data of course are quite variable and there is
nothing wrong with giving the provider his choice.



Date: Sat, 14 Jan 2012 10:21:23 -0500
From: ja...@rampaginggeek.com
To: r-help@r-project.org
Subject: Re: [R] The Future of R | API to Public Databases

Web services are only part of the problem. In essence, there are at
least two facets:
1. downloading the data using some protocol
2. mapping the data to a common model

Having #1 makes the import/download easier, but it really becomes useful
when both are included. I think #2 is the harder problem to address.
Software can usually be written to handle #1 by making a useful
abstraction layer. #2 means that data has consistent names and meanings,
and this requires people to agree on common definitions and a common
naming convention.

RDF (Resource Description Framework) and its related technologies
(SPARQL, OWL, etc.) are among the many attempts to address this.
While this effort would benefit R, I think it's best if it's part of a
larger effort.

Services such as DBpedia and Freebase are trying to unify many data sets
using RDF.

The task view and package ideas are great ideas. I'm just adding another
perspective.

Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

HI Benjamin:

What would make this easier is if these sites used standardized web services,
so the code would only need to be written once. data.gov is the worst example;
they spun up their own, weak service.

There is a lot of environmental data available through OPeNDAP, and that is
supported in the ncdf4 package. My own group has a service called ERDDAP that
is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R (and matlab) scripts that automate the extract for certain cases, 
see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector (EDC) that provides 
a GUI from within R (and ArcGIS, Matlab and Excel) that allows you to subset data
that is served by OPeNDAP, ERDDAP, certain Sensor Observation Service (SOS) 
servers, and have it read directly into R. It is freely available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized (OPeNDAP,
SOS) or is easy to implement (ERDDAP).

[R] The Future of R | API to Public Databases

2012-01-13 Thread Benjamin Weber
Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not well suited to using all the public
databases in an efficient manner.
I observed that the most tedious part of working with R is searching and
downloading the data from public databases and putting it into the right
format. I could not find a package on CRAN which offers exactly this
fundamental capability.
Imagine R as the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would move R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as contributions to a vital discussion.

Benjamin



Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread Sarah Goslee
R is Open Source. You're welcome to write tools, and submit your
package to CRAN. I think some part of this has been done, based
on questions to the list asking about those parts.

Personally, I've been using S-Plus and then R for 18 years, and never
required data from any of them. Which doesn't make it not important,
but suggests that public databases aren't the be-all and end-all for
R use.

Sarah

On Fri, Jan 13, 2012 at 4:14 PM, Benjamin Weber m...@bwe.im wrote:
 Dear R Users -

 R is a wonderful software package. CRAN provides a variety of tools to
 work on your data. But R is not apt to utilize all the public
 databases in an efficient manner.
 I observed the most tedious part with R is searching and downloading
 the data from public databases and putting it into the right format. I
 could not find a package on CRAN which offers exactly this fundamental
 capability.
 Imagine R is the unified interface to access (and analyze) all public
 data in the easiest way possible. That would create a real impact,
 would put R a big leap forward and would enable us to see the world
 with different eyes.

 There is a lack of a direct connection to the API of these databases,
 to name a few:

 - Eurostat
 - OECD
 - IMF
 - Worldbank
 - UN
 - FAO
 - data.gov
 - ...

 The ease of access to the data is the key of information processing with R.

 How can we handle the flow of information noise? R has to give an
 answer to that with an extensive API to public databases.

 I would love your comments and ideas as a contribution in a vital discussion.

 Benjamin


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread MK
The WDI package on CRAN already provides access to the World Bank data
through their API; we also have an in-house package for FAOSTAT here at
FAO, but it is not mature enough to be released on CRAN yet.


I am not sure about other international organisations, but I do agree that
it would be nice if there were a package which made these data more
readily available to R users.
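
(As a quick illustration of the WDI package just mentioned, a minimal sketch;
the indicator code and countries are only examples, see ?WDI for details.)

library(WDI)  # CRAN package wrapping the World Bank API

gdp <- WDI(country = c("BR", "IN"),
           indicator = "NY.GDP.PCAP.KD",  # GDP per capita, constant US$
           start = 2000, end = 2010)
head(gdp)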



On 13/01/12 22:58, Sarah Goslee wrote:

R is Open Source. You're welcome to write tools, and submit your
package to CRAN. I think some part of this has been done, based
on questions to the list asking about those parts.

Personally, I've been using S-Plus and then R for 18 years, and never
required data from any of them. Which doesn't make it not important,
but suggests that public databases aren't the be-all and end-all for
R use.

Sarah

On Fri, Jan 13, 2012 at 4:14 PM, Benjamin Weber m...@bwe.im wrote:

Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not well suited to using all the public
databases in an efficient manner.
I observed that the most tedious part of working with R is searching and
downloading the data from public databases and putting it into the right
format. I could not find a package on CRAN which offers exactly this
fundamental capability.
Imagine R as the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would move R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as contributions to a vital discussion.

Benjamin





Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread Roy Mendelssohn
HI Benjamin:

What would make this easier is if these sites used standardized web services,
so the code would only need to be written once.  data.gov is the worst
example; they spun up their own, weak service.

There is a lot of environmental data available through OPeNDAP, and that is
supported in the ncdf4 package.  My own group has a service called ERDDAP that
is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R  (and matlab) scripts that automate the extract for certain cases, 
see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector  (EDC) that  
provides a GUI from within R  (and ArcGIS, Matlab and Excel) that allows you to
subset  data that is served by OPeNDAP, ERDDAP, certain Sensor Observation 
Service (SOS) servers,  and have it read directly into R.  It is freely 
available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized  (OPeNDAP, 
SOS) or is easy to implement  (ERDDAP).

-Roy


On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:

 Dear R Users -
 
 R is a wonderful software package. CRAN provides a variety of tools to
 work on your data. But R is not well suited to using all the public
 databases in an efficient manner.
 I observed that the most tedious part of working with R is searching and
 downloading the data from public databases and putting it into the right
 format. I could not find a package on CRAN which offers exactly this
 fundamental capability.
 Imagine R as the unified interface to access (and analyze) all public
 data in the easiest way possible. That would create a real impact,
 would move R a big leap forward and would enable us to see the world
 with different eyes.
 
 There is a lack of a direct connection to the API of these databases,
 to name a few:
 
 - Eurostat
 - OECD
 - IMF
 - Worldbank
 - UN
 - FAO
 - data.gov
 - ...
 
 The ease of access to the data is the key to information processing with R.
 
 How can we handle the flow of information noise? R has to give an
 answer to that with an extensive API to public databases.
 
 I would love your comments and ideas as contributions to a vital discussion.
 
 Benjamin
 

**
The contents of this message do not reflect any position of the U.S. 
Government or NOAA.
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: roy.mendelss...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected."
"The arc of the moral universe is long, but it bends toward justice." -MLK Jr.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread Thomas Adams
Sarah,

I agree; I think it would be the exception rather than the rule that one
would access these public data sources, given the range of needs of R users,
who are generally analyzing their own data. Plus, IMO, it is just not very
difficult to reformat the data into something suitable for import into R,
if need be.

Tom

On Fri, Jan 13, 2012 at 4:58 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

 R is Open Source. You're welcome to write tools and submit your
 package to CRAN. I think some parts of this have already been done,
 judging by questions to the list about them.

 Personally, I've been using S-Plus and then R for 18 years, and have
 never required data from any of these databases. That doesn't make them
 unimportant, but it suggests that public databases aren't the be-all
 and end-all for R use.

 Sarah


 --
 Sarah Goslee
 http://www.functionaldiversity.org

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 

Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177
EMAIL:  thomas.ad...@noaa.gov
VOICE:  937-383-0528
FAX:937-383-0033


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread MacQueen, Don
It's a nice idea, but I wouldn't be optimistic about it happening:

Each of these public databases no doubt has its own more or less unique
API, and the people likely to know the API well enough to write R code to
access any particular database will be specialists in that field. They
likely won't know much if anything about other public databases. The
likelihood of a group forming to develop ** and maintain ** a single R
package to access the no-doubt huge variety of public databases strikes me
as small.

However, this looks like a great opportunity for a new CRAN Task View. The
task view would simply identify which packages connect to which public
databases. (sorry, I can't volunteer)

-Don

p.s.
I can mention openair as a package that has tools to access public
databases.
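
For instance, openair can pull UK air-quality measurements straight off the
web (a sketch; the site code and year are just illustrative):

  library(openair)
  # importAURN() downloads observations from the UK Automatic Urban and
  # Rural Network; "my1" is the Marylebone Road monitoring site
  aq <- importAURN(site = "my1", year = 2010)
  summary(aq)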

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 1/13/12 2:12 PM, MK mkao006rm...@gmail.com wrote:

The WDI package on CRAN already provides access to the World Bank data
through its API; we also have an in-house package for FAOSTAT here at
FAO, but it is not mature enough to be released on CRAN yet.

I am not sure about other international organisations, but I do agree that
it would be nice if there were a package that made these data more readily
available to R users.
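
For example, a World Bank series comes back from WDI as a ready-to-use
data frame. A sketch (the indicator code shown is the Bank's GDP-per-capita
series; argument details may differ across WDI versions):

  library(WDI)
  WDIsearch("gdp per capita")   # look up indicator codes by keyword
  gdp <- WDI(country = "all", indicator = "NY.GDP.PCAP.KD",
             start = 2000, end = 2010)
  head(gdp)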


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread Brian Diggs

On 1/13/2012 2:26 PM, MacQueen, Don wrote:

It's a nice idea, but I wouldn't be optimistic about it happening:

Each of these public databases no doubt has its own more or less unique
API, and the people likely to know the API well enough to write R code to
access any particular database will be specialists in that field. They
likely won't know much if anything about other public databases. The
likelihood of a group forming to develop ** and maintain ** a single R
package to access the no-doubt huge variety of public databases strikes me
as small.


I agree. The more reasonable model is a collection of packages, each of 
which can access a particular data source.



However, this looks like a great opportunity for a new CRAN Task View. The
task view would simply identify which packages connect to which public
databases. (sorry, I can't volunteer)


A CRAN Task View would be well suited for this. I have tagged these sorts
of packages on crantastic with the "onlineData" tag when I happen to
notice one, but I have not made a concerted effort to find all packages.
A Task View would be even better.


http://crantastic.org/tags/onlineData


-Don

p.s.
I can mention openair as a package that has tools to access public
databases.


Tagged it.

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread Benjamin Weber
The whole issue comes down to a mismatch between (1) the publisher of the
data and (2) the user at the rendezvous point.
Neither the publisher nor the user knows anything about the rendezvous
point. Both want to meet but, in reality, never do.
The user wastes time finding the rendezvous point defined by the publisher.
The publisher assumes some arbitrary rendezvous point. Given the number of
publishers, the variety of fields, and the preferences of each expert, we
end up in today's data world: everyone wastes precious time finding the
rendezvous point. Only experts know which corner to focus their search
on - and even they need time to find what they want.
Moreover, each expert (in each profession) believes that his approach
is the best one in the world.
Finally, we have a state of total confusion, in which only experts can
handle the information and non-experts cannot even access the data
without diving fully into the flood of data and its specialities.
That's my point: data is not accessible.

The discussion should follow a strategic approach:
- Is the classical csv file (in all its varieties) the simplest and best way?
- Isn't it the responsibility of the R community to recommend
standards for different kinds of data?
If such a rendezvous point existed, the publisher would know a specific
point that is favorable from the user's point of view. That is what is
missing today.
Only a rendezvous point defined by the community can be a 'known'
rendezvous point for all stakeholders, globally.

I do believe that the publisher's greatest interest is data
accessibility. Where is the toolkit we provide publishers to enable them
to serve us the data exactly as we want it? Instead, we just build ever
more packages that get lost in the noise of information.

I disagree with the proposed solution of a maintained package, or a
bunch of packages, that just combines connections to the existing
databases and keeps them up to date. It is only a question of time
before the user gets lost there. Such an approach is neither feasible
nor efficient.

We should just tell them where we would like to meet.

Benjamin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-13 Thread Spencer Graves
  A traditional way to exit a chaotic situation such as you describe is
to establish a standards committee, invite participation from suppliers
and users of whatever is at issue (data in this case), apply for
registration with the International Organization for Standardization
(ISO), organize meetings, draft and circulate a proposed standard, etc.
A statistician who had published maybe 100 papers and 3 books told me
that his work on ISO 9000 (I think) made a larger contribution to
humanity than anything else he had done.  Work on standards is one of
the most boring, tedious activities I can imagine -- and can potentially
be the most impactful thing one does in this life:  if you have an ISO
standard number for something, people who are starting something new may
find it and follow it.  People who are working to upgrade something may
tell their management, "Let's follow this standard."  And customers
sometimes ask their suppliers to follow the standard: "If you follow the
standard, you might get more customers."



  I think you could get support for such a standards effort from the
American Association for the Advancement of Science, the American
Economic Association, the American Statistical Association, and many
other organizations, including the many online science journals that
today pressure authors to put the data behind their published papers
into the public domain, downloadable from their web sites, etc.



  IMHO.
  Spencer

