Nils, Really great! Thanks for sharing!
Salud!

2016-06-01 6:09 GMT-03:00 Nils Hempelmann <info at nilshempelmann.de>:

> Hi Juan et al,
>
> Thanks a lot for triggering this discussion.
> I am currently working on a Web Processing Service
> (http://birdhouse.readthedocs.io/en/latest/) including a species
> distribution model based on GBIF data (and climate model data). A good
> connection to the GBIF database is still missing, and all hints are quite
> useful!
>
> If you want to share code:
> https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py
>
> Merci
> Nils
>
> On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>
> Hi Tim,
>
> Thank you! Especially for the DwC-A hint.
>
> The cells are by default in decimal degrees (WGS84), but the functions
> for generating them are general enough to use any projection supported by
> GDAL, using PostGIS. It could be done on the fly or stored on the
> server side.
>
> I was thinking (daydreaming) of a standard way to encode unique but
> universal grids (similar to geohash or Open Location Code), but didn't
> find something fast and ready. Maybe later :)
>
> I only use open source software: Python, Django, GDAL, NumPy, PostGIS,
> Conda, Py2Neo, ete2, among others.
>
> Currently I don't have an official release and the project is quite
> immature and unstable, and the installation can be non-trivial. I'm
> fixing all these issues but it will take some time; sorry for this.
>
> The GitHub repository is:
> https://github.com/molgor/biospytial.git
>
> And there's a very old documentation here:
> http://test.holobio.me/modules/gbif_taxonomy_class.html
>
> Please feel free to follow!
>
> Best wishes,
>
> Juan
>
> P.S. The functions for generating the grid are in: biospytial/SQL_functions
>
> On 31/05/16 19:47, Tim Robertson wrote:
>
> Thanks Juan
>
> You're quite right - you need the DwC-A download format to get those IDs.
> Are the cells decimal degrees and then partitioned into smaller units, or
> equal-area cells, or maybe UTM grids, or something else perhaps? I am just
> curious.
>
> Are you developing this as OSS? I'd like to follow progress if possible.
>
> Thanks,
> Tim
>
> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora
> <j.escamillamolgora at lancaster.ac.uk> wrote:
>
> Hi Tim,
>
> The grid is made by selecting a square area and dividing it into n x n
> subsquares, which form a partition of the bigger square.
>
> Each grid is a table in PostGIS and there's a mapping between this table
> and a Django model (class).
>
> The class constructor has the attributes: id, cell and neighbours (next
> release).
>
> The cell is a polygon (square) and, with GeoDjango, inherits the
> properties of the osgeo module for polygons.
>
> I've tried to use the CSV data (downloaded as a CSV request) but I
> couldn't find a way to obtain the global IDs for each taxonomic level
> (idspecies, idgenus, idfamily, etc.).
>
> Do you know a way to obtain these fields?
>
> Thank you for your email and best wishes,
>
> Juan
>
> On 31/05/16 19:03, Tim Robertson wrote:
>
> Hi Juan
>
> That sounds like a fun project!
>
> Can you please describe your grid / cells?
>
> Most likely your best bet will be to use the download API (as CSV data)
> and ingest that. The other APIs will likely hit limits (e.g. you can't
> page through indefinitely).
>
> Thanks,
> Tim
>
> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora
> <j.escamillamolgora at lancaster.ac.uk> wrote:
>
> Dear all,
>
> Thank you very much for your valuable feedback!
>
> I'll explain a bit what I'm doing, just to clarify; sorry if this is
> spam to some.
>
> I want to build a model for species assemblages based on co-occurrence
> of taxa within an arbitrary area.
> I'm building a 2D lattice in which, for each cell, I'm collapsing the
> occurrence data into a taxonomic tree. For doing this I need first to
> obtain the data from the GBIF API and later, based on the IDs (or names)
> of each taxonomic level (from kingdom to occurrence), build a tree
> coupled to each cell.
>
> The implementation is done with PostgreSQL (PostGIS) for storing the raw
> GBIF data and Neo4j for storing the relation:
> "Being a member of the [species, genus, family, ...] [name/id]".
>
> The idea is to include data from different sources, similar to the
> project Matthew and Jennifer mentioned (which I'm very interested in and
> would like to hear more about), and traverse the network looking for
> significant merged information.
>
> One of the immediate problems I've found is importing big chunks of the
> GBIF data into my specification. Thanks to this thread I've found the
> tools most used by the community (pygbif, rgbif, and python-dwca-reader).
> I was using urllib2 and things like that.
>
> I'll be happy to share any code or ideas with the people interested.
>
> By the way, I've checked the TinkerPop project, which uses the Gremlin
> traversal language independently of the DBMS.
> Perhaps it's possible to use it with Spark and GUODA as well?
>
> Is GUODA working now?
>
> Best wishes,
>
> Juan
>
> On 31/05/16 17:02, Collins, Matthew wrote:
>
> Jorrit pointed out this thread to us at iDigBio. Downloading and
> importing data into a relational database will work great, especially
> if, as Jan said, you can cut the data size down to a reasonable amount.
>
> Another approach we've been working on in a collaboration called GUODA
> [1] is to build an Apache Spark environment with pre-formatted data
> frames containing common data sets for researchers to use. This approach
> would offer a remote service where you could write arbitrary Spark code,
> probably in Jupyter notebooks, to iterate over data.
> Spark does a lot of cool stuff, including GraphX, which might be of
> interest. This is definitely pre-alpha at this point and if anyone is
> interested, I'd like to hear your thoughts. I'll also be at SPNHC
> talking about this.
>
> One thing we've found in working on this is that importing data into a
> structured data format isn't always easy. If you only want a few
> columns, it'll be fine. But getting the data typing, format
> standardization, and column name syntax of the whole width of an
> iDigBio record right requires some code. I looked to see if the EcoData
> Retriever [2] had a GBIF data source; they have an eBird one that you
> might find useful as a starting point if you wanted to use someone
> else's code to download and import data.
>
> For other data structures like BHL, we're kind of making stuff up, since
> we're packaging a relational structure and not something nearly as flat
> as the GBIF and DwC stuff.
>
> [1] http://guoda.bio/
> [2] http://www.ecodataretriever.org/
>
> Matthew Collins
> Technical Operations Manager
> Advanced Computing and Information Systems Lab, ECE
> University of Florida
> 352-392-5414
>
> ------------------------------
> *From:* jorrit poelen <jhpoelen at xs4all.nl>
> *Sent:* Monday, May 30, 2016 11:16 AM
> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based driver
> for this API ?
>
> Hey y'all:
>
> Interesting request below on the GBIF mailing list - sounds like a
> perfect fit for the GUODA use cases.
>
> Would it be too early to jump onto this thread and share our
> efforts/vision?
>
> thx,
> -jorrit
>
> Begin forwarded message:
>
> *From:* Jan Legind <jlegind at gbif.org>
> *Subject:* Re: [API-users] Is there any NEO4J or graph-based driver
> for this API ?
> *Date:* May 30, 2016 at 5:48:51 AM PDT
> *To:* Mauro Cavalcanti <maurobio at gmail.com>, "Juan M.
> Escamilla Molgora" <j.escamillamolgora at lancaster.ac.uk>
> *Cc:* "api-users at lists.gbif.org" <api-users at lists.gbif.org>
>
> Dear Juan,
>
> Unfortunately we have no tool for creating this kind of SQL-like query
> against the portal. I am sure you are aware that the filters in the
> occurrence search pages can be applied in combination in numerous ways.
> The API can go even further in this regard [1], but it is not well
> suited for retrieving occurrence records, since there is a
> 200,000-record ceiling, making it unfit for species exceeding this
> number.
>
> There are updates coming to the pygbif package [2] in the near future
> that will enable you to launch user downloads programmatically, where a
> whole list of different species can be used as a query parameter, as
> well as adding polygons [3].
>
> In the meantime, Mauro's suggestion is excellent. If you can narrow
> your search down until it returns a manageable download (say, less than
> 100 million records), importing this into a database should be doable.
> From there, you can refine using SQL queries.
>
> Best,
> Jan K. Legind, GBIF Data Manager
>
> [1] http://www.gbif.org/developer/occurrence#search
> [2] https://github.com/sckott/pygbif
> [3] https://github.com/jlegind/GBIF-downloads
>
> *From:* API-users [mailto:api-users-bounces at lists.gbif.org]
> *On Behalf Of* Mauro Cavalcanti
> *Sent:* 30. maj 2016 14:06
> *To:* Juan M. Escamilla Molgora
> *Cc:* api-users at lists.gbif.org
> *Subject:* Re: [API-users] Is there any NEO4J or graph-based driver
> for this API ?
> Hi,
>
> One solution I have successfully adopted for this is to download the
> records (either "manually" via a browser or, better yet, using a Python
> script with the fine pygbif library), store them in a MySQL or SQLite
> database, and then perform the relational queries. I can provide
> examples if you are interested.
>
> Best regards,
>
> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora
> <j.escamillamolgora at lancaster.ac.uk>:
>
> Hola,
>
> Is there any API for making relational queries by taxonomy, location or
> timestamp?
>
> Thank you and best wishes,
>
> Juan
>
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> http://lists.gbif.org/mailman/listinfo/api-users
>
> --
> Dr. Mauro J. Cavalcanti
> E-mail: maurobio at gmail.com
> Web: http://sites.google.com/site/maurobio

-- 
Dr. Mauro J. Cavalcanti
E-mail: maurobio at gmail.com
Web: http://sites.google.com/site/maurobio
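The n x n grid partition Juan describes (a square area divided into subsquares, each cell later coupled to a taxonomic tree) can be sketched in plain Python. This is a minimal illustration only: the helper name and the extent are made up, and the real implementation in biospytial/SQL_functions builds PostGIS polygons mapped to Django models rather than plain tuples.

```python
def make_grid(xmin, ymin, xmax, ymax, n):
    """Partition the square [xmin, xmax] x [ymin, ymax] into n x n cells.

    Returns a list of (cell_id, (x0, y0, x1, y1)) tuples; in the PostGIS
    version each cell would instead be a polygon row in a grid table.
    """
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    cells = []
    for i in range(n):
        for j in range(n):
            x0, y0 = xmin + i * dx, ymin + j * dy
            cells.append((i * n + j, (x0, y0, x0 + dx, y0 + dy)))
    return cells

# 4 x 4 = 16 cells over an arbitrary decimal-degree (WGS84) extent.
grid = make_grid(-100.0, 14.0, -86.0, 28.0, 4)
```

The cells form a true partition: their union covers the bounding square and they overlap only on shared edges, which is what lets each occurrence be assigned to exactly one cell.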
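Mauro's download-then-query workflow (fetch records, load them into SQLite, refine with SQL, as Jan also suggests) can be sketched as below. The rows here are toy stand-ins for a GBIF occurrence download with invented keys; in practice they would come from pygbif or a portal CSV download, with many more columns.

```python
import sqlite3

# Toy rows standing in for a GBIF occurrence download:
# (speciesKey, species, decimalLatitude, decimalLongitude) - keys invented.
rows = [
    (1111111, "Turdus migratorius", 42.3, -71.1),
    (1111111, "Turdus migratorius", 40.7, -74.0),
    (2222222, "Cyanocitta cristata", 42.3, -71.1),
]

con = sqlite3.connect(":memory:")  # a file path would persist the database
con.execute(
    "CREATE TABLE occurrence ("
    "specieskey INTEGER, species TEXT, lat REAL, lon REAL)"
)
con.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?)", rows)

# A relational refinement of the kind the thread discusses:
# record counts per species.
counts = dict(
    con.execute(
        "SELECT species, COUNT(*) FROM occurrence GROUP BY species"
    ).fetchall()
)
```

The same table also supports the location and timestamp filters Juan asked about, via ordinary `WHERE` clauses once those columns are loaded.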