Just confirming some stuff, since I’m not completely following:
Raster_record.rast column is of type raster correct? IF so ST_X and ST_Y won’t work since those are for geometry types. Also ST_Value(raster_record.rast, band_number), won’t work either since that expects as input a geometry or x,y on the raster you want the value you. I would think you would have gotten an error with that, which makes me feel I’m missing something critical. If you want to extract all the pixels in a raster, you’d do something like https://postgis.net/docs/RT_ST_PixelAsPoints.html SELECT pp.x, pp.y, pp.val, ST_X(pp.geom) AS lon, ST_Y(pp.geom) AS lat FROM raster_record, ST_PixelAsPoints(raster_record.rast, 1) AS pp From: postgis-users <postgis-users-boun...@lists.osgeo.org> On Behalf Of Manaswini Ganjam via postgis-users Sent: Wednesday, November 15, 2023 2:01 PM To: postgis-users@lists.osgeo.org Cc: Manaswini Ganjam <manu.gan...@gmail.com> Subject: [postgis-users] Extracting variable information from netcdf, imported as raster to a table Hi, I have been trying to download s3 cloud stored gridded climate data and generate tables with variables, lat, lon and timestamp (year, yearday). To achieve this I used raster2pgsql and imported multiple netcdf files into a database table. Question: How to achieve the extraction of variables using postgis? I tried using ST_value, ST_pixelaspoints but I was getting errors, mainly due to the format in which netcdfs are stored in the database (the error says can't load some characters like 00E30100082...), I even tried changing the datatype to float but still did not work. I mean it is probably not simple like selecting a variable from the netcdf. I have enclosed my sql query below: -- Iterate through all raster files in the table FOR raster_record IN (SELECT * FROM gfdl_03_prcp) LOOP -- Determine the year from the raster file name, assuming the format is 'prcp_03_1950.nc <http://prcp_03_1950.nc> ' SELECT regexp_replace(raster_record.filename, '.*_(\d{4})\.nc', '\1')::integer INTO year; -- Calculate the start date of the year year_start := (year || '-01-01')::date; -- Determine if the year is a leap year is_leap_year := EXTRACT(ISODOW FROM (year_start + interval '1 year')) = 7; -- Set the number of bands for the year (365 for non-leap years, 366 for leap years) FOR band_number IN 1..(CASE WHEN is_leap_year THEN 366 ELSE 365 END) LOOP -- Calculate the observation_time using the year and band number observation_time := year_start + (band_number - 1) * interval '1 day'; -- Extract X (lon) and Y (lat) coordinates from the raster SELECT ST_X(raster_record.rast) AS lon, ST_Y(raster_record.rast) AS lat INTO lon, lat; -- Insert the lat, lon, prcp, and observation_time into the extracted_values table INSERT INTO extracted_values (lat, lon, prcp, observation_time) VALUES (lat, lon, ST_Value(raster_record.rast, band_number), observation_time); -- Increment the counter counter := counter + 1; -- Commit the transaction periodically in batches IF counter % batch_size = 0 THEN COMMIT; END IF; END LOOP; END LOOP; The metadata for the two files is as follows: File from database: {'NC_GLOBAL#Conventions': 'CF-1.5', 'NC_GLOBAL#GDAL': 'GDAL 3.6.4, released 2023/04/17', 'NC_GLOBAL#history': 'Wed Nov 15 13:32:13 2023: GDAL CreateCopy( not_clipped_prcp.nc <http://not_clipped_prcp.nc> , ... )'} File before loading into the database: {'lat#units': 'degrees_north', 'lon#units': 'degrees_east', 'NC_GLOBAL#title': 'Daily statistically downscaled CMIP5 data for the United States and southern Canada east of the Rocky Mountains, version 1.0, realization 1, 0.1x0.1 degree spatial resolution.', 'NETCDF_DIM_EXTRA': '{time}', 'NETCDF_DIM_time_DEF': '{366,4}', 'NETCDF_DIM_time_VALUES': '{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14....362,363,364,365}', 'prcp#add_offset': '819.17499', 'prcp#long_name': 'daily precipitation accumulation', 'prcp#missing_value': '-32768', 'prcp#scale_factor': '0.025', 'prcp#units': 'mm', 'prcp#_FillValue': '-32768', 'time#units': 'days since 1952-1-1 0:0:0.0'} In case this information is useful: Previously I used python to extract variable information and generate a csv or table using this variable information, and the code is enclosed below. In the code I extracted variable values using lon = dataset.variables['lon'][:] and iterated for loops to write them all in csv. Python code: import netCDF4 as nc # Step 1: Read the NetCDF file filename = "/home/manaswini/prcp_03_1950.nc <http://prcp_03_1950.nc> " dataset = nc.Dataset(filename) dataset.set_auto_mask(False) dataset.set_auto_scale(True) lon = dataset.variables['lon'][:] lat = dataset.variables['lat'][:] time = dataset.variables['time'][:] prcp = dataset.variables['prcp'][:] import numpy as np import csv # csv_buffer csv_buffer = open('output.csv', 'w', newline='') csv_writer = csv.writer(csv_buffer) # Iterate through grid points and write to CSV buffer for i in enumerate(lon): for j in enumerate(lat): for k in enumerate(time): csv_writer.writerow([lat[j], lon[i], prcp[i][j][k], year[k], yearday[k]]) # Close the CSV buffer csv_buffer.close() Thank you, Manaswini Ganjam
_______________________________________________ postgis-users mailing list postgis-users@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/postgis-users