Joost knows far more about databases than I do, so I mostly commented on the workflow part.

On 2016-05-20 22:36, waltd...@waltdnes.org wrote:
   Yes, I did RTFM at https://wiki.gentoo.org/wiki/PostgreSQL/QuickStart
and that's part of my problem. <G>  I figured it would be a simple
search and replace "9.3" ==> "9.5" in the wiki, but...

1) The wiki recommends...
PG_INITDB_OPTS="--locale=en_US.UTF-8"

...but I get...

The database cluster will be initialized with locale "en_US.iso88591".
initdb: "en_US.UTF8" is not a valid server encoding name

"locale -a" returns...
C
POSIX
en_US
en_US.iso88591
en_US.utf8
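
Not a database answer, just a guess from the error message: initdb seems to want the locale spelled exactly the way "locale -a" prints it, and your list shows en_US.utf8 (lowercase, no dash), so something like

PG_INITDB_OPTS="--locale=en_US.utf8"

might get past it. Alternatively, generate the dashed spelling first (add "en_US.UTF-8 UTF-8" to /etc/locale.gen and run locale-gen), and then the wiki's version should be valid too. I have not tried either on Gentoo.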

2) The wiki says...
This time the focus is upon the files in the PGDATA directory,
/etc/postgresql-9.3, instead, with primary focus on the
postgresql.conf and pg_hba.conf files.

"ls /etc/postgresql-9.5/" returns...
postgresql.conf  psqlrc

but postgresql seems to want them in /var/lib instead...

mv: cannot stat '/var/lib/postgresql/9.5/data/pg_hba.conf': No such
file or directory
mv: cannot stat '/var/lib/postgresql/9.5/data/pg_ident.conf': No
such file or directory
mv: cannot stat '/var/lib/postgresql/9.5/data/postgresql.conf':
No such file or directory
   Can somebody please confirm the correct way to go?

I have never run postgresql on gentoo (hopefully soon :D), but on Debian-derived and RPM-based distros, PGDATA is always somewhere under /var (Debian, for instance, keeps the config files in /etc/postgresql/ but the actual data in /var/lib/postgresql/). /etc seems wrong for the data directory itself; /etc for just the config files is normal.
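
Once a cluster exists and the server is running, postgresql itself will tell you where it is reading things from; these are stock postgres commands, nothing Gentoo-specific:

psql -U postgres -c 'SHOW data_directory;'
psql -U postgres -c 'SHOW config_file;'
psql -U postgres -c 'SHOW hba_file;'

That should settle whether it wants the files under /etc or /var/lib.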


   Why I want postgresql... I've been keeping a bunch of data in a
spreadsheet, and it's gotten too large.  The spreadsheet locks up my
system when I try to update it.  I've used "top" and watched as
gnumeric's memory consumption grows to eat all available ram.  It locks
up the system so I can't even ssh in.  This is on an X86_64 with 8 gigs
of RAM!  Fortunately, "magic-sysrq" allows a relatively clean shutdown.
While we're at it, is there a way for gnumeric to pull in data directly
from postgresql?  ODBC?  I'm aware of copying from postgresql to a CSV
file and importing that, but it's rather clunky.

`equery use gnumeric' gives the `libgda' flag, which should pull in database support. I've never used it, so I don't know whether it works or how well. What is in this spreadsheet? If it is financial stuff, you can use GnuCash, which supports using a database as a backend.
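
If you want to try the flag, the usual USE-flag dance should work (untested by me; adjust the package.use path to your setup):

echo "app-office/gnumeric libgda" >> /etc/portage/package.use/gnumeric
emerge --ask --oneshot app-office/gnumeric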


   My main problem is that columns of several thousand rows are functions
based on other columns of several thousand rows.  For the time being,
I've split up the spreadsheet into a few pieces, but a database is the
best solution.  If I could run the calculations in the database, and
pull in the final results as static numbers for graphing, that would
greatly reduce the strain on the spreadsheet.  Or is it possible to
graph directly from postgresql?

Here are my recommendations, in order of "least code" to "most code" (I don't think postgresql supports graphing):

1. Write some SQL scripts that compute the data you need and output CSV, then import that into Gnumeric and do the plots.
2. Write a Python script that runs the SQL and plots the data with matplotlib (sketched below).
3. Write a webapp so you don't have to run scripts by hand; the plots are generated by opening a web page.
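
For option 1, note that psql can dump a query straight to CSV: \copy (SELECT ...) TO 'out.csv' WITH CSV HEADER. For option 2, here is a minimal sketch; the table readings(day, value) and the connection settings are made up, so substitute your real schema (assumes psycopg2 and matplotlib are installed):

# option2_plot.py - sketch of option 2: let postgres do the math
# and pull back only the final numbers for plotting.
import matplotlib
matplotlib.use("Agg")   # render straight to a file; no X display needed
import matplotlib.pyplot as plt
import psycopg2

# hypothetical connection settings and table -- substitute your own
conn = psycopg2.connect(dbname="mydb", user="walt")
cur = conn.cursor()
cur.execute("""
    SELECT day, avg(value)   -- the heavy calculation happens here,
    FROM readings            -- in the database, not in the spreadsheet
    GROUP BY day
    ORDER BY day
""")
rows = cur.fetchall()
conn.close()

days = [r[0] for r in rows]
vals = [float(r[1]) for r in rows]   # avg() comes back as Decimal

plt.plot(days, vals)
plt.xlabel("day")
plt.ylabel("avg(value)")
plt.savefig("plot.png")

Run it whenever the data changes and you get a fresh plot.png, without the spreadsheet ever having to load the raw rows.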

Depending on how much automation you want versus how much time you want to spend writing and debugging code, hopefully one of those helps. I help researchers use an HPC cluster; some are very savvy programmers, some are not. For "big data" projects, some will throw raw data into a Hadoop cluster and happily do all their work in Hadoop, while others will load raw data, clean it up, then pull it out and use MATLAB, Stata, R, etc. You just need to find the workflow that works best for you. I would personally choose option 3, since it involves the least running of scripts over and over, but to each his own.

I have actual free time now (done with school, finally), so I might be able to help prototype as well, if you would like.

Alec
