On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote

> Why not make the alteration one step before -- in the CSV?  There are
> CSV abstraction tools like `q`, which gives you a SQL-like interface
> to a csv file. Or you could write a quick transformer in python,
> if you know the language a bit.

  I wrote a quick transformer in bash, because I know the language a
bit <G>.  There are six "hospital regions" in Ontario.  The CSV file
does all data for one "hospital region" April 1, 2020 to present,
followed by same for next hospital region, etc, etc, for all 6 regions.
My first step is to run dos2unix and sort by date, ending up with 6
consecutive lines per day, sorted by date.  Then a "while" loop reads
through the input CSV file.  Accumulators are added up for all 6 lines
and a line is written out for each date.  Because I'm doing bash
arithmetic, the data *MUST* be valid numbers, not ".".  So I *MUST* set
missing data to somthing like zero.  Adding "." to a number causes bash
to error out with a CSV file like so...

"2023-09-05","CENTRAL",5,2,152,6,3,1,1
"2023-09-06","CENTRAL",5,2,136,6,3,1,1
"2023-09-07","CENTRAL",7,1,158,8,2,1,1
"2023-09-08","CENTRAL",7,3,154,6,3,-1,0
"2023-09-09","CENTRAL",".",".",169,".",".",".","."
"2023-09-10","CENTRAL",".",".",169,".",".",".","."
"2023-09-11","CENTRAL",".",".",155,".",".",".","."
"2023-09-12","CENTRAL",".",".",147,".",".",".","."

  The magic incantation for CSV files is to set IFS to a comma like so...

oldifs="${IFS}"
IFS=","

  This allows parsing CSV files like so...

while read
do
   dataarray=(${REPLY})

...and, kaboom, you have a fully populated array from reading one line
of a CSV file.  The following "transformer" is my "parsehospicu" script
that summarizes the data to "hospsum.csv".  Note: I've deleted the
leading "shebang slash bin slash bash" line because the Gentoo mailing
list software doesn't seem to like "executable emails".

  I suppose I could have two different versions of the...

echo "${prevdate}...

...line inside of an if/then/else/fi construct.  It would depend on the
date being inside the "missing data range".

=======================================================================

# Strip out missing "." that screw up the script
sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > 
region_hospital_icu_covid_datax.csv
dos2unix -n region_hospital_icu_covid_datax.csv 
region_hospital_icu_covid_datay.csv
#
# tail skips headers at beginning of file
# sed deletes Row_ID, and strips out quotes
# Output goes to file /dev/shm/temp0.txt
tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > 
/dev/shm/temp0.txt
#
## Set up IFS for easier parsing
oldifs="${IFS}"
IFS=","
#
# Initialize previous line's date to enter loop smoothly
# expando to read first line
dataline=$( head -1 /dev/shm/temp0.txt )
dataarray=(${dataline})
prevdate="${dataarray[0]}"
#
# Zero out accumulators to enter loop smoothly
accum2=0
accum3=0
accum4=0
accum5=0
accum6=0
accum7=0
accum8=0
#
# Remove previous hospsum.csv and open a new one for writing
rm -rf hospsum.csv
exec 3>hospsum.csv
#
# Write header line to output file
echo 
"date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented"
 >&3
#
# Main loop
# Read the data from one line in /dev/shm/temp0.txt
while read
do
   dataarray=(${REPLY})
   if [ "${dataarray[0]}" = "${prevdate}" ]; then
#
# If this line's date is same as previous line's date, add amounts to 
accumulators.
      accum2=$(( ${accum2} + ${dataarray[2]} ))
      accum3=$(( ${accum3} + ${dataarray[3]} ))
      accum4=$(( ${accum4} + ${dataarray[4]} ))
      accum5=$(( ${accum5} + ${dataarray[5]} ))
      accum6=$(( ${accum6} + ${dataarray[6]} ))
      accum7=$(( ${accum7} + ${dataarray[7]} ))
      accum8=$(( ${accum8} + ${dataarray[8]} ))
   else
#
# If this line's date has changed, output to hospsum.csv, update prevdate,
# and update accumulators.  ***IMPORTANT*** "echo" TO hospsum.csv MUST BE
# EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate***
      echo 
"${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${accum8}"
 >&3
      prevdate="${dataarray[0]}"
      accum2=${dataarray[2]}
      accum3=${dataarray[3]}
      accum4=${dataarray[4]}
      accum5=${dataarray[5]}
      accum6=${dataarray[6]}
      accum7=${dataarray[7]}
      accum8=${dataarray[8]}
   fi
done</dev/shm/temp0.txt
#
# Close file hospsum.csv
exec 3<&-
IFS="${oldifs}"


-- 
Roses are red
Roses are blue
Depending on their velocity
Relative to you

Reply via email to