Thank you Gabriel. A rich, helpful response. You captured the summaries of the field values as they stream in. I'm not sure I see where I consider folding over, for instance, a time series of data captured in each row? e.g., if this is my data:
record id vat_t1, val_t2, val_t3, val_t4 34598 45.09, 42.73, 40.82, 35.92 45523 25.56, 56.23, 67.32, 14.83 How should I consider computing the derived field: agv_t1_t4 where the average is of val_1 through val_4. What is produced will get consumed writing to a new csv file such as: record id vat_t1, val_t2, val_t3, val_t4, *agv_t1_t4* 34598 45.09, 42.73, 40.82, 35.92, *42.90* 45523 25.56, 56.23, 67.32, 14.83, *47.20* - E On Monday, February 1, 2016 at 1:38:22 PM UTC-5, Edmund Cape wrote: > > Hi -- I'm still working through the best and idiomatic way to use pipes to > stream through large csv datasets (1M or more records), to accomplish the > following: > > > 1. Generate summary statistics for any one or series of fields moving > down the data using fold, and grouped a given field in the data; output to > a file > > 2. Generate separate set of summary statistics (e.g., a regression, > min, max) combining fields within a single record (row); use this > information to transform/grow the record with additional (new) derived > fields; output to a file (separate from the one above) > > These are separate tasks. Because I don't need the results from (1) to do > any of the transformations in (2), it would be nice to accomplish both > simultaneously (a single continuous stream that produces two files) so I > can make the transformations in (2) and report on the results of what was > accomplished at a summary level (1). > > Here is my starting point. Thank you in advance for any guidance. > > {-# LANGUAGE OverloadedStrings #-} > > import Pipes.Csv (decodeByName) > import Pipes.ByteString (ByteString, fromHandle) > import Data.Csv ((.:), FromNamedRecord(..)) > import Pipes > import Control.Applicative > import System.IO hiding (stdin) > > data File = File {name::String} deriving (Show) > inFile :: File > inFile = File "../data.csv" > > data Rec = Rec { uid :: String > , val1 :: Int > , val2 :: Int } deriving (Show) > > instance FromNamedRecord Person where > parseNamedRecord r = > Rec <$> r .: "uid" > <*> r .: "val1" > <*> r .: "val2" > > records :: Monad m > => Producer ByteString m () > -> Producer (Either String Person) m () > records = decodeByName > > main :: IO () > main = > withFile (name inFile) ReadMode $ \hIn -> > runEffect $ for (records (fromHandle hIn)) (lift . print) > > > - E > -- You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscr...@googlegroups.com. To post to this group, send email to haskell-pipes@googlegroups.com.