Hi -- I'm still working through the best and idiomatic way to use pipes to stream through large csv datasets (1M or more records), to accomplish the following:
1. Generate summary statistics for any one or series of fields moving down the data using fold, and grouped a given field in the data; output to a file 2. Generate separate set of summary statistics (e.g., a regression, min, max) combining fields within a single record (row); use this information to transform/grow the record with additional (new) derived fields; output to a file (separate from the one above) These are separate tasks. Because I don't need the results from (1) to do any of the transformations in (2), it would be nice to accomplish both simultaneously (a single continuous stream that produces two files) so I can make the transformations in (2) and report on the results of what was accomplished at a summary level (1). Here is my starting point. Thank you in advance for any guidance. {-# LANGUAGE OverloadedStrings #-} import Pipes.Csv (decodeByName) import Pipes.ByteString (ByteString, fromHandle) import Data.Csv ((.:), FromNamedRecord(..)) import Pipes import Control.Applicative import System.IO hiding (stdin) data File = File {name::String} deriving (Show) inFile :: File inFile = File "../data.csv" data Rec = Rec { uid :: String , val1 :: Int , val2 :: Int } deriving (Show) instance FromNamedRecord Person where parseNamedRecord r = Rec <$> r .: "uid" <*> r .: "val1" <*> r .: "val2" records :: Monad m => Producer ByteString m () -> Producer (Either String Person) m () records = decodeByName main :: IO () main = withFile (name inFile) ReadMode $ \hIn -> runEffect $ for (records (fromHandle hIn)) (lift . print) - E -- You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscr...@googlegroups.com. To post to this group, send email to haskell-pipes@googlegroups.com.