Thank you Gabriel.  A rich, helpful response.  You captured the summaries 
of the field values as they stream in.  I'm not sure I see where I consider 
folding over, for instance, a time series of data captured in each row? 
 e.g., if this is my data:

record id vat_t1, val_t2, val_t3, val_t4
34598     45.09,  42.73,  40.82,  35.92
45523     25.56,  56.23,  67.32,  14.83

How should I consider computing the derived field: agv_t1_t4 where the 
average is of val_1 through val_4.  What is produced will get consumed 
writing to a new csv file such as:

record id vat_t1, val_t2, val_t3, val_t4, *agv_t1_t4*
34598     45.09,  42.73,  40.82,  35.92,  *42.90*
45523     25.56,  56.23,  67.32,  14.83,  *47.20*

- E

On Monday, February 1, 2016 at 1:38:22 PM UTC-5, Edmund Cape wrote:
>
> Hi -- I'm still working through the best and idiomatic way to use pipes to 
> stream through large csv datasets (1M or more records), to accomplish the 
> following:
>
>
>    1. Generate summary statistics for any one or series of fields moving 
>    down the data using fold, and grouped a given field in the data; output to 
>    a file
>    
>    2. Generate separate set of summary statistics (e.g., a regression, 
>    min, max) combining fields within a single record (row); use this 
>    information to transform/grow the record with additional (new) derived 
>    fields; output to a file (separate from the one above)
>
> These are separate tasks.  Because I don't need the results from (1) to do 
> any of the transformations in (2), it would be nice to accomplish both 
> simultaneously (a single continuous stream that produces two files) so I 
> can make the transformations in (2) and report on the results of what was 
> accomplished at a summary level (1).
>
> Here is my starting point.  Thank you in advance for any guidance.
>
> {-# LANGUAGE OverloadedStrings #-}
>
> import Pipes.Csv (decodeByName)
> import Pipes.ByteString (ByteString, fromHandle)
> import Data.Csv ((.:), FromNamedRecord(..))
> import Pipes
> import Control.Applicative
> import System.IO hiding (stdin)
>
> data File = File {name::String} deriving (Show)
> inFile :: File
> inFile  = File "../data.csv"
>
> data Rec = Rec { uid  :: String
>                , val1 :: Int 
>                , val2 :: Int } deriving (Show)
>
> instance FromNamedRecord Person where
>   parseNamedRecord r =
>     Rec <$> r .: "uid"
>         <*> r .: "val1"
>         <*> r .: "val2"
>
> records :: Monad m
>         => Producer ByteString m ()
>         -> Producer (Either String Person) m ()
> records = decodeByName
>
> main :: IO ()
> main =
>   withFile (name inFile) ReadMode  $ \hIn  ->
>   runEffect $ for (records (fromHandle hIn)) (lift . print)
>
>
> - E
>

-- 
You received this message because you are subscribed to the Google Groups 
"Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to haskell-pipes+unsubscr...@googlegroups.com.
To post to this group, send email to haskell-pipes@googlegroups.com.

Reply via email to