[haskell-pipes] Generating quantitative summaries down, and transformations across csv data using pipes

Edmund Cape Mon, 01 Feb 2016 10:38:58 -0800

Hi -- I'm still working through the best and idiomatic way to use pipes to 
stream through large csv datasets (1M or more records), to accomplish the 
following:



   1. Generate summary statistics for any one or series of fields moving 
   down the data using fold, and grouped a given field in the data; output to 
   a file
   
   2. Generate separate set of summary statistics (e.g., a regression, min, 
   max) combining fields within a single record (row); use this information to 
   transform/grow the record with additional (new) derived fields; output to a 
   file (separate from the one above)

These are separate tasks.  Because I don't need the results from (1) to do 
any of the transformations in (2), it would be nice to accomplish both 
simultaneously (a single continuous stream that produces two files) so I 
can make the transformations in (2) and report on the results of what was 
accomplished at a summary level (1).

Here is my starting point.  Thank you in advance for any guidance.

{-# LANGUAGE OverloadedStrings #-}

import Pipes.Csv (decodeByName)
import Pipes.ByteString (ByteString, fromHandle)
import Data.Csv ((.:), FromNamedRecord(..))
import Pipes
import Control.Applicative
import System.IO hiding (stdin)

data File = File {name::String} deriving (Show)
inFile :: File
inFile  = File "../data.csv"

data Rec = Rec { uid  :: String
               , val1 :: Int 
               , val2 :: Int } deriving (Show)

instance FromNamedRecord Person where
  parseNamedRecord r =
    Rec <$> r .: "uid"
        <*> r .: "val1"
        <*> r .: "val2"

records :: Monad m
        => Producer ByteString m ()
        -> Producer (Either String Person) m ()
records = decodeByName

main :: IO ()
main =
  withFile (name inFile) ReadMode  $ \hIn  ->
  runEffect $ for (records (fromHandle hIn)) (lift . print)


- E

-- 
You received this message because you are subscribed to the Google Groups 
"Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to haskell-pipes+unsubscr...@googlegroups.com.
To post to this group, send email to haskell-pipes@googlegroups.com.

[haskell-pipes] Generating quantitative summaries down, and transformations across csv data using pipes

Reply via email to