Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-requ...@haskell.org

You can reach the person managing the list at
        beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  filesystem verification utility (Stephen Tetley)
   2. Re:  filesystem verification utility (Anand Mitra)
   3. Re:  filesystem verification utility (Stephen Tetley)


----------------------------------------------------------------------

Message: 1
Date: Mon, 10 Jan 2011 23:24:29 +0000
From: Stephen Tetley <stephen.tet...@gmail.com>
Subject: Re: [Haskell-beginners] filesystem verification utility
Cc: beginners@haskell.org
Message-ID:
        <aanlktimfox-gmhdyns-nh_lsqdagtxd3cuhawp3nz...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On 10 January 2011 22:26, Stephen Tetley <stephen.tet...@gmail.com> wrote:
> And append (++) inside the genPattern function...
>

Changing to use "functional strings" which have the ShowS type is the
obvious way to avoid the cost of (++) append. There is a library on
Hackage that provides many utility functions for a wrapped ShowS type
and an IsString instance so you can still use string literals.

http://hackage.haskell.org/package/dstring

However you might be better using one of the alternative String
packages such as Text or ByteString.



------------------------------

Message: 2
Date: Tue, 11 Jan 2011 09:49:47 +0530
From: Anand Mitra <mi...@kqinfotech.com>
Subject: Re: [Haskell-beginners] filesystem verification utility
To: beginners@haskell.org, hask...@haskell.org
Message-ID:
        <aanlktincnwiad8ofmb7mbhb8=0tsy5wzffdjjjglf...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Based  on feedback I inferred that the huge memory usage was mostly
because of the String handling in the patern generation. To make it
more efficient I have used Data.ByteString.Lazy.Char8

but now I get the following error when I execute

stress: tmp/asdf-1: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-3: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-5: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-2: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-7: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-10: hPutBuf: illegal operation (handle is closed)

I have explicitly remove all calls to close despite this I get this
error. I tried both version of seek the fdSeek as well as hSeek. The
documentation on hSeek had a confusing comment that "The offset i is
given in terms of 8-bit bytes"

At this point I am unable to understand how the handle is getting
explicitly closed.

{-# OPTIONS -fglasgow-exts #-}
import Data.Int
import qualified Data.ByteString.Lazy.Char8 as L
import System.Random
import Data.List
import Monad
import System.Posix.IO
import System.Posix.Types
import Data.Time.Clock
import System.Posix.Files
import Data.Maybe
import GHC.IO.Device (SeekMode(..))
import Control.Exception
import Data.Typeable
import System
import Control.Concurrent
import System.IO

--myrandomRlist :: (Num t, Random t) => t -> IO [t]
--myrandomRlist x = liftM (randomRs (0,x)) newStdGen

myrandomRlist :: (Num t, Random t) => t ->t -> StdGen  ->  [t]
myrandomRlist min max seed = randomRs (min,max)  seed

data IoLoc = IoLoc {offset ::FileOffset, size::Int, num::Int  }
             deriving (Show, Typeable)
instance Exception IoLoc

data Corrupt = Corrupt String IoLoc
             deriving (Show, Typeable)
instance Exception Corrupt

data FileHdr = FileHdr {fileName::FilePath, seed::StdGen,
                        minIoSize::Int, maxIoSize::Int, ioCount::Int}
               deriving (Show)

data FileIO = FileIO {fd::Handle, params::FileHdr, fileData::[IoLoc]}


check hdl = do
  closed <- hIsClosed hdl
  if closed
   then do
    putStrLn $ "file handle was closed" ++ (show hdl)
   else do
    return ()

genPattern :: FilePath -> IoLoc -> L.ByteString
genPattern f l =
     L.take (fromIntegral (size l)) (L.cycle  $ L.pack ("(" ++ f ++
")"  ++ "[" ++ (show (offset l)) ++ ":" ++ (show (size l)) ++ (show
(num l)) ++ "]"))

doHdrRead :: Handle -> IO [Char]
doHdrRead x = do
  fd <- handleToFd x
  (str, count) <- fdRead fd 100
  return (takeWhile (\x-> not (x == '\n')) str)

doGetHdr :: Handle -> IO FileHdr
doGetHdr fd = do
  file <- doHdrRead fd
  seed <- doHdrRead fd
  minIoSize <- doHdrRead fd
  maxIoSize <- doHdrRead fd
  ioCount <- doHdrRead fd
  return (FileHdr file (read seed) (read minIoSize) (read maxIoSize)
(read ioCount))

doHdrFromFile :: Handle -> IO FileIO
doHdrFromFile fd = do
--  fd <- openBinaryFile name  ReadWriteMode -- (Just ownerModes)
(OpenFileFlags {append=False, exclusive=False, noctty=True,
nonBlock=False, trunc=False})
  hSeek fd AbsoluteSeek 0
  hdr <- doGetHdr fd
  return (FileIO fd hdr [])

doHdrWrite :: Handle -> [Char] -> IO ()
doHdrWrite fd str = do
  hPutStr fd (take 100 (str ++ "\n" ++ (cycle ['\0'])))

doWriteHdr :: Handle -> FileHdr -> IO ()
doWriteHdr fd hdr = do
  doHdrWrite fd (show (fileName hdr))
  doHdrWrite fd (show (seed hdr))
  doHdrWrite fd (show (minIoSize hdr))
  doHdrWrite fd (show (maxIoSize hdr))
  doHdrWrite fd (show (ioCount hdr))

doWriteFile :: FilePath -> IO FileIO
doWriteFile name = do
  fd <- openBinaryFile name  ReadWriteMode -- (Just ownerModes)
(OpenFileFlags {append=False, exclusive=False, noctty=True,
nonBlock=False, trunc=True})
  seed <- newStdGen
  hdr <- return (FileHdr name seed 4096 (2*8096) 200)
  doWriteHdr fd hdr
  return (FileIO fd hdr [])

overLap (IoLoc off1' sz1' num1) (IoLoc off2' sz2' num2) =
    ((off1 > off2) && (off1 < off2 +sz2)) ||((off1+sz1 >off2)
&&(off1+sz1 < off2 +sz2))
    where
      off1 = read (show off1')
      sz1 = read (show sz1')
      off2 = read (show off2')
      sz2 = read (show sz2')


genIoList1 :: [IoLoc] -> [Int] -> [Int] -> Int -> Int -> [IoLoc]

genIoList1 list offset size _ 0 =
    list
genIoList1 list offset size 0 _ =
    list

genIoList1 list (offset:os) (size:ss) count bound =
    if isNothing $ find (overLap x) list
             then genIoList1 ([x] ++ list) os ss (count - 1) (bound -1)
             else genIoList1 list os ss count (bound -1)
    where
      x = IoLoc (read (show offset)) (read(show size)) count
--      offset = (read (show offset1))
--      size = (read(show size1))

genIoList :: FileHdr -> [IoLoc]
genIoList (FileHdr name seed min max count) =
    genIoList1 []  (myrandomRlist 4000 1099511627776 seed)
(myrandomRlist min max seed) count (count*2)

doActualIo :: Handle -> IoLoc -> L.ByteString -> IO ()
doActualIo fd (IoLoc off sz num) str = do
--  hSeek fd AbsoluteSeek off
  rfd <- (handleToFd fd)
  fdSeek  rfd AbsoluteSeek off
  L.hPut fd str
  return ()

doVerifyIo :: Handle -> IoLoc -> L.ByteString -> IO ()
doVerifyIo fd (IoLoc off sz num) str = do
--  hSeek fd AbsoluteSeek off
  rfd <- (handleToFd fd)
  fdSeek rfd AbsoluteSeek off
  filedata  <- L.hGet fd sz
  if str == filedata
     then return ()
     else throwIO (Corrupt ("Data corruption in #" ++  (take 200
(L.unpack str)) ++ "#"  ++ (take 200 (L.unpack filedata)))(IoLoc off
sz num))

mainWrite file  = do
  (FileIO hd _ _) <- doWriteFile file
  hdrIO <- doHdrFromFile hd
  check hd
  (FileIO _ hdr _) <- return hdrIO
  check hd
  iolst <- return $ genIoList hdr
  app <- return $ zip iolst (map (genPattern (fileName hdr)) iolst)
  putStrLn (fileName hdr)
  mapM (\(x,y)-> doActualIo (fd hdrIO)  x y) app
  mapM (\(x,y)-> doVerifyIo (fd hdrIO)  x y) app
--  hClose hd
  return hdr

mainVerify file = do
  hd <- openBinaryFile file  ReadWriteMode
  hdrIO <- doHdrFromFile hd
  hdr <- return (params hdrIO)
  iolst <- return $ genIoList hdr
  file <- return $ filter (\x->not ( x=='"')) (fileName hdr)
  app <- return $ zip iolst (map (genPattern file) iolst)
  putStrLn $ filter (\x->not ( x=='"')) (fileName hdr)
  mapM (\(x,y)-> doVerifyIo (fd hdrIO)  x y) app
--  hClose hd
  return hdr


main =  do
  x <- getArgs
  if (length x) == 3
    then do
      main1
    else do
      putStrLn "USAGE:\nfile-io write/verify <full-path file name>
<number of threads>  \
\\n\n\
\Simple IO load generator with write verification. This utility is designed\n\
\to generate multi-threaded IO load which will write a pattern to the file. \n\
\When this is invoked with the same parameters with the verify option \n\
\the data written will be verified.\n\
\ "

main1 = do
  [op, name, numProcs] <- getArgs
  m <- newEmptyMVar
  n <- return $read numProcs
  case op of
    "write" -> do mapM forkIO  [(fillfile (name ++ (show i)) ) m|i<-[1..n]]
    "verify" -> do mapM forkIO [(verifyfile (name ++ (show i)) ) m|i<-[1..n]]
  x <- mapM takeMVar $ take (read numProcs) $ repeat m
  putStrLn $ show $ and x

fillfile filename m = do
  mainWrite filename
  putMVar m True

verifyfile filename m = do
  mainVerify filename
  putMVar m True



On Tue, Jan 11, 2011 at 1:02 AM, Anand Mitra <mi...@kqinfotech.com> wrote:
> Hi,
>
> I had a requirement to generate various kinds of I/O patterns on a
> filesystem and subsequently verify this. The initial version of the
> program below implements a random I/O pattern with multiple threads.
> Even when the number of I/O is as small as 200 and 10 concurrent
> theads, the amount of memory used is huge. When I run the program it
> consumes close to 1to2GB memory. Moreover the rate at which it
> generates the I/O is very low which is not good for testing a
> filesystem. I have used System.POSIX.IO but I tried System.IO and did
> not see much difference either. I would appreciate help in identifying
> ways to improve this.
>
>



------------------------------

Message: 3
Date: Tue, 11 Jan 2011 10:08:07 +0000
From: Stephen Tetley <stephen.tet...@gmail.com>
Subject: Re: [Haskell-beginners] filesystem verification utility
Cc: beginners@haskell.org
Message-ID:
        <aanlkti=hz09xn0hopix9nw_pn57-gmtscfbkg=njt...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Anand

Firstly apologies - my advice from yesterday was trivial advice,
changing to a better representation of Strings and avoiding costly
operations (++) is valuable and should improve the performance of the
program, but it might be a small overall improvement and it doesn't
get to the heart of things.

Really you need to do two things - one is consider what you are doing
and evaluate whether it is appropriate for a performance sensitive
app, the other is to profile and find the bits that are too slow.

I rarely use Control.Concurrent so I can't offer any real experience
but I'm concerned that it is adding overhead for no benefit. Looking
at the code and what the comments say it does - I don't think your
situation benefits from concurrency. A thread in your program could do
all is work in one go, its not that you need to be servicing many
clients (cf. a web server that needs to service many clients without
individual long waits so it makes sense to schedule them) or that you
are waiting on other processes making resources available. So for your
program, any changes to execution caused by scheduling / de-scheduling
threads (probably) just add to the total time.

If you have a multi-core machine you could potentially benefit from
parallelism - splitting the work amongst available cores. But in GHC
forkIO spawns "green threads" which run in the same OS thread so you
won't be getting any automatic parallelism from the code even if you
have multi-core.

However don't take my word for this - I could easily be wrong. If you
want performance you really do need to see what the profiler tells
you.

Best wishes

Stephen



------------------------------

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 31, Issue 9
****************************************

Reply via email to