Re: [darcs-users] darcs patch: Data format and converter for "square" r... (and 4 more)

Eric Kow Sat, 29 May 2010 08:56:26 -0700

On Fri, May 07, 2010 at 03:36:11 -0400, Max Battcher wrote:
> So this is a fun bundle of patches..??? After the "get it done" but "try
> not to touch/break anything important" first attempt at Graphing, this
> bundle of patches factors out the common parts of tabulateRepo/graphRepo.


Yay!

Apologies for the very late review.  Luckily there's no hold-up as
darcs-benchmark operates on a more liberal push first review when you
can basis. This looks like a nice refactor.  I didn't have anything to
say so I just rambled upon about stylistic niggles which may just be
personal taste anyway.  Sorry!

Data format and converter for "square" repo tables
--------------------------------------------------
> +data RepoTable = RepoTable { rtRepo :: String
> +                           , rtColumns :: [String]
> +                           , rtRows :: [String]
> +                           , rtTable :: [(TimeUnit, [Maybe MemTimeOutput])]
> +                           }

If I understand correctly, one of these guys corresponds to all the data
for a given repo (eg. GHC).  This is one step higher than having a Table
for each

rtColumns and rtRows refer to the row/column titles

rtTable is a list of rows (and each sublist is a cell?)
Each row uses the same units for comparability.

Once we have these RepoTables we should probably avoid trying to do
anything with them that's not straightforward rendering of some sort.
I'm really prone to mistakes like sorting row/column headers (say
according to fastest system or whatever) and then forgetting to actually
sort the data points themselves or vice-versa.  This is why I tend to
instead manipulate "fatty" cells, where each cell contains its own
row/column information.  Much more Eric-proof that way.  I do all my
sorting and grouping on the fat cells and only remove the headery bits
on the formatting end.

> +-- Reformat (test, output) array as a square table
> +repoTables :: [Benchmark ()] -> [(Test a, Maybe MemTimeOutput)] -> 
> [RepoTable]
> +repoTables benchmarks results = [RepoTable { rtRepo = reponame (head repo)
> +                                           , rtColumns = columns
> +                                           , rtRows = rows
> +                                           , rtTable = table repo
> +                                           } | repo <- reposResults]

Safe to use head here because repoResults is the result of groupBy.
This may be cleaner as

 [ ... | repo <- groupBy sameRepo results ]

which avoids keeps the groupBy-makes-non-empty-lists knowledge and the
consume-a-non-empty-list assumption near each other.  (On the other
hand, it may be harder to read, sigh tradeoffs).

> + where
> +  reposResults = groupBy sameRepo results -- ASSUME: ordered by repository
> +  sameRepo (Test _ tr1 _, _) (Test _ tr2 _, _) = trName tr1 == trName tr2
> +  reponame (Test _ tr _, _) = trName tr
> +  --
> +  rows = map description . filter hasBenchmark $ benchmarks
> +  hasBenchmark b = any (\ (Test tb _ _, _) -> description tb == description 
> b) results
> +  --
> +  columns = map mkColName columnInfos
> +  columnInfos = nub [ (b, trName tr) | (Test _ tr b, _) <- results ]
> +  mkColName (TestBinary b, tname) =
> +    let v = nameToVariant tname
> +        prefix = case vId v of
> +                   DefaultVariant -> ""
> +                   _-> vSuffix v ++ " "
> +    in prefix ++ cutdown b
> +  cutdown d | "darcs-" `isPrefixOf` d = cutdown (drop 6 d)
> +            | takeExtension d == ".exe" = dropExtension d
> +            | otherwise = d

By the way, check out
  http://wiki.darcs.net/Benchmarks/GhLinux
It could be worth doing some repo name clean up, maybe with takeFileName

It has "./darcs-2.3.1", which I think we could maybe prettify a bit.

> +  --
> +  table repo = [(tu (tableRow row), map justMemTimeOutput $ tableRow row)
> +               | row <- rows]

I'm not sure a list-comprehesion is the most readable route here.
How about a helper function like mkRow or rowInfo and a map?

> +    where
> +      tableRow row = [find (match row col) repo | col <- columnInfos]
> +      getTime (Just (_, Just mt)) = Just (mtTimeMean mt)
> +      getTime _ = Nothing
> +      tu row = case mapMaybe getTime row of
> +        [] -> Milliseconds
> +        xs -> appropriateUnit (minimum xs)
> +  match bench (binary, name) (Test bench' tr binary', _) =
> +      bench == description bench' && binary == binary' && trName tr == name
> +  justMemTimeOutput (Just (_, mt)) = mt
> +  justMemTimeOutput Nothing = Nothing

Move graphRepo to RepoTable
---------------------------
Seems better for me to just paste in the new code than work with the
diff

| graphRepoMemory :: RepoTable -> [String]
| graphRepoMemory repo = map graphUrl rows
|  where
|   rows = [(title rowname, rtColumns repo, graphdata rowdata)
|          | (rowname, (tu, rowdata)) <- zip (rtRows repo) (rtTable repo)]
|   title rowname = rowname ++ " (MiB)"
|   graphdata rowdata = map selectMemory rowdata
|   selectMemory (Just mt) = fromRational (mtMemMean mt) / (1024 * 1024)
|   selectMemory _ = 0.0

Each row (benchmark) corresponds to a single graph.  It's definitely a
lot clearer what's going on here now that you've broken it down.

I tend to use a lot more point-free style these days (eg graphdata = map
selectMemory), but maybe when I'm older and grumpier I'll swing back.

> +  t_graphs = map (repoTuple graphRepoTime) tables
> +  m_graphs = map (repoTuple graphRepoMemory) tables
> +  showG (r,gs) = intercalate "\n" $ [ r
> +                                    , replicate (length r) '-'
> +                                    , ""
> +                                    ] ++ (map imgDirective gs) ++ [""]
> +  imgDirective = (".. image:: " ++)

Heh, due to cherry picking this patch is incomplete (missing repoTuple).
But that's the way things go sometimes...

Refactor tabulateRepo to use RepoTable
--------------------------------------
Seems like a straightforward refactor.

> +  repoTuple tabulate repo = (rtRepo repo, tabulate repo)
> +  t_tables = map (repoTuple $ tabulateRepo formatTimeResult) tables
> +  m_tables = map (repoTuple $ tabulateRepo formatMemoryResult) tables

I actually would have probably found (repoTuple (tabulateRepo
formatTimeResult)) a bit easier to read (because the '$' primes me to
expect a non-function value at the other end).

Please don't take my stylistic comments too seriously!  There's a lot
of this kind of stuff that could just go either way and loop around
forever.


repoTables should use trCoreName; force sort
--------------------------------------------
>  It seems better to actually sort the input results than to assume they
>  are sorted.

Context: for variant repos, eg. op-darcs-2.3.1, the "core" name would be
that of the original unvarnished darcs (darcs-2.3.1).

> -  reposResults = groupBy sameRepo results -- ASSUME: ordered by repository
> -  sameRepo (Test _ tr1 _, _) (Test _ tr2 _, _) = trName tr1 == trName tr2
> -  reponame (Test _ tr _, _) = trName tr
> +  reposResults = groupBy sameRepo $ sortBy repoOrder results

> +  sameRepo (Test _ tr1 _, _) (Test _ tr2 _, _) = trCoreName tr1 == 
> trCoreName tr2
> +  reponame (Test _ tr _, _) = trCoreName tr

sameRepo could conceivably use reponame (on the other hand, maybe ((==) `on`
reponame) sort of the kind of just-because-you-can-doesn't-mean-you-should
code I might write)

reponame could conceivably be eliminated if we added general use field labels
(if nothing else, the Gentle Introduction to Haskell is useful for learning
what to call things.  I was just about to say "field accessor thingies"):

data Test a = Test { tBenchmark :: Benchmark a
                   ,  tRepo :: TestRepo
                   ...
                   }

> +  repoOrder a b = compare (reponame a) (reponame b)

This can be written as compare `on` reponame

  on :: (b -> b -> c) -> (a -> b) -> a -> a -> c  Source
  (*) `on` f = \x y -> f x * f y. 

(The kind of thing one picks up through cultural osmosis)

Refactor Run.hs to use updated tabulateRepo
-------------------------------------------
> -    case tabulate formatTimeResult res of
> -      []  -> return ()
> -      [(_,t)] -> echo_n $ TR.render id id id t
> -      _   -> error "Not expecting more than one table for a repo and its 
> variants"
> +    let tables = repoTables benchmarks res
> +    if length tables == 1 then
> +      echo_n $ TR.render id id id $ tabulateRepo formatTimeResult (head 
> tables)
> +      else error "Not expecting more than one table for a repo and its 
> variants"

Note that checking length and doing head is slightly less safe than just
relying on pattern-matching with case (albeit perhaps more readable).
It's unlikely that you'll break apart if length tables == 1 then foo
(head tables), which is why I say "slightly"

Also: do we never get the empty list?

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9

signature.asc
Description: Digital signature

_______________________________________________
darcs-users mailing list
darcs-users@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-users

Re: [darcs-users] darcs patch: Data format and converter for "square" r... (and 4 more)

Reply via email to