> > I'll address one specific point quickly, as I want to take time to reply > to the rest properly. > > On Wed, 4 Aug 2010, Jason Dagit wrote: > > > So, if I implement the above space leak fix, do you plan to accept the > > patches? I ask because if you are not planning to accept them I won't > > bother with the fix :) > > I think there are two issues to consider here: whether your use of try is > ok, and whether it's a good idea to expose try in our parser code at all, > for fear that someone will come along later and make seemingly innocuous > use of it causing a serious problem. > > I've actually realized now that my definition of mplus has a 'try' baked right in. That is, it's like a two parameter version of try. Maybe that's what you were getting at when you said, "try foo <|> return Nothing = foo". So while that statement isn't completely accurate as I explained previously, it does capture how mplus and try are similar. The parser ended up this way because our parsers are of the form ByteString -> Maybe (a, ByteString). In the failure case, you can't know how much string was consumed before the failure. I think it would have to be ByteString -> (Maybe a, ByteString) for try and mplus to be significantly different. The immediate result, is that I can stop exposing try. I don't know if that version of my parser will be compatible with other parser libraries (which was one of the initial motivating points of this refactor). I assume that in parsec/attoparsec that try and mplus are significantly different. I would need a chance to read the source code to determine if this is true.
My benchmarks suggest that mplus is not introducing a space leak that is detectable in normal usage patterns, eg., by darcs-benchmark: http://wiki.darcs.net/Benchmarks/ParserRefactor I deleted the old suspect benchmarks and all that remains is my benchmark on a ramdisk. As I have time I'll do a benchmark with trys in there. > > IMO the latter point is one that should be discussed a bit more widely. My > general impression - and I'm by no means an expert - about the existence > of try in parser libraries is that it can be quite problematic. Here it > would be changing the interface of the parsing library away from one which > pretty much enforces a single pass scan. On the other hand, the benefits > in compositionality are nice, as we can see from your patches. So I'm > torn, but I think other people should get involved in the decision, and > that we should perhaps investigate what parser libraries out there offer > it. > > While peekInput allows us to write code that does not leak, it also makes it really easy to write leaky code. All you have to do is peek and then look at the result later after applying some other parsers and you have a leak. Your objection to try seems to be that try is doing this for you in a controlled way (really I mean encapsulated way). So, try automatically creates a leak, but the scope and size of the leak is also more predictable because it's only the scope of the try. The leak also goes away once the parser finishes. On the other hand, with try the leak is inescapable. I would consider alterInput to be unsafe because it allows arbitrary updates to the parser's state. I would call that a leaky abstraction or abstraction violation. My software engineering sensibilities tell me it's evil. In summary: * the leaks created by try/mplus seem to be unavoidable as long as we expose them * the leaks created by mplus do not seem to be noticeable, probably because they are short lived in our parser * alterInput is evil * peekInput allows code that is even leakier than try I would say that, as long as darcs isn't using more memory or being slower that the new parser API is an improvement. As for other libraries, these are the ones I know of: * parsec2 and parsec3 * parsimony * attoparsec * uu parser lib (I don't think we can use this based on a comment by the author on Haskell-Cafe) Happy and alex could potentially apply, but it seems that our consumption of whitespace and our token sets seem to vary depending on where we are in the input stream. We may also lose control over knowing if we have back tracking in our parser. One of the things I wanted to accomplish by refactoring the parser is the ability when parsing from files to track how much of the input has been consumed and be able to later re-open the file and seek back to that point in the stream and start reading again from there. Granted, I don't know which of the above parser libraries actually support this. I know attoparsec does not support it. I may need to implement that in the parser we use instead of using an existing library. Implementing it is easier if alterInput doesn't exist. I should say, with uncontrolled use of alterInput I don't know how I would implement it. I suspect it wouldn't be possible. Jason
_______________________________________________ darcs-users mailing list darcs-users@darcs.net http://lists.osuosl.org/mailman/listinfo/darcs-users