Re: [julia-users] Re: [ANN] Two packages: Lazy.jl & Mathematica.jl

Mike Innes Sun, 09 Mar 2014 03:49:25 -0700

I haven't done any real benchmarks but I imagine custom iterator types are
still much faster than generators, given that they're essentially zero
overhead.


Lazy sequences aren't exactly speed demons either - they're basically just
closures, which are a known sore spot for performance in Julia. This may
well improve, but you would generally want to use a more functional style
when you aren't as worried about performance anyway.

Stefan: "you want all the best and most powerful tools available" Yes, yes,
yes - this a thousand times.

The problem with demanding that "every problem has a single solution" is
that you end up with "every solution fits a single problem" as well. You
can't possibly foresee all of the solutions and determine which is best,
and you can't foresee all of the problems a tool can solve either. So
instead of trying to create a one-size-fits all solution to each class of
problem, let's make powerful tools that work seamlessly together and trust
our users to do what they do best - problem solving.


On 9 March 2014 00:03, andrew cooke <[email protected]> wrote:

>
> i don't think anyone was doubting that iterators are more efficient than
> tasks (me: "the only reason i can see for julia adding a separate mechanism
> for iterators separate from tasks is efficiency; mike: "[coroutines...]
> impossible to make iteration over custom data types fast or efficient.").
>
> what i was questioning more (although in a circumspect way, which i
> thought was being poite but may have only been annoying) was whether there
> is a need for (yet) another mechanism besides iterators and tasks.  lazy
> sequences (at least when i've used them) are equivalent to generators, so
> are less expressive than tasks.  and it seems the justification is one or
> more of: (1) some (perhaps many) people find them easier to understand; (2)
> they are more suited to some problems than others (and i was hoping to look
> at the challenge posed for tasks at some point); (3) they are more
> efficient than tasks (i don't think mike said this, but it may be true).
>
> andrew
>
>
> On Saturday, 8 March 2014 20:12:50 UTC-3, Stefan Karpinski wrote:
>
>> I'm going to try to answer the question by taking it to its logical
>> conclusion. Why does Julia have if statements, while loops and for loops
>> when all of these things can be accomplished with closures? In fact,
>> coroutines are even more powerful still, so why bother with anything
>> besides coroutines? Everything that can be accomplished with iteration can
>> also be done with recursion. Should we get rid of while loops and for
>> loops? Or maybe we should disallow recursion - you can always rewrite a
>> recursive algorithm using iteration and an explicit stack.
>>
>> Python's use of generators for custom iteration - not to mention raising
>> and catching an exception to terminate iteration - ensures that it is all
>> but impossible to make iteration over custom data types fast or efficient.
>> Python gets away with this because no one expects using data structures
>> defined in Python to be especially fast or efficient. The language's built
>> in types are special and iterating over them efficiently is baked into the
>> implementation. In Julia, ranges are just a user-defined 
>> type<https://github.com/JuliaLang/julia/blob/master/base/range.jl>that 
>> happens to be defined before your program begins - if iteration used
>> coroutines and exceptions in Julia, all for loops - even just iterating
>> from 1 to n - would be ridiculously slow and inefficient. We'd either be
>> forced to implement everything important in C, or we'd still be screwing
>> around with crazy optimization techniques to eliminate the coroutines and
>> exceptions. Instead, we used a design for iteration that's simple and easy
>> to make fast and efficient for simple things like iterating over a range or
>> an array, yet flexible enough for all kinds of user-defined types. There
>> are also coroutines and exceptions if you need them, but they have some
>> overhead, so they should be used as appropriate.
>>
>> So you are left with a choice between using a task that produces values -
>> which is easy but not very efficient - or implementing an iteration type -
>> which is more work, but also much more efficient. But this kind of choice
>> isn't unusual in programming - writing code is full of choices between
>> different ways to accomplish the same thing with different tradeoffs. The
>> Python mantra that "there should be one - and preferably only one - obvious
>> way to do it" has always stuck me as hopelessly naïve - or maybe it's just
>> wishful thinking. Sure, it's nice if there is an obvious way to do
>> something, and even better if there's one clearly right way. But that's a
>> lucky and rare situation. Most problems don't have exactly one obvious
>> solution. Easy problems have many obvious solutions. Hard problems have no
>> obvious solutions (by definition). Trying to design a language so that
>> there's only one obvious solution to most problems strikes me as a recipe
>> for having a language that's not powerful enough to solve really hard
>> problems well. When you're trying to solve a truly difficult problems,
>> don't you want all the best and most powerful tools available - even if
>> that means that there are lots of ways to solve easier problems?
>>
>>
>>
>> On Sat, Mar 8, 2014 at 8:36 AM, Mike Innes <[email protected]> wrote:
>>
>>> Ok, fair enough - I think the confusion for me lies in the fact that I
>>> wouldn't have said that Julia has "lazy lists, tasks and iterators", in the
>>> same way that I wouldn't say it has "floats, integers and numbers", because
>>> the former two are just types of the latter. But now I think I understand
>>> that by "iterator" you mean "iterator implementation via a custom type" -
>>> like the Take and Repeat types that Iterators.jl uses. Right? Also, I want
>>> to separate the idea of "tasks" and "generators", because "tasks" are just
>>> coroutines - they can be used to make generators, as you have, but it's not
>>> their only purpose.
>>>
>>> I think I'm in agreement with you that iterators, in that sense, are
>>> best reserved for when they have a specific purpose (like Ranges, for
>>> example). I'm not convinced that the Iterators.jl style is the best idea
>>> myself, so lets leave that alone for now. Then it comes down to generators
>>> and lazy sequences, which as you've pointed out are two different ways to
>>> solve the same problem.
>>>
>>> As I've mentioned, these are both reflections of two very different
>>> styles of programming, procedural vs. functional. In my view, the fact that
>>> different people have different tastes is *exactly *the reason to
>>> support both paradigms, as opposed to deciding on "one true way" for
>>> everyone. That article, while it doesn't apply 1:1 to our discussion, also
>>> looks at the idea that in many cases one style is objectively preferable to
>>> another - in which case, it only make sense for Julia to support both.
>>>
>>> I'd be interested to see the tree-walking iterator mentioned in the
>>> article implemented via a task. I could be wrong, but I imagine it would be
>>> reasonably difficult compared to the lazy sequence version. Equally, I
>>> don't know of anything that's harder with sequences than with generators,
>>> so if you can think of anything I'd be interested in having a go at it.
>>>
>>>
>>> On 8 March 2014 11:44, andrew cooke <[email protected]> wrote:
>>>
>>>>
>>>> i realise that in julia iterators are a protocol (that they rely on
>>>> start, done and next, and that the underlying type used to "do" the
>>>> iteration depends on what is being iterated over).  but that's not true in
>>>> python, for example, where all iterators are implemented as coroutines.
>>>> the only reason i can see for julia adding a separate mechanism for
>>>> iterators separate from tasks is efficiency - it's less work to use the
>>>> iterator protocol to effectively manage an integer than to have a task.  or
>>>> maybe it's that consume is explicit in julia while it's not in python, so
>>>> tasks look uglier in julia?
>>>>
>>>> to me this seems confusing.  for example, it would be nice to have
>>>> something that takes a task and generates a new task than is the contents
>>>> of the old task repeated?  but the repeat() function in Iterators.jl
>>>> doesn't do that.  instead it gives you an iterator.  i don't know if this
>>>> matters in practice - i haven't use tasks and iterators enough - but it
>>>> seems like a mess.  why two different things?
>>>>
>>>> similarly, i understand, i think, that both lazy streams and tasks are
>>>> implemented differently.  but a task that produces tasks doesn't give me a
>>>> headache any more than lazy streams of lazy streams.  in fact tasks
>>>> generally seem simpler (to me) because you don't have to worry about making
>>>> the flow work nicely - you can just bail out with a produce.  but maybe
>>>> it's just that i am more used to python than to scheme.  again, why two
>>>> different things?  just because you are used to programming in scheme and i
>>>> am used to python?  that's not a great answer in my book.
>>>>
>>>> (and the task version of take doesn't require 20 lines, for example -
>>>> https://github.com/andrewcooke/BlockCipherSelfStudy.jl/blob/
>>>> master/src/Tasks.jl#L5 )
>>>>
>>>> someone else has pointed me to http://journal.stuffwithstuff.
>>>> com/2013/01/13/iteration-inside-and-out/ which i haven't read yet,
>>>> saying that it explains the difference between iterators and tasks.  maybe
>>>> that will help me.
>>>>
>>>> thinking more about this last night i did realise that my instinctive
>>>> aversion to having lots of ways to do the same thing isn't necessarily
>>>> reasonable in julia.  in a sense, what does it matter if julia has lazy
>>>> streams, tasks and iterators, if they all use the same names for
>>>> functions?  because then you can swap types out and code will still work.
>>>> so i guess the cost to have take defined for iterators, and for tasks and
>>>> for lazy streams is less than i imagined.
>>>>
>>>> andrew
>>>>
>>>>
>>>> On Saturday, 8 March 2014 08:06:33 UTC-3, Mike Innes wrote:
>>>>>
>>>>> So, to clarify, Iterators aren't a thing in themselves. Iteration is
>>>>> an interface, and to call something an iterator just means that you can 
>>>>> put
>>>>> it in a for loop. Tasks and Lazy Lists are both iterators; so are arrays,
>>>>> sets, dictionaries, and a whole bunch of other things. But although you 
>>>>> can
>>>>> use them in a similar way if you want to, they are all designed to solve
>>>>> very different problems.
>>>>>
>>>>> Now, Tasks and Lazy Lists do look similar in that you can produce and
>>>>> consume a stream of values with both, but conceptually they are quite
>>>>> different - Tasks are a mechanism for control flow, whereas Lazy Lists are
>>>>> a data structure. Perhaps you could call them the procedural and 
>>>>> functional
>>>>> analogies of each other. I can't tell you what's best for you, but if
>>>>> you're thinking of Tasks as representing a sequence of data, then there's 
>>>>> a
>>>>> good chance you'll find Lazy Lists easier to reason about.
>>>>>
>>>>> For example, consider the partition() function. In Lazy.jl terms this
>>>>> splits a single list into a list of lists - it's fairly easy to visualise
>>>>> this:
>>>>>
>>>>> > partition(3, seq(1:9))
>>>>> List:
>>>>>   (1 2 3)
>>>>>   (4 5 6)
>>>>>   (7 8 9)
>>>>>
>>>>> If you wanted to write partition() for Tasks, you'd end up with tasks
>>>>> that produce tasks. I don't know about you, but that gives me a headache.
>>>>>
>>>>> You'll also notice that working with general iterators takes a lot of
>>>>> work; consider the Iterators.jl version of take(), which takes about 
>>>>> twenty
>>>>> lines, versus the two-liner in Lazy.jl. Some things are simply impossible
>>>>> to do generically, like flatten().
>>>>>
>>>>> That's not to say that Tasks aren't useful - they're better if you
>>>>> want to do more things in terms of control flow and less in terms of
>>>>> manipulating the data itself, for example. Both Tasks and Lazy Lists are
>>>>> extremely powerful, but each within their own scope - hence it's useful to
>>>>> have both.
>>>>>
>>>>> Is this roughly what you were looking for? Let me know if I've missed
>>>>> anything.
>>>>>
>>>>
>>>
>>

Re: [julia-users] Re: [ANN] Two packages: Lazy.jl & Mathematica.jl

Reply via email to