Re: [elixir-core:8970] Re: [Proposal] Document the Big-O complexity (time and space) of every collection and their functions

Mário Guimarães Fri, 19 Jul 2019 09:33:47 -0700

Chris

this proposal is originally to "Document the Big-O complexity (time and
space) of every collection and their functions".
This does not require any `@doc`, it suffices to have a small section on
the top of a collection's module documentation, where such information can
be accessed easy and clearly.


Mário

Chris Keathley <c...@keathley.io> escreveu no dia sexta, 19/07/2019 à(s) 17:08:

> > Chris, to clarify, no addition to the language is necessary to support
> doc metadata
>
> Sorry I misspoke. I still contend that the core problem can be solved with
> more information and more clarity using all of our existing tools.
>
> > Oh, that was already explained:
>
> I understand what you're proposing. But showing a table isn't a *problem*.
> It's one solution to a problem. I'm not convinced this proposal adequately
> solves the core problem.
>
> On Fri, Jul 19, 2019 at 11:35 AM Mário Guimarães <
> mario.luis.guimar...@gmail.com> wrote:
>
>> Oh, that was already explained: to have the O(...) annotation at a well
>> structured place that can be used for post-processing; for example, to
>> generate a table collecting those O(...) annotations, to be placed at a
>> well known point, like the top of the collection's module.
>>
>> This is not an addition to the language per se, because @doc can be used
>> already in the form of @doc x: term
>>
>> This is an addition to the documentation of the collections that come
>> with Elixir, and not only on the functions of those collections, because
>> for example "sort" comes from "Enum" only and deserves to have its
>> complexity documented for the different collections; in this case,
>> documenting sort in a complexity section on the top of the collection
>> document is perhaps the best place.
>>
>>
>> Chris Keathley <c...@keathley.io> escreveu no dia sexta, 19/07/2019 à(s)
>> 16:19:
>>
>>> > What's the difficulty of annotation like this, which can be
>>> complemented if necessary with an additional explanation in the module or
>>> function documentation?
>>>
>>> What's the difficulty in just using module docs and function docs? If
>>> additional information would go in the module and function documentation
>>> why not put everything there? I'm not saying that we shouldn't communicate
>>> this information when its useful. But it hasn't been proven to me that we
>>> need an addition to the language to accomplish that goal.
>>>
>>> On Fri, Jul 19, 2019 at 11:00 AM Mário Guimarães <
>>> mario.luis.guimar...@gmail.com> wrote:
>>>
>>>> This proposal is to document the complexity for the collections that
>>>> come with Elixir and the functions that manipulate them. There aren't many
>>>> of these collections, they are the simplest collections that can exist from
>>>> the user perspective, they are very distinct, and so, I think that the most
>>>> simplified "O(...)" is enough to compare across them. For anyone
>>>> creating new collections outside the standard library, they choose to
>>>> document their collections as they wish.
>>>>
>>>> The case of average running times I believe will depend on the
>>>> application domain, so it is better dealt by benchmarking the application
>>>> for the typical inputs in its domain.
>>>>
>>>> The case  of "O(min(n, W))"  can be well complemented by explaining
>>>> what "W" is in the function's documentation, in case such uncommon
>>>> variables stay after simplification of the "O(...)". Again, I do not think
>>>> that Elixir collections will need these.
>>>>
>>>> What's the difficulty of annotation like this, which can be
>>>> complemented if necessary with an additional explanation in the module or
>>>> function documentation?
>>>>
>>>> @doc time_complexity: "O(n) if n < 32, otherwise O(log32(n))"
>>>>
>>>> Mário
>>>>
>>>> sexta-feira, 19 de Julho de 2019 às 15:23:13 UTC+1, Chris Keathley
>>>> escreveu:
>>>>>
>>>>> I agree with all of the points made by Michał and Paul. Additionally,
>>>>> most functions will never need to specify this information. In the few
>>>>> cases where you might want to specify this information it seems like it
>>>>> would be very reasonable to use module docs and function docs. This is
>>>>> especially true because asymptotic time _isn't_ the most useful way to
>>>>> measure immutable data structures. Often you'll want to look at both
>>>>> asymptotic and amortized time complexities. But even if we set amortized
>>>>> time to the side for a second, how would you document a data structure 
>>>>> that
>>>>> internally uses a list until the list size reaches a certain amount and
>>>>> then switches to a HAMT with a branching factor of 32? Do you say that
>>>>> searching this data structure is O(n) when n < 32 and O(log32(n)) when n 
>>>>> >=
>>>>> 32? In my mind putting this information in the module doc is clearer and
>>>>> allows me to write a full explanation.
>>>>>
>>>>> It seems to me that the core problem is communicating relative
>>>>> performance of different data structures and algorithms. I don't think 
>>>>> this
>>>>> proposal is needed to solve that problem and it creates more complexities
>>>>> than benefits.
>>>>>
>>>>> On Fri, Jul 19, 2019 at 7:26 AM Mário Guimarães <
>>>>> mario.lui...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> The idea of this proposal is to document time and space complexity of
>>>>>> collections in a more structured / organized way instead of being missing
>>>>>> or spread opportunistically here and there throughout the documentation.
>>>>>>
>>>>>> I also think that such information could be presented in a proper
>>>>>> section at the top part of a collection's module documentation before the
>>>>>> function list.
>>>>>>
>>>>>> Using `@doc` annotations on functions are not strictly necessary, but
>>>>>> provide a way to capture that information that is ameanable for
>>>>>> post-processing, for example, to generate a table at the top of the
>>>>>> documentation page. Moreover, they do not need to be presented next to 
>>>>>> the
>>>>>> documentation of the functions they annotate.
>>>>>>
>>>>>>
>>>>>> The Big-O annotation practice is not to write "O(2n+3)" but to reduce
>>>>>> this to its pattern of asymptotic growth, which in this case is simply
>>>>>> "O(n)".
>>>>>>
>>>>>> This means that I totally agree that complex mathematical formulas do
>>>>>> not belong to this documentation.
>>>>>>
>>>>>> Among the several existing predictors, Big-O is the one generally
>>>>>> used and that people are most aware of. It is also the one that predicts
>>>>>> the worst-case asymptotic performance an algorithm can get when systems 
>>>>>> are
>>>>>> put under the stress of large inputs. Adding other indicators introduces
>>>>>> unnecessary complexity and confusion, which obviously is not what we want
>>>>>> in our documentation.
>>>>>>
>>>>>> In general, the "n" relates to the number of elements in the
>>>>>> collection, and I suppose that in cases of functions involving two
>>>>>> collections, they can be annotated with something like "O(n+m)", where 
>>>>>> "n"
>>>>>> and "m" are the number of elements in each collection.
>>>>>>
>>>>>> These are very simple expressions every software developer should
>>>>>> know or be interested to learn for the sake of understanding algorithms
>>>>>> better and the quality of the software systems they write.
>>>>>>
>>>>>>
>>>>>>
>>>>>> sexta-feira, 19 de Julho de 2019 às 11:32:22 UTC+1, Fernando Tapia
>>>>>> Rico escreveu:
>>>>>>>
>>>>>>> To give a bit more context: the top level documentation for Enum,
>>>>>>> Keyword, List, Map, MapSet, and Stream already contain some "
>>>>>>> *hints"* about time complexity :)
>>>>>>>
>>>>>>> On Friday, July 19, 2019 at 12:28:56 PM UTC+2, Wiebe-Marten Wijnja
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Having read all of the replies, I like the suggestion to add simple
>>>>>>>> readable documentation to the data structures' module docs the best.
>>>>>>>>
>>>>>>>> Adding new documentation metadata seems to me not worth it. Besides
>>>>>>>> the already mentioned problems, I would like to mention that
>>>>>>>> complexity-documentation is not *that* structured:
>>>>>>>>
>>>>>>>> - Besides Big-O, there is Big-Omega and Little-o, which are also
>>>>>>>> very useful and relatively common in literature. Do we also need to 
>>>>>>>> add new
>>>>>>>> metadata names for those?
>>>>>>>> - Time-complexity and Space-complexity are by no means the only
>>>>>>>> complexities that are being measured for data structures. Do we also 
>>>>>>>> need
>>>>>>>> to add new metadata names for e.g. communication complexity,
>>>>>>>> - What about amortized complexities?
>>>>>>>> - In regards to *what* is the complexity measured? (i.e. "what is
>>>>>>>> `n` in O(2n+3)"?)
>>>>>>>>
>>>>>>>> And that, combined with the fact that I think it is important to
>>>>>>>> also accommodate people who are new to this mathematical way of writing
>>>>>>>> algorithmic complexities, makes me conclude that it is a bad idea to 
>>>>>>>> create
>>>>>>>> a rigid structured way for it, and that adding some humanly-readable
>>>>>>>> unstructured documentation for common operations on commonly-used
>>>>>>>> datastructures would be the way to go. These unstructured documentation
>>>>>>>> snippets might be accompanied by their formal complexity notations, but
>>>>>>>> should not only consist of these.
>>>>>>>>
>>>>>>>> ~Qqwy/Marten
>>>>>>>>
>>>>>>>> On Thursday, July 18, 2019 at 6:36:04 PM UTC+2, Mário Guimarães
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I find this can be useful in selecting the collections and their
>>>>>>>>> manipulating functions to be used in our code.
>>>>>>>>>
>>>>>>>>> The case is that in an immutable language it may not be evident
>>>>>>>>> what is the O(...) of the insert, read, update, delete and sort, in 
>>>>>>>>> our
>>>>>>>>> collections.
>>>>>>>>>
>>>>>>>>> For example, what is the complexity of sorting an immutable
>>>>>>>>> enumerable via `Enum.sort`? I really do not know.
>>>>>>>>>
>>>>>>>>> This could be documented with a table at the start of the
>>>>>>>>> documentation for the collection.
>>>>>>>>>
>>>>>>>>> This effort could be decomposed in as many issues as the number of
>>>>>>>>> collections to document.
>>>>>>>>>
>>>>>>>>> Experienced people may know many of these Big-O details, but for
>>>>>>>>> newcommers to Elixir (and Erlang) it could be very helpful.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Mário
>>>>>>>>>
>>>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elixir-lang-core" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to elixir-l...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/d8bd66d7-4328-4c2d-8d9b-19002a8d6bc9%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/d8bd66d7-4328-4c2d-8d9b-19002a8d6bc9%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Chris
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elixir-lang-core" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elixir-lang-core+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elixir-lang-core/e6919fe4-9bb6-4894-93ce-8ac4f4255869%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elixir-lang-core/e6919fe4-9bb6-4894-93ce-8ac4f4255869%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Chris
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elixir-lang-core" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elixir-lang-core+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGjNwxqnSm4sya_zwUWzxDTmF3HxgcAK%3D3abGsxFqvTE4NU%3Dpg%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGjNwxqnSm4sya_zwUWzxDTmF3HxgcAK%3D3abGsxFqvTE4NU%3Dpg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elixir-lang-core+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elixir-lang-core/CAF7CYk66Y9JN-rxr7mznstyz%3Db%2B_aSwW77pO1wROw8EGZ5Ypzw%40mail.gmail.com
>> <https://groups.google.com/d/msgid/elixir-lang-core/CAF7CYk66Y9JN-rxr7mznstyz%3Db%2B_aSwW77pO1wROw8EGZ5Ypzw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>
> --
> Chris
>
> --
> You received this message because you are subscribed to the Google Groups
> "elixir-lang-core" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elixir-lang-core+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elixir-lang-core/CAGjNwxrOhv0u%3Doy_iAqPmOsRObJ00gnyH2jXE-B%3DGq22ZaDKvQ%40mail.gmail.com
> <https://groups.google.com/d/msgid/elixir-lang-core/CAGjNwxrOhv0u%3Doy_iAqPmOsRObJ00gnyH2jXE-B%3DGq22ZaDKvQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elixir-lang-core+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/CAF7CYk4hFYFsUaNg-Nh8zSSCJHCXk%3D1EJqEQp222U2oQfUgoSA%40mail.gmail.com.

Re: [elixir-core:8970] Re: [Proposal] Document the Big-O complexity (time and space) of every collection and their functions

Reply via email to