Re: Lambda Expressions - filter list without <#list> directive

Daniel Dekany Sun, 23 Jun 2019 17:00:22 -0700

Well, I'm not exactly fast nowadays either... Anyway, I have pushed
the changes I was talking about recently, and deployed them to the
snapshot repo. That is, ?map and ?filter won't make a sequence out of
an enumerable non-sequence (typically an Iterator) anymore. The
concern was that if hugeResultSet is an Iterator because it's huge,
then someone might write:

  <#assign transformed = hugeResultSet?map(it -> something(it))>
  <#list transformed as it>

instead of just

  <#list hugeResultSet?map(it -> something(it)) as it>

and thus consume a lot of memory without realizing it. So now if
hugeResultSet wasn't already a sequence (List-like), the assignment
will be an error, since we can't safely store a lazily transformed
collection (lambdas will break), and we can't condense it down to a
sequence (List-like thing) automatically either, as that might
consume too much memory. If hugeResultSet was a sequence, then it's
not an error, as we assume that keeping all of it in memory is fine,
since the original was stored there as well (in practice, most of the
time... in principle we can't know).

Now if the user feels confident about it, they can still write:

  <#assign transformed = hugeResultSet?map(it -> something(it))?sequence>

Similarly, hugeResultSet?map(it -> something(it))[index] will be an
error, as [index] is for sequences only, and ?map will not change a
non-sequence to a sequence anymore. Similarly, if the user feels
confident about it, they can write hugeResultSet?map(it ->
something(it))?sequence[index].

An interesting consequence of this is that ?sequence is now a bit
smarter than before. For example, if you write myIterator?sequence[n],
it will not fetch the elements into an in-memory sequence; it just
skips n elements of myIterator and returns the one that follows.
Similarly, myIterator?sequence?size won't store the elements in
memory, it just counts them.
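
To make the idea concrete, here is a minimal Java sketch of what
"skip and return" and "just count" mean for a plain java.util.Iterator.
This is not the FreeMarker implementation; the class and method names
(IteratorAccessSketch, elementAt, size) are made up for illustration,
and throwing on an out-of-range index is an arbitrary choice of the
sketch, not a statement about what FreeMarker does.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration only; not FreeMarker's actual code.
final class IteratorAccessSketch {

    // Returns the element at zero-based index n by discarding the first n
    // elements. Nothing is buffered, but the iterator is consumed.
    static <T> T elementAt(Iterator<T> it, int n) {
        if (n < 0) {
            throw new IndexOutOfBoundsException("negative index: " + n);
        }
        for (int i = 0; i < n; i++) {
            if (!it.hasNext()) {
                throw new IndexOutOfBoundsException("index " + n + " is past the end");
            }
            it.next(); // skipped element is not stored anywhere
        }
        if (!it.hasNext()) {
            throw new IndexOutOfBoundsException("index " + n + " is past the end");
        }
        return it.next();
    }

    // Counts the remaining elements without storing them.
    static int size(Iterator<?> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(elementAt(Arrays.asList("a", "b", "c").iterator(), 1));
        System.out.println(size(Arrays.asList("a", "b", "c").iterator()));
    }
}
```

Both helpers use constant memory, but each call walks the iterator, so
an access at a high index is still linear in that index.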

As an interesting note, these two are also equally efficient:

  <#assign seq = hugeResultSet?filter(it -> something(it))?sequence>
  <#assign seq = hugeResultSet?sequence?filter(it -> something(it))>

In both cases the actual conversion to a sequence (in-memory list)
happens only just before assigning the value to seq. Once again,
?sequence now just means "it's OK to treat this as a sequence, however
inefficient it is", and not "convert it to sequence right now".
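
As a rough Java analogy of that deferred conversion (again, not the
FreeMarker code; LazyFilterSketch, filter and materialize are names
invented for this sketch): filtering only wraps the source iterator,
and elements are copied into a list only when something explicitly
asks for a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical illustration only; not FreeMarker's actual code.
final class LazyFilterSketch {

    // Wraps the source in a filtering view. No element is read from the
    // source until the returned iterator is actually used.
    static <T> Iterator<T> filter(Iterator<T> source, Predicate<? super T> keep) {
        return new Iterator<T>() {
            private T pending;       // next matching element, if already found
            private boolean ready;   // true when pending holds a valid element

            @Override
            public boolean hasNext() {
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    // The only place where elements are collected into memory.
    static <T> List<T> materialize(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

In this analogy, materialize(filter(source, p)) plays the role of the
assignment to seq: only the elements that pass the filter end up in
the list, whichever side of ?filter the ?sequence was written on.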

Friday, June 7, 2019, 10:38:50 AM, Christoph Rüger wrote:

> These optimisations sound great. I will try to run some tests within the
> next weeks. A bit busy lately.
> Thanks
> Christoph
>
> Am Mi., 29. Mai 2019 um 23:55 Uhr schrieb Daniel Dekany <[email protected]
>>:
>
>> Tuesday, April 2, 2019, 12:10:16 PM, Christoph Rüger wrote:
>>
>> [snip]
>> >> Well, if you fear users jumping on ?filter/?map outside #list for no
>> >> good enough reason, there can be some option to handle that. But I
>> >> don't think restricting the usage to #list is a good compromise as the
>> >> default.
>> >
>> > I agree. Just keep as it is.
>> >
>> >> >> I'm not sure how efficiently could a configuration setting catch
>> >> >> these cases, or if it should be addressed on that level.
>> >> >
>> >> > Maybe let's postpone configurability discussion a bit until the
>> >> > above is more clear.
>> >>
>> >> In the light of the above, I think we can start thinking about that
>> >> now.
>> >
>> > On that note on configurability: Would it be possible to programmatically
>> > influence the Collection (Sequence) which is created under the hood?
>> > E.g. by specifying a Factory? I ask because we are using something like
>> > this
>> > (https://dzone.com/articles/a-filebasedcollection-in-java-for-big-collections)
>> > in other places for large collections. I know it is very specific, but
>> > just wanted to bring it up.
>> [snip]
>>
>> I think a good approach would be to ban the *implicit* collection of
>> the result, when the filtered/mapped source is an Iterator, or other
>> similar stream-like object that's often used for enumerating a huge
>> number of elements. So for example, let's say you have this:
>>
>>   <#assign xs2 = xs?filter(f)>
>>
>> If xs is List-like, then this will work. Since the xs List fits into
>> memory (although a List can be backed by disk, that's rather rare),
>> hopefully it's not the kind of data amount that can't fit into memory
>> again (as xs2). On the other hand, if xs is an Iterator-like object,
>> then the above statement fails, with the hint that
>> xs?filter(f)?sequence would work, but might consume a lot of memory.
>>
>> This is also consistent with how xs[i] works in the existing
>> FreeMarker versions. That only works if xs is List-like (an FTL
>> sequence). While xs[i] would be trivial to implement even if xs is
>> Iterator-like, we don't do that as it's not efficient for a high i,
>> and so the template author is probably not meant to do that. If he
>> knows what he's doing though, he can write xs?sequence[i]. Yes, that's
>> very inefficient if you only use [] once on that sequence, but you see
>> the logic. map/filter breaks it, as xs?filter(f)[i] works even if xs
>> is an Iterator, because filter/map currently always returns a
>> sequence. If xs is Iterator-like, then I want filter/map to return an
>> Iterator-like object as well, so then [] will fail on it.
>>
>> As a side note, I will make ?sequence smarter too, so that
>> xs?sequence[i] won't actually build a sequence if xs is Iterator-like.
>> It just has to skip the first i elements after all. (The ?sequence is
>> still required there. It basically says: "I know what I'm doing, treat
>> this as a sequence.")
>>
>> --
>> Thanks,
>>  Daniel Dekany
>>
>>
>

-- 
Thanks,
 Daniel Dekany
