Re: Lambda Expressions - filter list without <#list> directive

Pete Helgren Tue, 02 Jul 2019 12:29:38 -0700

As a more casual Java programmer, the "where" option is much clearer to me. I spend more time using FM syntax than changing the Java underneath, so from a "fading memory" standpoint, "where" would lead to fewer "What the...?" moments, for me at least.


Pete Helgren
www.petesworkshop.com
GIAC Secure Software Programmer-Java
Twitter - Sys_i_Geek  IBM_i_Geek


On 7/2/2019 2:08 PM, Christoph Rüger wrote:

Good point. It seems you are not the first one to stumble on that.
I quickly searched around and found:

Similar question on SO:
https://stackoverflow.com/questions/45939202/filter-naming-convention
JavaScript: filter:
https://developer.mozilla.org/de/docs/Web/JavaScript/Reference/Global_Objects/Array/filter
Spark SQL -> "where" is an alias for "filter":
https://stackoverflow.com/a/33887122/135535
<https://stackoverflow.com/questions/33885979/difference-between-filter-and-where-in-scala-spark-sql>
-> search for "filter" or "where" on
https://spark.apache.org/docs/1.5.2/api/scala/index.html#org.apache.spark.sql.DataFrame
R Statistics Language : filter
https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html#filter-rows-with-filter

Python: filter https://www.geeksforgeeks.org/filter-in-python/
Ruby: they use select:
https://www.codementor.io/tips/8247613177/how-to-filter-arrays-of-data-in-ruby
Kotlin: filter:
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/filter.html

These languages rank near the top of the Stack Overflow survey:
https://insights.stackoverflow.com/survey/2019#technology-_-programming-scripting-and-markup-languages

I agree that "where" reads pretty nicely. I like it. But "filter" seems to be
found in multiple common languages supporting lambda-like syntax.
Python and R are especially common in the data science / statistics
community, which is a different target group than, e.g., Java programmers.
Also, web developers these days write lots of JavaScript to build "html"
websites / templates, and JavaScript also uses "filter".

My vote would still go for "filter", because I think we are working on
lists of objects, and objects are closer to "programming" than to "sql".
Maybe a "where" alias would be a compromise, but it might also be confusing
to have both.

What do others think?

Thanks
Christoph

On Tue, 2 July 2019 at 20:27, Daniel Dekany <[email protected]> wrote:
I wonder if "filter" is a good name. For Java 8 programmers it's a
given, but otherwise I find it confusing, as it's not clear whether you
specify what to filter out or what to keep. Worse, I believe in
everyday English "foo filter" or "filters foo" means removing foo-s
because you don't want them, which is just the opposite of the meaning
in Java. So I think "where", which is familiar to many from SQL (to
most Java programmers as well, but also to non-Java programmers),
would be better. Consider:

   users?filter(user -> user.inactive)

VS

   users?where(user -> user.inactive)

The first can be easily misunderstood as removing the inactive users,
while the meaning of the second is obvious.
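
For reference, a minimal Java sketch of the Java 8 behaviour referred to
above: Stream.filter keeps the elements for which the predicate is true.
The User class and the sample names are made up for this illustration and
are not part of FreeMarker.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterKeepsMatches {
    // Hypothetical user type, only for this illustration.
    static final class User {
        final String name;
        final boolean inactive;

        User(String name, boolean inactive) {
            this.name = name;
            this.inactive = inactive;
        }
    }

    // Returns the names of the users for which the predicate is true.
    static List<String> inactiveNames(List<User> users) {
        return users.stream()
                .filter(user -> user.inactive) // keeps the matches, does not remove them
                .map(user -> user.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("ann", false), new User("bob", true), new User("cy", true));
        System.out.println(inactiveNames(users)); // prints [bob, cy]
    }
}
```

So in Java terms, users?filter(user -> user.inactive) would yield the
inactive users, which is the reading that "where" makes explicit.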


Tuesday, July 2, 2019, 2:57:52 PM, Christoph Rüger wrote:

Thanks for the heads up. Very nice. We will run our test suite to see if
the tests are still green.

On Mon, 1 July 2019 at 09:30, Daniel Dekany <[email protected]> wrote:
Since then I have also made a change that ensures that if the lambda
argument is null (which in FTL is the same as if the variable isn't
there at all), then it will not fall back to find an identically named
variable in higher variable scopes. This is important when doing
things like:

   <#-- filters out null-s -->
   myList?filter(it -> it??)

because if some day someone adds a variable called "it" to the
data-model, then suddenly the above won't filter out the null-s.

The same thing was always an issue with #list loop variables as well,
and also with #nested arguments. So I have added a configuration setting
called "fallbackOnNullLoopVariable", which is true by default
(unfortunate historical baggage... but we can't break backward
compatibility). If you set it to false, then this will print "N/A" for
null list items, rather than "Clashing variable in higher scope":

<#assign it = "Clashing variable in higher scope">
<#list myList as it>
   ${it!'N/A'}
</#list>
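
To make the two behaviours concrete, here is a toy Java model of the
lookup. It is only a sketch of the idea described above, not FreeMarker's
actual implementation: the resolve method, the two maps, and the
fallbackOnNull parameter are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

class NullLoopVariableFallback {
    // Toy lookup: the loop scope is consulted first, then the outer scope.
    // With fallbackOnNull == true, a null loop variable is treated as missing.
    static Object resolve(String name, Map<String, Object> loopScope,
                          Map<String, Object> outerScope, boolean fallbackOnNull) {
        if (loopScope.containsKey(name)) {
            Object value = loopScope.get(name);
            if (value != null || !fallbackOnNull) {
                return value; // may be null: the loop variable hides the outer one
            }
        }
        return outerScope.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> outer = new HashMap<>();
        outer.put("it", "Clashing variable in higher scope");

        Map<String, Object> loop = new HashMap<>();
        loop.put("it", null); // the current list item is null

        System.out.println(resolve("it", loop, outer, true));  // the outer value leaks in
        System.out.println(resolve("it", loop, outer, false)); // prints null
    }
}
```

With the fallback disabled, the null stays visible to the template, so a
default such as ${it!'N/A'} applies instead of the outer variable.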

These changes are pushed and deployed to the Apache snapshot Maven
repo in both branches.


So, apart from documentation, the local lambda feature is about ready,
or so I hope. I'm worried about rough edges though, so I think I will add
lambda support to some more built-ins (?seq_contains, ?sort_by), and
explore some more use cases... If you have use cases of your own that you
actually keep running into, or that you want to be in 2.3.29, let me know.


Monday, June 24, 2019, 1:59:21 AM, Daniel Dekany wrote:

Well, I'm not exactly fast nowadays either... Anyway, I have pushed
and deployed to the snapshot repo the changes I was talking about
recently. That is, ?map or ?filter won't make a sequence out of an
enumerable non-sequence (typically an Iterator) anymore. The concern
was that if hugeResultSet is an Iterator because it's huge, then
someone might write:

   <#assign transformed = hugeResultSet?map(it -> something(it))>
   <#list transformed as it>

instead of just

   <#list hugeResultSet?map(it -> something(it)) as it>

and thus consume a lot of memory without realizing it. So now if
hugeResultSet wasn't already a sequence (List-like), the assignment
will be an error, since we can't safely store a lazily transformed
collection (lambdas will break), and we can't condense it down to a
sequence (List-like thing) automatically either, as that might
consume too much memory. If hugeResultSet was a sequence, then it's
not an error, as we assume that keeping all of it in memory is fine,
as the original was stored there as well (in practice, most of the
time... in principle we can't know).
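
The memory difference can be sketched in plain Java. This is an analogy
under stated assumptions, not FreeMarker code: lazyMap and toList are
hypothetical helpers, and a three-element list stands in for the huge
result set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class LazyMapSketch {
    // Wraps the source; each element is transformed only when next() is called,
    // so nothing is buffered.
    static <T, R> Iterator<R> lazyMap(Iterator<T> source,
                                      Function<? super T, ? extends R> f) {
        return new Iterator<R>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public R next() {
                return f.apply(source.next());
            }
        };
    }

    // Explicit materialization: holds every element in memory at once.
    static <T> List<T> toList(Iterator<T> it) {
        List<T> result = new ArrayList<>();
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        Iterator<Integer> hugeResultSet = Arrays.asList(1, 2, 3).iterator();
        // Streaming use: one element in flight at a time.
        Iterator<Integer> mapped = lazyMap(hugeResultSet, x -> x * 10);
        while (mapped.hasNext()) {
            System.out.println(mapped.next());
        }
    }
}
```

Listing directly over the mapped result corresponds to the while loop;
storing the result in a variable corresponds to calling toList, which is
the step that now has to be requested explicitly.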

Now if the user feels confident about it, they can still write:

   <#assign transformed = hugeResultSet?map(it -> something(it))?sequence>

Similarly, hugeResultSet?map(it -> something(it))[index] will be an
error, as [index] is for sequences only, and ?map will not change a
non-sequence to a sequence anymore. Similarly, if the user feels
confident about it, they can write hugeResultSet?map(it ->
something(it))?sequence[index].

An interesting consequence of this is that ?sequence is now a bit
smarter than before. For example, if you write myIterator?sequence[n],
it will not fetch the elements into an in-memory sequence; it just skips
n elements from myIterator and returns the nth one. Similarly,
myIterator?sequence?size won't store the elements in memory; it just
counts them.
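
In plain Java terms, the idea looks roughly like the sketch below. The
nth and size helpers are hypothetical names for this illustration and say
nothing about how FreeMarker implements it internally.

```java
import java.util.Arrays;
import java.util.Iterator;

class IteratorAsSequence {
    // Returns the element at zero-based index n, discarding the ones before it.
    // Nothing is stored; next() fails if the iterator runs out of elements.
    static <T> T nth(Iterator<T> it, int n) {
        for (int i = 0; i < n; i++) {
            it.next();
        }
        return it.next();
    }

    // Counts the remaining elements without keeping any of them.
    static int size(Iterator<?> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(nth(Arrays.asList("a", "b", "c").iterator(), 2)); // prints c
        System.out.println(size(Arrays.asList("a", "b", "c").iterator()));   // prints 3
    }
}
```

Both helpers consume the iterator, so each call needs a fresh one; the
cost is linear in n, but the memory use stays constant.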

As an interesting note, these two are also identically efficient:

   <#assign seq = hugeResultSet?filter(it -> something(it))?sequence>
   <#assign seq = hugeResultSet?sequence?filter(it -> something(it))>

In both cases the actual conversion to a sequence (in-memory list)
happens only just before assigning the value to seq. Once again,
?sequence now just means "it's OK to treat this as a sequence, however
inefficient it is", and not "convert it to sequence right now".


Friday, June 7, 2019, 10:38:50 AM, Christoph Rüger wrote:

These optimisations sound great. I will try to run some tests within the
next weeks. A bit busy lately.
Thanks
Christoph

On Wed, 29 May 2019 at 23:55, Daniel Dekany <[email protected]> wrote:
Tuesday, April 2, 2019, 12:10:16 PM, Christoph Rüger wrote:

[snip]

Well, if you fear users jumping on ?filter/?map outside #list for no
good enough reason, there can be some option to handle that. But I
don't think restricting the usage to #list is a good compromise as the
default.

I agree. Just keep as it is.

I'm not sure how efficiently a configuration setting could catch these
cases, or if it should be addressed on that level.

Maybe let's postpone the configurability discussion a bit until the above
is more clear.

In the light of the above, I think we can start thinking about that now.

On that note on configurability: Would it be possible to programmatically
influence the Collection (Sequence) which is created under the hood?
E.g. by specifying a Factory? I ask because we are using something like
this
(https://dzone.com/articles/a-filebasedcollection-in-java-for-big-collections)
in other places for large collections. I know it is very specific, but
just wanted to bring it up.

[snip]

I think a good approach would be to ban the *implicit* collection of
the result, when the filtered/mapped source is an Iterator, or other
similar stream-like object that's often used for enumerating a huge
number of elements. So for example, let's say you have this:

   <#assign xs2 = xs?filter(f)>

If xs is List-like, then this will work. Since the xs List fits into
memory (although a List can be backed by disk, that's rather
rare), hopefully it's not the kind of data amount that can't fit into
memory again (as xs2). On the other hand, if xs is an
Iterator-like object, then the above statement fails, with the hint
that xs?filter(f)?sequence would work, but might consume a lot of
memory.

This is also consistent with how xs[i] works in the existing
FreeMarker versions. That only works if xs is List-like (an FTL
sequence). While xs[i] would be trivial to implement even if xs is
Iterator-like, we don't do that as it's not efficient for a high i,
and so the template author is probably not meant to do that. If he
knows what he's doing though, he can write xs?sequence[i]. Yes, that's
very inefficient if you only use [] once on that sequence, but you see
the logic. map/filter breaks it, as xs?filter(f)[i] works even if xs
is an Iterator, because filter/map currently always returns a
sequence. If xs is Iterator-like, then I want filter/map to return an
Iterator-like as well, so then [] will fail on it.

As a side note, I will make ?sequence smarter too, so that
xs?sequence[i] won't actually build a sequence if xs is Iterator-like.
It just has to skip the first i elements after all. (The ?sequence is
still required there. It basically says: "I know what I'm doing, treat
this as a sequence.")

--
Thanks,
  Daniel Dekany

--
Christoph Rüger, Managing Director
Synesty <https://synesty.com/> - Connect and automate without
programming - automation, interfaces, data feeds

Xing: https://www.xing.com/profile/Christoph_Rueger2
LinkedIn: http://www.linkedin.com/pub/christoph-rueger/a/685/198
