Hi,

I recently did some work on adding support for SQL-like queries on top of DataSets. (This is tracked as "named datasets" in the JIRA issue: https://issues.apache.org/jira/browse/FLINK-947?jql=project%20%3D%20FLINK%20AND%20assignee%20%3D%20currentUser()%20AND%20resolution%20%3D%20Unresolved )
I have support for filter, join, grouping and aggregation. I think the basis is quite solid now, but we can still add support for more data types and for more operations in the select expressions.

Please have a look at my branch if you're interested:
https://github.com/aljoscha/flink/tree/linq

You can look at the new Expression ITCases to see which features are currently available and how the interface is used. There are also two complete programs: PageRankExpression and TPCHQuery3Expression.

And now, at last, a sneak peek at how the new interface is used:

    in.group('key).select('key, ('a + 10).avg + " the average", 'a.count)

The 'foo notation denotes Scala symbols; I use them in the DSL to reference named fields.

Cheers,
Aljoscha
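
P.S. For readers unfamiliar with Scala symbols, here is a rough idea of how a DSL can turn something like 'a + 10 into an expression tree. This is only a minimal, self-contained sketch and is not the code from the linq branch: the names Expr, Field, Literal, Plus, Avg, Count, ExprDsl and the standalone select helper are all hypothetical and made up for illustration. It assumes Scala 2, where the 'foo literal syntax is available (it is deprecated in 2.13 and gone in Scala 3, where Symbol("foo") is the equivalent).

```scala
import scala.language.implicitConversions

// Hypothetical expression tree. These names are illustrative only
// and are NOT taken from the linq branch.
sealed trait Expr {
  def +(other: Expr): Expr = Plus(this, other)
  def avg: Expr = Avg(this)
  def count: Expr = Count(this)
}
case class Field(name: String) extends Expr
case class Literal(value: Any) extends Expr
case class Plus(left: Expr, right: Expr) extends Expr
case class Avg(child: Expr) extends Expr
case class Count(child: Expr) extends Expr

object ExprDsl {
  // A symbol such as 'key becomes a reference to the field named "key".
  implicit def symbolToField(s: Symbol): Expr = Field(s.name)
  // Plain Scala values are lifted to literal expressions.
  implicit def intToLiteral(i: Int): Expr = Literal(i)
  implicit def stringToLiteral(s: String): Expr = Literal(s)
}

object SymbolDslSketch {
  import ExprDsl._

  // Stand-in for a select(...) call: it only collects the expression
  // trees and does not evaluate anything (assumption for illustration).
  def select(exprs: Expr*): Seq[Expr] = exprs

  def main(args: Array[String]): Unit = {
    val projection =
      select('key, ('a + 10).avg + " the average", 'a.count)
    // Print the tree that each select expression was turned into.
    projection.foreach(println)
  }
}
```

The point of the sketch is that nothing is computed when the expression is written down: the implicit conversions only build a tree, which a system can then analyse and translate into operations on the underlying data.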