Hi Xinhui,

As Stephan mentioned for the batch jobs, 2-3 tables would be a nice addition. Can we use the same Spark examples, shown below, to implement them? Thanks.
For example:

1. Scan Query
   SELECT pageURL, pageRank FROM rankings WHERE pageRank > X

2. Aggregation Query
   SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue) FROM uservisits GROUP BY SUBSTR(sourceIP, 1, X)

3. Join Query
   SELECT sourceIP, totalRevenue, avgPageRank
   FROM (SELECT sourceIP, AVG(pageRank) AS avgPageRank, SUM(adRevenue) AS totalRevenue
         FROM Rankings AS R, UserVisits AS UV
         WHERE R.pageURL = UV.destURL
           AND UV.visitDate BETWEEN Date('1980-01-01') AND Date('X')
         GROUP BY UV.sourceIP)
   ORDER BY totalRevenue DESC LIMIT 1

https://amplab.cs.berkeley.edu/benchmark/
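
In case it helps when comparing results across engines, here is a rough sketch that runs the three queries above against a tiny in-memory SQLite database. This is only a stand-in for checking what each query should return; it is not a Flink or Spark implementation. The table schemas are cut down to the columns the queries touch, and the sample rows, the helper function names, and the values chosen for X are all made up for illustration.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rankings (pageURL TEXT, pageRank INTEGER);
        CREATE TABLE uservisits (sourceIP TEXT, destURL TEXT,
                                 visitDate TEXT, adRevenue REAL);
        INSERT INTO rankings VALUES ('a.com', 10), ('b.com', 50), ('c.com', 90);
        INSERT INTO uservisits VALUES
            ('10.0.0.1', 'a.com', '1980-02-01', 1.0),
            ('10.0.0.2', 'b.com', '1980-03-01', 2.0),
            ('10.1.0.1', 'c.com', '1980-04-01', 4.0),
            ('10.1.0.1', 'b.com', '1990-01-01', 8.0);
    """)
    return conn


def scan_query(conn, x):
    # 1. Scan Query: pages whose rank exceeds the threshold X.
    return conn.execute(
        "SELECT pageURL, pageRank FROM rankings WHERE pageRank > ?", (x,)
    ).fetchall()


def aggregation_query(conn, x):
    # 2. Aggregation Query: ad revenue summed per sourceIP prefix of length X.
    return conn.execute(
        "SELECT SUBSTR(sourceIP, 1, ?), SUM(adRevenue) FROM uservisits "
        "GROUP BY SUBSTR(sourceIP, 1, ?)",
        (x, x),
    ).fetchall()


def join_query(conn, x):
    # 3. Join Query: the sourceIP with the highest total revenue for visits
    # between 1980-01-01 and the date X (ISO date strings compare correctly).
    return conn.execute(
        "SELECT sourceIP, totalRevenue, avgPageRank "
        "FROM (SELECT sourceIP, AVG(pageRank) AS avgPageRank, "
        "             SUM(adRevenue) AS totalRevenue "
        "      FROM rankings AS R, uservisits AS UV "
        "      WHERE R.pageURL = UV.destURL "
        "        AND UV.visitDate BETWEEN Date('1980-01-01') AND Date(?) "
        "      GROUP BY UV.sourceIP) AS t "
        "ORDER BY totalRevenue DESC LIMIT 1",
        (x,),
    ).fetchall()


if __name__ == "__main__":
    db = make_db()
    print(scan_query(db, 40))
    print(aggregation_query(db, 4))
    print(join_query(db, "1985-01-01"))
```

The same small input could be fed to the Flink and Spark jobs so that all three can be compared on identical data before running at benchmark scale.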