Hi  Xinhui

As Stephan mentioned for the batch jobs, there are 2 - 3 tables would be
nice addition. 
Can we use the same Spark examples as below to implement it.
Thanks. 


For example:
1. Scan Query
SELECT pageURL, pageRank FROM rankings WHERE pageRank > X

2. Aggregation Query
SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue) FROM uservisits GROUP BY
SUBSTR(sourceIP, 1, X)


3. Join Query
SELECT sourceIP, totalRevenue, avgPageRank
FROM
  (SELECT sourceIP,
          AVG(pageRank) as avgPageRank,
          SUM(adRevenue) as totalRevenue
    FROM Rankings AS R, UserVisits AS UV
    WHERE R.pageURL = UV.destURL
       AND UV.visitDate BETWEEN Date(`1980-01-01') AND Date(`X')
    GROUP BY UV.sourceIP)
  ORDER BY totalRevenue DESC LIMIT 1

https://amplab.cs.berkeley.edu/benchmark/




--
View this message in context: 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Benchmarks-of-Flink-supporting-Flink-in-BigDataBench-tp7079p7114.html
Sent from the Apache Flink Mailing List archive. mailing list archive at 
Nabble.com.

Reply via email to