Two simple queries that come to mind are

1) global revision-count

2) revision-count per Wikipedia page, per author, or per date

The key step as far as Supersonic is concerned is to run against one shard
of the data. You can extrapolate almost everything from there.
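For concreteness, both queries reduce to two aggregation primitives: a global COUNT(*) and a GROUP BY + COUNT. A minimal Python sketch over an in-memory "shard" (the field names `page`, `author`, and `date` are illustrative placeholders, not the actual dump schema):

```python
from collections import Counter

# Toy revision records standing in for one in-memory shard of the dump.
revisions = [
    {"page": "Foo", "author": "alice", "date": "2012-10-01"},
    {"page": "Foo", "author": "bob",   "date": "2012-10-01"},
    {"page": "Bar", "author": "alice", "date": "2012-10-02"},
]

# 1) global revision count -- a plain COUNT(*) aggregation
global_count = len(revisions)

# 2) revision count per page / per author / per date --
#    a GROUP BY key + COUNT aggregation
per_page = Counter(r["page"] for r in revisions)
per_author = Counter(r["author"] for r in revisions)
per_date = Counter(r["date"] for r in revisions)

print(global_count)     # 3
print(per_page["Foo"])  # 2
```

Once these two primitives are timed on one shard, totals for the full dataset can be extrapolated by the shard count.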

On Thu, Oct 18, 2012 at 4:35 PM, Philip Haynes <
[email protected]> wrote:

> Hi,
>
> In order to do query performance design, I was hoping someone could help
> by creating a set of various queries which then map down to various
> primitives that can then be modelled in both C++/Supersonic and Java.
> If it could be kept concrete and use the datasets below, the sample
> queries can be developed and tested using BigQuery. For expediency I am
> using the data set below, which decompresses to a 38 GB sample file. If
> people think other data set files are relevant, then please let me know,
> but I would like to keep the final data set under 24 GB and 8 GB, since
> these are the maximum memory sizes of the machines I have readily
> available.
>
> In creating test cases, please keep in mind concurrency models and how
> things such as SIMD will process queries.
> In the first instance I would like to keep queries all in memory, so I am
> testing primitive operations rather than the I/O performance of my hard
> disk.
>
> Help appreciated,
> Kind Regards,
> Philip
>
>
> https://code.google.com/p/supersonic/wiki/ExpressionReference
> https://code.google.com/p/supersonic/wiki/OperationReference
>
> https://developers.google.com/bigquery/docs/dataset-wikipedia
>
> http://dumps.wikimedia.org/enwiki/20121001/enwiki-20121001-pages-articles-multistream.xml.bz2
>
>
>