Hi,

As part of some query performance design work, I was hoping someone could help by creating a set of varied queries that map down to primitives which can then be modelled in both C++/Supersonic and Java. If the queries are kept concrete and use the data sets below, they can be developed and tested using BigQuery.

For expediency I am using the data set below, which decompresses to a 38 GB sample file. If people think other data sets are relevant, please let me know, but I would like to keep the final data set under 24 GB and 8 GB, since those are the maximum memory sizes of the machines I have readily available.
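To illustrate the kind of mapping I have in mind, here is a rough sketch only, in plain C++ and deliberately not using Supersonic's API. It assumes a query along the lines of "sum num_characters per wp_namespace where num_characters > 1000", broken into a filter primitive and a group-by-sum primitive over in-memory columns. The column names and the function names FilterGreaterThan and GroupBySum are hypothetical and chosen purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Filter primitive (hypothetical name): scans one column and returns the
// indices of the rows whose value is strictly greater than the threshold.
std::vector<std::size_t> FilterGreaterThan(
    const std::vector<std::int64_t>& values, std::int64_t threshold) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > threshold) selected.push_back(i);
  }
  return selected;
}

// Group-by-sum primitive (hypothetical name): for the selected rows only,
// accumulates values[row] into a running total keyed by keys[row].
std::map<std::int32_t, std::int64_t> GroupBySum(
    const std::vector<std::int32_t>& keys,
    const std::vector<std::int64_t>& values,
    const std::vector<std::size_t>& selected) {
  std::map<std::int32_t, std::int64_t> sums;
  for (std::size_t row : selected) {
    sums[keys[row]] += values[row];
  }
  return sums;
}
```

The query is then just the composition GroupBySum(keys, values, FilterGreaterThan(values, 1000)). The filter loop is a tight scan over a contiguous column, which is the sort of primitive I would expect to be a candidate for SIMD, whereas the group-by depends on a map lookup per row and would probably need a different approach.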
When creating test cases, please keep in mind concurrency models and how things such as SIMD will process the queries. In the first instance I would like to keep all queries in memory, so that I am testing the primitive operations rather than the I/O performance of my hard disk.

Help appreciated,

Kind Regards,
Philip

https://code.google.com/p/supersonic/wiki/ExpressionReference
https://code.google.com/p/supersonic/wiki/OperationReference
https://developers.google.com/bigquery/docs/dataset-wikipedia
http://dumps.wikimedia.org/enwiki/20121001/enwiki-20121001-pages-articles-multistream.xml.bz2
