The initial goal of PigMix was definitely to give the project a way to
measure itself against MapReduce and across different releases. So
that falls into your synthetic category.
That said, if adding a field enables extending the benchmark into new
territory and makes it more useful, then that seems like a clear win.
Alan.
Keren Ouaknine
July 14, 2015 at 12:44
Hi,
I am working on expanding the PigMix benchmark.
I am interested in adding queries that match more realistic use cases,
such as finding the highest revenue of a page or the burst of activity
for a specific page. Additionally, I would like to add OLTP-like queries,
such as finding other users from the same neighborhood looking at a
specific page.
The current PigMix table does not have an id for a page access (see
details on page_views here
<https://cwiki.apache.org/confluence/display/PIG/PigMix>).
Therefore I cannot run the above queries.
I am wondering why this field was omitted from the schema of page_views.
It seems a fundamental field for all aggregation queries on page_views.
I see two options: either there is another use case that this schema
targets (what is it?), or the benchmark's goal is not to target real use
cases and is merely oriented towards synthetic performance measurement.
Any ideas?
Thank you,
Keren
PS: I sent this email to both the dev and user mailing lists, not to
spam us :) but because these queries concern both users and developers.