Re: Issues while Running Apache Phoenix against TPC-H data

James Taylor Fri, 19 Aug 2016 13:19:21 -0700

On Fri, Aug 19, 2016 at 11:37 AM, Stack <st...@duboce.net> wrote:

> On Thu, Aug 18, 2016 at 5:54 PM, James Taylor <jamestay...@apache.org>
> wrote:
>
> > The data loaded fine for us.
>
>
> Mind describing what you did to get it to work and with what versions and
> configurations and with what TPC loading and how much of the workload was
> supported? Was it a one-off project?
>


Mujtaba already kindly responded to this (about a week back on this
thread). He was able to load the data for the benchmark onto one of our
internal clusters. He didn't run the benchmarks. Sorry, I don't have
any more specific knowledge, but generally I think:
- it's difficult for an open source project to troubleshoot environmental
issues, and it's even more difficult if a user is using a vendor-specific distro. IMHO,
if you ask an open source project for help, you should be using the
artifacts that they produce (preferably the latest release).
- using a three node cluster for HBase is not ideal for benchmarking.
- doing full table scans over large HBase tables will be slow.


>
>
>
> > If TPC is not representative of real
> > workloads, I'm not sure there's value in spending a lot of time running
> > them.
>
>
> I suppose the project could just ignore TPC but I'd suggest that Phoenix
> put up a page explaining why TPC does not apply if this is the case; i.e. it
> is not representative of Phoenix workloads. When people see that Phoenix
> is for "OLTP and analytical queries", they probably think the TPC loadings
> will just work given their standing in the industry. Putting up a disavowal
> with explanation will save folks time trying to make it work and it can
> also be cited when folks try to run TPC against Phoenix and they have a bad
> experience, say bad performance.
>

I haven't run the TPC benchmarks, so I have no idea how they perform. I
work at Salesforce where we use Phoenix (among many other technologies) to
support various big data use cases. The workloads I'm familiar with aren't
similar to the TPC benchmarks, so they're not relevant for my work. But if
TPC benchmarks are relevant for your work, then it'd be great if you
pursued this. Or maybe we can get this "Phoenix" person you mentioned to do
it (smile).


>
> On the other hand, even if an artificial loading, unless Phoenix has a
> better means of verifying all works, I'd think it would be a useful test to
> run before release or on a nightly basis verifying no regression in
> performance or in utility.
>

I think the community would welcome enhancing our existing regression test
suite. If you're up for leading that effort, that'd be great.

Thanks,
James
