I think Stack is trying to help and was just asking whether Mujtaba did
something special to load the data (and perhaps how long it took for us and on
how many nodes we did that). (If it loaded fine for us and there was nothing
special we had to do, I agree that there's no way (or need) to troubleshoot
vendor-specific benchmark setups.)
I also agree that running some subset of TPC-* would be a boon for Phoenix and
boost its adoption.

At the same time, Phoenix is moving at an incredible speed. 4.7 is already old
(considering the fixes in 4.8), and 4.4 is _ancient_. In 4.9 (or 5.0) we'll have
column and dense encoding, which should speed up this type of query.

Now, Amit never replied about how their HBase is actually configured (heap
sizes, etc.). Phoenix runs inside the region server, so the region server's
configuration is extremely important.
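For reference, the region server heap is set in hbase-env.sh; the value below
is purely illustrative (the right number depends entirely on the hardware),
the point being that Phoenix coprocessors run in, and compete for, that same
heap:

    # hbase-env.sh -- illustrative value only, tune for your hardware;
    # Phoenix runs inside this JVM, so its queries share this heap.
    export HBASE_HEAPSIZE=8G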
-- Lars

From: James Taylor <jamestay...@apache.org>
To: "dev@phoenix.apache.org" <dev@phoenix.apache.org>
Sent: Friday, August 19, 2016 1:19 PM
Subject: Re: Issues while Running Apache Phoenix against TPC-H data
On Fri, Aug 19, 2016 at 11:37 AM, Stack <st...@duboce.net> wrote:

> On Thu, Aug 18, 2016 at 5:54 PM, James Taylor <jamestay...@apache.org>
> wrote:
>
> > The data loaded fine for us.
>
>
> Mind describing what you did to get it to work and with what versions and
> configurations and with what TPC loading and how much of the workload was
> supported? Was it a one-off project?
>

Mujtaba already kindly responded to this (about a week back on this
thread). He was able to load the data for the benchmark onto one of our
internal clusters. He didn't run the benchmarks. Sorry, I don't have any
more specific knowledge, but generally I think:
- it's difficult for an OS project to troubleshoot environmental issues, and
it's even more difficult if a user is using a vendor-specific distro. IMHO,
if you ask an open source project for help, you should be using the
artifacts that they produce (preferably the latest release).
- using a three-node cluster for HBase is not ideal for benchmarking.
- doing full table scans over large HBase tables will be slow (see the
sketch below).
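
To make that last point concrete, here's a sketch against a hypothetical
ORDERS table (the schema and index names are made up; the plan text is only
roughly what Phoenix's EXPLAIN prints):

    -- hypothetical schema; ORDER_ID is the row key
    CREATE TABLE ORDERS (
        ORDER_ID BIGINT NOT NULL PRIMARY KEY,
        CUSTOMER_ID BIGINT,
        TOTAL DECIMAL);

    -- without a secondary index, filtering on a non-row-key column
    -- scans the whole table
    EXPLAIN SELECT ORDER_ID FROM ORDERS WHERE CUSTOMER_ID = 42;
    -- plan: ... FULL SCAN OVER ORDERS

    -- a secondary index turns the same query into a range scan (the index
    -- covers the query, since the data table PK is part of each index row)
    CREATE INDEX ORDERS_CUST_IDX ON ORDERS (CUSTOMER_ID);
    EXPLAIN SELECT ORDER_ID FROM ORDERS WHERE CUSTOMER_ID = 42;
    -- plan: ... RANGE SCAN OVER ORDERS_CUST_IDX [42]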


>
>
>
> > If TPC is not representative of real
> > workloads, I'm not sure there's value in spending a lot of time running
> > them.
>
>
> I suppose the project could just ignore TPC but I'd suggest that Phoenix
> put up a page explaining why TPC does not apply, if this is the case; i.e. it
> is not representative of Phoenix workloads. When people see that Phoenix
> is for "OLTP and analytical queries", they probably think the TPC loadings
> will just work given their standing in the industry. Putting up a disavowal
> with explanation will save folks time trying to make it work, and it can
> also be cited when folks try to run TPC against Phoenix and have a bad
> experience, say bad performance.
>

I haven't run the TPC benchmarks, so I have no idea how they perform. I
work at Salesforce, where we use Phoenix (among many other technologies) to
support various big data use cases. The workloads I'm familiar with aren't
similar to the TPC benchmarks, so they're not relevant for my work. But if
TPC benchmarks are relevant for your work, it'd be great if you pursued
this. Or maybe we can get this "Phoenix" person you mentioned to do it
(smile).


>
> On the other hand, even if it is an artificial loading, unless Phoenix has a
> better means of verifying all works, I'd think it would be a useful test to
> run before release or on a nightly basis to verify no regression in
> performance or in utility.
>

I think the community would welcome enhancing our existing regression test
suite. If you're up for leading that effort, that'd be great.

Thanks,
James