Hi Nico,

To elaborate a bit, there are two options for interfacing with Phoenix
from Spark: using Spark's built-in JDBC integration, which goes through
the regular Phoenix JDBC driver, or using the phoenix-spark integration,
which leverages Phoenix's MapReduce support. If the JDBC integration works
for your use case, then you should be able to use the 'TenantId' and other
connection parameters as described in the JDBC documentation.
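
For example, here is a minimal sketch of a tenant-specific read over plain
JDBC (the ZK quorum, view name, and tenant value are placeholders):

  // Spark's generic JDBC data source, with TenantId passed in the JDBC URL
  val df = sqlContext.read
    .format("jdbc")
    .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
    .option("url", "jdbc:phoenix:zk-server:2181;TenantId=some_tenant")
    .option("dbtable", "MY_TENANT_VIEW")
    .load()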

However, if you are using the phoenix-spark integration, you can only pass
in a ZooKeeper URL, a table name, a sequence of columns, and a Hadoop
Configuration object. If you have a DataFrame whose tenant column is
'TID' and you want to retrieve only that tenant's data, you can use the
DataFrame pushdown support, via something like:

  df.where(df("TID") === "some_tenant")  // '===' is the Column equality operator
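
Here, the DataFrame itself would have been loaded through the phoenix-spark
API; a minimal sketch, with the table and column names as placeholders:

  import org.apache.phoenix.spark._

  // Load the Phoenix table as a DataFrame; the 'TID' filter above is then
  // pushed down to Phoenix as part of the scan
  val df = sqlContext.phoenixTableAsDataFrame(
    "MY_TABLE",
    Seq("TID", "OTHER_COL"),
    zkUrl = Some("zk-server:2181")
  )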

Or if you're using RDDs directly:

  sc.phoenixTableAsRDD(
    table = "...",
    columns = Seq(...),
    predicate = Some("TID = 'FOO'"),  // raw WHERE-clause fragment, pushed down to Phoenix
    zkUrl = Some("zk-server:2181")
  )

For saving, you'll need to ensure that your DataFrame schema includes the
tenant ID as the leading column; then you can use something like:

  df.write
    .format("org.apache.phoenix.spark")
    .mode("overwrite")
    .option("table", "...")
    .option("zkUrl", "...")
    .save()
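
If the tenant column isn't already leading, you can reorder it before the
write; a small sketch, again assuming the tenant column is named 'TID' and
with the other column names as placeholders:

  // Put the tenant ID column first so it lines up with the leading column
  val toSave = df.select("TID", "COL1", "COL2")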

Or for RDDs:

  rdd.saveToPhoenix(
    tableName = "...",
    cols = Seq("TID", ...),  // tenant ID first, matching the tuple element order
    zkUrl = Some("zk-server:2181")
  )
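
Note that the RDD examples above rely on the implicit conversions added by
the phoenix-spark module, so they assume

  import org.apache.phoenix.spark._

is in scope.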


Good luck,

Josh


On Fri, Oct 7, 2016 at 2:28 PM, Nico Pappagianis <
[email protected]> wrote:

> Hi Josh, just saw your post. Let me take a look at that Stack Overflow post
> you linked. Could you elaborate on how you pass TenantID manually?
>
> Thanks
>
> On Fri, Oct 7, 2016 at 11:27 AM, Nico Pappagianis <
> [email protected]> wrote:
>
> > Thanks James,
> >
> > I am able to read from the tenant-specific view using the SQLContext.read
> > function, which gives a DataFrameReader to work with. However the
> > SQLContext class doesn't appear to have any functionality for writing.
> > SQLContext has, so far, been the only way I've been able to read a tenant
> > view. Phoenix-Spark integration doesn't appear to pass along the
> > TenantID, at least in the same manner as we would pass it when
> > connecting to ZK via SQuirreL, or SQLContext.
> >
> > But maybe I am missing something.
> >
> >
> >
> > On Fri, Oct 7, 2016 at 10:14 AM, James Taylor <[email protected]>
> > wrote:
> >
> >> Not sure if the phoenix-spark integration propagates connection
> >> properties,
> >> but an alternate way of specifying the tenant ID is through that
> >> mechanism (which would work around the URL-building issue).
> >>
> >> On Fri, Oct 7, 2016 at 10:06 AM, Nico Pappagianis <
> >> [email protected]> wrote:
> >>
> >> > Hi James,
> >> > At first I thought it was the dataframe vs rdd implementations but
> >> > looking
> >> > closer my bet is the way spark connects to phoenix. When reading via
> >> > SqlContext I pass in
> >> >
> >> > "url" -> "jdbc:phoenix:zkHost1, zkHost2, zkHost3:zkPort;TenantId=
> >> > 123456789"
> >> > and it connects as the tenant.
> >> >
> >> > However SqlContext does not have a write/save function.
> >> >
> >> > When I try to save by other means I am required to pass in a value for
> >> > "zkUrl" (not "url"). "zkUrl" cannot have the "jdbc:phoenix:" portion
> >> > attached (because it attaches zkPort to the end of jdbc:phoenix and
> >> > errors
> >> > out). As such I cannot connect as the tenant.
> >> >
> >> > When connecting as the tenant via the SQuirreL client I use the same "url"
> >> > string above, and it works.
> >> >
> >> > So to me it appears to be an issue of how to connect to phoenix as the
> >> > tenant via spark/phoenix-spark integration. I have not found a clear
> >> > cut
> >> > way to do so.
> >> >
> >> >
> >> > Thanks,
> >> > -Nico
> >> >
> >> >
> >> > On Fri, Oct 7, 2016 at 9:03 AM, James Taylor <[email protected]>
> >> > wrote:
> >> >
> >> > > Hi Nico,
> >> > > You mentioned offline that it seems to be working for data frames,
> >> > > but not RDDs. Can you elaborate on that? Have you confirmed whether
> >> > > the TenantId connection property is being propagated down to the
> >> > > Phoenix connection opened for the Spark integration?
> >> > > Thanks,
> >> > > James
> >> > >
> >> > > On Thu, Oct 6, 2016 at 8:36 PM, Nico Pappagianis <
> >> > > [email protected]> wrote:
> >> > >
> >> > > > Does phoenix-spark integration support multitenancy? I'm having a
> >> > > > hard time getting it working on my tenant-specific view.
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
