Thanks Josh,

The above works for multi-tenant tables that I have created for testing, but I'm still running into issues with multi-tenant tables created elsewhere. I'll try to track down the DDL used to create the tables I'm having problems with - maybe that will illuminate some more things.
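For reference, the test tables that do work were created along these lines (a minimal sketch only; the table, view and column names, the zkUrl and the tenant id below are placeholders, not our actual schema):

    import java.sql.DriverManager

    // Base table: MULTI_TENANT=true, with the tenant column leading the primary key.
    // Run on a regular (global) connection.
    val globalConn = DriverManager.getConnection("jdbc:phoenix:zk-server:2181")
    globalConn.createStatement().execute(
      """CREATE TABLE IF NOT EXISTS TEST_MT_TABLE (
        |  TID VARCHAR NOT NULL,
        |  ID  VARCHAR NOT NULL,
        |  VAL VARCHAR,
        |  CONSTRAINT PK PRIMARY KEY (TID, ID)
        |) MULTI_TENANT=true""".stripMargin)
    globalConn.close()

    // Tenant-specific view: run on a connection carrying the TenantId property,
    // so the view belongs to that tenant.
    val tenantConn = DriverManager.getConnection("jdbc:phoenix:zk-server:2181;TenantId=some_tenant")
    tenantConn.createStatement().execute(
      "CREATE VIEW IF NOT EXISTS TEST_MT_VIEW AS SELECT * FROM TEST_MT_TABLE")
    tenantConn.close()

If the problematic tables differ from that pattern (for example, the tenant column is not the leading primary key column), that might explain what I'm seeing.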
Thanks for pointing out the use of predicate; I wasn't familiar with that!

On Fri, Oct 7, 2016 at 12:14 PM, Josh Mahonin <[email protected]> wrote:

> Hi Nico,
>
> To elaborate a little bit, there are two options for interfacing with
> Phoenix from Spark. Either using Spark's built-in JDBC integration, which
> uses the regular Phoenix JDBC driver, or by using the phoenix-spark
> integration, which leverages Phoenix's MapReduce support. If you are able
> to use the JDBC integration for your use case, then you are likely able to
> use the 'TenantID' and other connection parameters as per the JDBC
> documentation.
>
> However, if you are using the phoenix-spark integration, you can only pass
> in a zookeeper url, a table, a sequence of columns, and a Hadoop
> Configuration object. If you have a dataframe where the tenant column is
> 'TID' and want to retrieve only data for that tenant, you can use the
> DataFrame pushdown support, via something like:
>
> df.where(df("TID") === "some_tenant")
>
> Or if you're using RDDs directly:
>
> sc.phoenixTableAsRDD(
>   table = "...",
>   columns = Seq(...),
>   predicate = Some("TID = 'FOO'"),
>   zkUrl = Some("zk-server:2181")
> )
>
> For saving, you'll need to ensure that your DataFrame schema includes the
> tenantID as the leading column, and then you can use something like:
>
> df.write
>   .format("org.apache.phoenix.spark")
>   .mode("overwrite")
>   .option("table", "...")
>   .option("zkUrl", "...")
>   .save()
>
> Or for RDDs:
>
> rdd.saveToPhoenix(
>   tableName = "...",
>   cols = Seq("TID", ...),
>   zkUrl = Some("zk-server:2181")
> )
>
> Good luck,
>
> Josh
>
> On Fri, Oct 7, 2016 at 2:28 PM, Nico Pappagianis <[email protected]> wrote:
>
> > Hi Josh, just saw your post. Let me take a look at that stack overflow
> > post you linked. Could you elaborate on how you pass TenantID manually?
> >
> > Thanks
> >
> > On Fri, Oct 7, 2016 at 11:27 AM, Nico Pappagianis <[email protected]> wrote:
> >
> > > Thanks James,
> > >
> > > I am able to read from the tenant-specific view using the SQLContext.read
> > > function, which gives a DataFrameReader to work with. However the
> > > SQLContext class doesn't appear to have any functionality for writing.
> > > SQLContext has, so far, been the only way I've been able to read a tenant
> > > view. Phoenix-Spark integration doesn't appear to pass along the TenantID,
> > > at least in the same manner as we would pass it when connecting to ZK via
> > > squirrel, or SQLContext.
> > >
> > > But maybe I am missing something.
> > >
> > > On Fri, Oct 7, 2016 at 10:14 AM, James Taylor <[email protected]> wrote:
> > >
> > > > Not sure if the phoenix-spark integration propagates connection
> > > > properties, but an alternate way of specifying the tenant ID is through
> > > > that mechanism (which would workaround the URL building issue).
> > > >
> > > > On Fri, Oct 7, 2016 at 10:06 AM, Nico Pappagianis <[email protected]> wrote:
> > > >
> > > > > Hi James,
> > > > > At first I thought it was the dataframe vs rdd implementations but
> > > > > looking closer my bet is the way spark connects to phoenix. When
> > > > > reading via SqlContext I pass in
> > > > >
> > > > > "url" -> "jdbc:phoenix:zkHost1,zkHost2,zkHost3:zkPort;TenantId=123456789"
> > > > >
> > > > > and it connects as the tenant.
> > > > >
> > > > > However SqlContext does not have a write/save function.
> > > > >
> > > > > When I try to save by other means I am required to pass in a value
> > > > > for "zkUrl" (not "url").
"zkUrl" cannot have the "jdbc:phoenix:" portion > > >> > attached (because it attaches zkPort to the end of jdbc:phoenix and > > >> errors > > >> > out). As such I cannot connect as the tenant. > > >> > > > >> > When connecting as the tenant via squirrel client I use the same > "url" > > >> > string above, and it works. > > >> > > > >> > So to me it appears to be an issue of how to connect to phoenix as > the > > >> > tenant via spark/phoenix-spark integration. I have not found a clear > > cut > > >> > way to do so. > > >> > > > >> > > > >> > Thanks, > > >> > -Nico > > >> > > > >> > > > >> > On Fri, Oct 7, 2016 at 9:03 AM, James Taylor < > [email protected]> > > >> > wrote: > > >> > > > >> > > Hi Nico, > > >> > > You mentioned offline that it seems to be working for data frames, > > but > > >> > not > > >> > > RDDs. Can you elaborate on that? Have you confirmed whether the > > >> TenantId > > >> > > connection property is being propagated down to the Phoenix > > connection > > >> > > opened for the Spark integration? > > >> > > Thanks, > > >> > > James > > >> > > > > >> > > On Thu, Oct 6, 2016 at 8:36 PM, Nico Pappagianis < > > >> > > [email protected]> wrote: > > >> > > > > >> > > > Does phoenix-spark integration support multitenancy? I'm having > a > > >> hard > > >> > > time > > >> > > > getting it working on my tenant-specific view. > > >> > > > > > >> > > > Thanks > > >> > > > > > >> > > > > >> > > > >> > > > > > > > > >
