Hmm, so I can query multi-tenant tables under a certain tenantId just fine, but when I query a multi-tenant view I created, I get a "table undefined" error. This is because I am not actually connecting as the tenant (I realize now that the predicate doesn't actually connect you as the tenant). It appears, then, that there is no way to connect as a tenant with the phoenix-spark integration, and therefore we cannot read a tenant-specific view. Is this correct?
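To make the contrast above concrete, here is a minimal, untested sketch of the two read paths being compared; the table/view names and the ZooKeeper host are hypothetical placeholders, and the behavior described in the comments is what Nico reports rather than anything verified here:

    import org.apache.phoenix.spark._

    // Works: read the base multi-tenant table and push the tenant filter down
    // as a predicate. This filters rows; it does not open a tenant connection.
    val baseRows = sc.phoenixTableAsRDD(
      "MULTI_TENANT_TABLE",                      // hypothetical base table
      Seq("TENANT_ID", "COL1"),
      predicate = Some("TENANT_ID = '123456789'"),
      zkUrl = Some("zk-server:2181")
    )

    // Reported to fail with "table undefined": the tenant-specific view only
    // resolves on a connection opened with the TenantId property, which this
    // call never sets.
    val viewRows = sc.phoenixTableAsRDD(
      "MY_TENANT_VIEW",                          // hypothetical tenant view
      Seq("COL1"),
      zkUrl = Some("zk-server:2181")
    )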
On Fri, Oct 7, 2016 at 1:28 PM, Nico Pappagianis <[email protected]> wrote:

Thanks Josh,

The above works for multi-tenant tables that I have created for testing, but I'm still running into issues with multi-tenant tables created elsewhere. I'll try to track down the DDL used to create the tables I'm having problems with - maybe that will illuminate some more things.

Thanks for pointing out the use of predicate, I wasn't familiar with that!


On Fri, Oct 7, 2016 at 12:14 PM, Josh Mahonin <[email protected]> wrote:

Hi Nico,

To elaborate a little bit, there are two options for interfacing with Phoenix from Spark: either using Spark's built-in JDBC integration, which uses the regular Phoenix JDBC driver, or using the phoenix-spark integration, which leverages Phoenix's MapReduce support. If you are able to use the JDBC integration for your use case, then you are likely able to use the 'TenantId' and other connection parameters as per the JDBC documentation.

However, if you are using the phoenix-spark integration, you can only pass in a ZooKeeper URL, a table, a sequence of columns, and a Hadoop Configuration object. If you have a DataFrame where the tenant column is 'TID' and want to retrieve only data for that tenant, you can use the DataFrame pushdown support, via something like:

    df.where(df("TID") === "some_tenant")

Or, if you're using RDDs directly:

    sc.phoenixTableAsRDD(
      table = "...",
      columns = Seq(...),
      predicate = Some("TID = 'FOO'"),
      zkUrl = Some("zk-server:2181")
    )

For saving, you'll need to ensure that your DataFrame schema includes the tenant ID as the leading column, and then you can use something like:

    df.write
      .format("org.apache.phoenix.spark")
      .mode("overwrite")
      .option("table", "...")
      .option("zkUrl", "...")
      .save()

Or, for RDDs:

    rdd.saveToPhoenix(
      tableName = "...",
      cols = Seq("TID", ...),
      zkUrl = Some("zk-server:2181")
    )

Good luck,

Josh


On Fri, Oct 7, 2016 at 2:28 PM, Nico Pappagianis <[email protected]> wrote:

Hi Josh, just saw your post. Let me take a look at that Stack Overflow post you linked. Could you elaborate on how you pass TenantId manually?

Thanks


On Fri, Oct 7, 2016 at 11:27 AM, Nico Pappagianis <[email protected]> wrote:

Thanks James,

I am able to read from the tenant-specific view using the SQLContext.read function, which gives a DataFrameReader to work with. However, the SQLContext class doesn't appear to have any functionality for writing. SQLContext has, so far, been the only way I've been able to read a tenant view. The phoenix-spark integration doesn't appear to pass along the TenantId, at least in the same manner as we would pass it when connecting to ZK via SQuirreL or SQLContext.

But maybe I am missing something.


On Fri, Oct 7, 2016 at 10:14 AM, James Taylor <[email protected]> wrote:

Not sure if the phoenix-spark integration propagates connection properties, but an alternate way of specifying the tenant ID is through that mechanism (which would work around the URL-building issue).
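To make James's suggestion concrete: a minimal, untested sketch of passing the tenant ID through the Hadoop Configuration object that phoenix-spark accepts alongside the table, columns and zkUrl. Whether the "TenantId" property actually gets propagated down to the Phoenix connection is exactly the open question in this thread; the view name, tenant ID and ZooKeeper host below are hypothetical.

    import org.apache.hadoop.conf.Configuration
    import org.apache.phoenix.spark._

    // Set the Phoenix "TenantId" connection property on the Configuration
    // handed to the phoenix-spark read; propagation is unverified here.
    val conf = new Configuration()
    conf.set("TenantId", "123456789")

    val rows = sc.phoenixTableAsRDD(
      "MY_TENANT_VIEW",              // hypothetical tenant-specific view
      Seq("COL1", "COL2"),
      zkUrl = Some("zk-server:2181"),
      conf = conf
    )

If the property is dropped somewhere in the MapReduce input setup, this would still fail with the same "table undefined" error described at the top of the thread.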
On Fri, Oct 7, 2016 at 10:06 AM, Nico Pappagianis <[email protected]> wrote:

Hi James,

At first I thought it was the DataFrame vs. RDD implementations, but looking closer my bet is on the way Spark connects to Phoenix. When reading via SQLContext I pass in

    "url" -> "jdbc:phoenix:zkHost1,zkHost2,zkHost3:zkPort;TenantId=123456789"

and it connects as the tenant.

However, SQLContext does not have a write/save function.

When I try to save by other means, I am required to pass in a value for "zkUrl" (not "url"). "zkUrl" cannot have the "jdbc:phoenix:" portion attached (because it appends zkPort to the end of "jdbc:phoenix" and errors out). As such I cannot connect as the tenant.

When connecting as the tenant via the SQuirreL client I use the same "url" string above, and it works.

So to me it appears to be an issue of how to connect to Phoenix as the tenant via Spark / the phoenix-spark integration. I have not found a clear-cut way to do so.

Thanks,
-Nico


On Fri, Oct 7, 2016 at 9:03 AM, James Taylor <[email protected]> wrote:

Hi Nico,

You mentioned offline that it seems to be working for data frames, but not RDDs. Can you elaborate on that? Have you confirmed whether the TenantId connection property is being propagated down to the Phoenix connection opened for the Spark integration?

Thanks,
James


On Thu, Oct 6, 2016 at 8:36 PM, Nico Pappagianis <[email protected]> wrote:

Does the phoenix-spark integration support multitenancy? I'm having a hard time getting it working on my tenant-specific view.

Thanks
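For completeness, a sketch of the JDBC read path that Nico's 10:06 AM message describes as working: Spark's built-in JDBC data source with the TenantId embedded in the Phoenix JDBC URL, so the connection is opened as the tenant and the tenant view resolves. This assumes the Phoenix client jar is on the Spark classpath; the view name and hosts are placeholders, and the write path remains the open problem in this thread.

    // Read a tenant-specific view through Spark's built-in JDBC integration.
    // The TenantId connection property rides along in the JDBC URL.
    val df = sqlContext.read
      .format("jdbc")
      .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
      .option("url", "jdbc:phoenix:zkHost1,zkHost2,zkHost3:zkPort;TenantId=123456789")
      .option("dbtable", "MY_TENANT_VIEW")   // hypothetical tenant view name
      .load()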
