Thanks Josh,

The above works for multi-tenant tables that I have created for testing, but I'm still running into issues with multi-tenant tables created elsewhere. I'll try to track down the DDL used to create the tables I'm having problems with - maybe that will illuminate some more things.
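For reference, the test tables that do work were created along these lines (a minimal sketch only; the table, view and column names, the zkUrl and the tenant id below are placeholders, not our actual schema):

    import java.sql.DriverManager

    // Base table: MULTI_TENANT=true, with the tenant column leading the primary key.
    // Run on a regular (global) connection.
    val globalConn = DriverManager.getConnection("jdbc:phoenix:zk-server:2181")
    globalConn.createStatement().execute(
      """CREATE TABLE IF NOT EXISTS TEST_MT_TABLE (
        |  TID VARCHAR NOT NULL,
        |  ID  VARCHAR NOT NULL,
        |  VAL VARCHAR,
        |  CONSTRAINT PK PRIMARY KEY (TID, ID)
        |) MULTI_TENANT=true""".stripMargin)
    globalConn.close()

    // Tenant-specific view: run on a connection carrying the TenantId property,
    // so the view belongs to that tenant.
    val tenantConn = DriverManager.getConnection("jdbc:phoenix:zk-server:2181;TenantId=some_tenant")
    tenantConn.createStatement().execute(
      "CREATE VIEW IF NOT EXISTS TEST_MT_VIEW AS SELECT * FROM TEST_MT_TABLE")
    tenantConn.close()

If the problematic tables differ from that pattern (for example, the tenant column is not the leading primary key column), that might explain what I'm seeing.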
Thanks for pointing out the use of predicate; I wasn't familiar with that!

On Fri, Oct 7, 2016 at 12:14 PM, Josh Mahonin <[email protected]> wrote:

> Hi Nico,
>
> To elaborate a little bit, there are two options for interfacing with
> Phoenix from Spark. Either using Spark's built-in JDBC integration, which
> uses the regular Phoenix JDBC driver, or by using the phoenix-spark
> integration, which leverages Phoenix's MapReduce support. If you are able
> to use the JDBC integration for your use case, then you are likely able to
> use the 'TenantID' and other connection parameters as per the JDBC
> documentation.
>
> However, if you are using the phoenix-spark integration, you can only pass
> in a zookeeper url, a table, a sequence of columns, and a Hadoop
> Configuration object. If you have a dataframe where the tenant column is
> 'TID' and want to retrieve only data for that tenant, you can use the
> DataFrame pushdown support, via something like:
>
> df.where(df("TID") === "some_tenant")
>
> Or if you're using RDDs directly:
>
> sc.phoenixTableAsRDD(
>   table = "...",
>   columns = Seq(...),
>   predicate = Some("TID = 'FOO'"),
>   zkUrl = Some("zk-server:2181")
> )
>
> For saving, you'll need to ensure that your DataFrame schema includes the
> tenantID as the leading column, and then you can use something like:
>
> df.write
>   .format("org.apache.phoenix.spark")
>   .mode("overwrite")
>   .option("table", "...")
>   .option("zkUrl", "...")
>   .save()
>
> Or for RDDs:
>
> rdd.saveToPhoenix(
>   tableName = "...",
>   cols = Seq("TID", ...),
>   zkUrl = Some("zk-server:2181")
> )
>
> Good luck,
>
> Josh
>
> On Fri, Oct 7, 2016 at 2:28 PM, Nico Pappagianis <[email protected]> wrote:
>
> > Hi Josh, just saw your post. Let me take a look at that stack overflow
> > post you linked. Could you elaborate on how you pass TenantID manually?
> >
> > Thanks
> >
> > On Fri, Oct 7, 2016 at 11:27 AM, Nico Pappagianis <[email protected]> wrote:
> >
> > > Thanks James,
> > >
> > > I am able to read from the tenant-specific view using the SQLContext.read
> > > function, which gives a DataFrameReader to work with. However the
> > > SQLContext class doesn't appear to have any functionality for writing.
> > > SQLContext has, so far, been the only way I've been able to read a tenant
> > > view. Phoenix-Spark integration doesn't appear to pass along the TenantID,
> > > at least in the same manner as we would pass it when connecting to ZK via
> > > squirrel, or SQLContext.
> > >
> > > But maybe I am missing something.
> > >
> > > On Fri, Oct 7, 2016 at 10:14 AM, James Taylor <[email protected]> wrote:
> > >
> > > > Not sure if the phoenix-spark integration propagates connection
> > > > properties, but an alternate way of specifying the tenant ID is through
> > > > that mechanism (which would workaround the URL building issue).
> > > >
> > > > On Fri, Oct 7, 2016 at 10:06 AM, Nico Pappagianis <[email protected]> wrote:
> > > >
> > > > > Hi James,
> > > > > At first I thought it was the dataframe vs rdd implementations but
> > > > > looking closer my bet is the way spark connects to phoenix. When
> > > > > reading via SqlContext I pass in
> > > > >
> > > > > "url" -> "jdbc:phoenix:zkHost1,zkHost2,zkHost3:zkPort;TenantId=123456789"
> > > > >
> > > > > and it connects as the tenant.
> > > > >
> > > > > However SqlContext does not have a write/save function.
> > > > >
> > > > > When I try to save by other means I am required to pass in a value
> > > > > for "zkUrl" (not "url").
"zkUrl" cannot have the "jdbc:phoenix:" portion > > >> > attached (because it attaches zkPort to the end of jdbc:phoenix and > > >> errors > > >> > out). As such I cannot connect as the tenant. > > >> > > > >> > When connecting as the tenant via squirrel client I use the same > "url" > > >> > string above, and it works. > > >> > > > >> > So to me it appears to be an issue of how to connect to phoenix as > the > > >> > tenant via spark/phoenix-spark integration. I have not found a clear > > cut > > >> > way to do so. > > >> > > > >> > > > >> > Thanks, > > >> > -Nico > > >> > > > >> > > > >> > On Fri, Oct 7, 2016 at 9:03 AM, James Taylor < > [email protected]> > > >> > wrote: > > >> > > > >> > > Hi Nico, > > >> > > You mentioned offline that it seems to be working for data frames, > > but > > >> > not > > >> > > RDDs. Can you elaborate on that? Have you confirmed whether the > > >> TenantId > > >> > > connection property is being propagated down to the Phoenix > > connection > > >> > > opened for the Spark integration? > > >> > > Thanks, > > >> > > James > > >> > > > > >> > > On Thu, Oct 6, 2016 at 8:36 PM, Nico Pappagianis < > > >> > > [email protected]> wrote: > > >> > > > > >> > > > Does phoenix-spark integration support multitenancy? I'm having > a > > >> hard > > >> > > time > > >> > > > getting it working on my tenant-specific view. > > >> > > > > > >> > > > Thanks > > >> > > > > > >> > > > > >> > > > >> > > > > > > > > >
